I recently saw a neat trick on Twitter from Stephen Hay, who says that:
Few things clean up CMS-input HTML better than running it through Pandoc to convert to Markdown and then back to HTML again. 1 sec, big win.
pandoc is a Swiss Army knife for converting between many different markup formats. You can find installation instructions on the pandoc site.
Suppose we have a tea-dance.html file that contains crufty markup because it was generated by a WYSIWYG editor. We could clean it up by running this at the command line:
cat tea-dance.html | pandoc --from=html --to=markdown | pandoc --from=markdown --to=html
This emits a cleaned-up version of tea-dance.html on standard output.
We’re using pandoc as a filter, that is, a program “that accepts text at standard input, changes it in some way, and sends it to standard output”.
Running text from a Vim buffer through an external filter
Suppose that we open the tea-dance.html file in Vim. We can use the bang Ex command (:!) to filter the contents of the current buffer through our pandoc pipeline:
:%!pandoc --from=html --to=markdown | pandoc --from=markdown --to=html
Vim will take the output from that pipeline and use it to overwrite the original text from the buffer.
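The % in that command is a range meaning "every line in the buffer". The same pipeline can be applied to any other range of lines (see :h :range!). The line numbers in this sketch are placeholders chosen only for illustration:

```vim
" Filter only lines 10 through 20 (placeholder line numbers)
:10,20!pandoc --from=html --to=markdown | pandoc --from=markdown --to=html

" Filter from the cursor line to the end of the buffer
:.,$!pandoc --from=html --to=markdown | pandoc --from=markdown --to=html
```

The whole filter counts as a single change, so if the result isn't what you wanted, pressing u restores the original text.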
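In a follow-up tweet, Stephen suggests mapping this Ex command to a key so we can run it more easily. For example, you could add a mapping for normal mode and another for visual mode:
nnoremap <leader>gq :%!pandoc -f html -t markdown <Bar> pandoc -f markdown -t html<CR>
vnoremap <leader>gq :!pandoc -f html -t markdown <Bar> pandoc -f markdown -t html<CR>
Inside a mapping, the pipe has to be written as <Bar>: a bare | would end the mapping definition, and Vim would try to run the rest of the line as a separate command.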
That’ll work, but I want to suggest a way of doing it without leader mappings.
Set up formatprg to filter selection through pandoc
In episode 18 of Vimcasts, I demonstrated how the external par command could be used to format plain text files with hard wrapping.
As long as we’re using Vim version 8.0.0179 (or newer), we can use a similar technique here.
The gq operator runs the lines covered by a motion or visual selection through the filter specified by the formatprg option.
This autocommand sets formatprg for HTML files to use our pandoc pipeline:
if has("autocmd")
  let pandoc_pipeline  = "pandoc --from=html --to=markdown"
  let pandoc_pipeline .= " | pandoc --from=markdown --to=html"
  autocmd FileType html let &l:formatprg=pandoc_pipeline
endif
That means we can filter the current line through pandoc by pressing gqq.
Or we can filter the entire buffer by pressing gg, then gqG.
Or we can switch to visual mode, select some lines, and press gq to filter only the selected lines.
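One caveat with a bare autocmd in your vimrc: each time the file is re-sourced, the autocommand is registered again. A common idiom is to wrap it in an augroup that clears itself first. This is a sketch of that variation, not the code from the episode; the group name pandoc_formatprg is arbitrary, and it still assumes Vim 8.0.0179 or newer:

```vim
" The group name is arbitrary; autocmd! removes earlier copies of
" this group's autocommands so re-sourcing doesn't add duplicates.
augroup pandoc_formatprg
  autocmd!
  " Unlike :setlocal, let &l: takes a string expression, so the
  " spaces and the | in the pipeline need no backslash escaping.
  autocmd FileType html let &l:formatprg =
        \ 'pandoc --from=html --to=markdown | pandoc --from=markdown --to=html'
augroup END
```

Putting the pipeline in a string literal also avoids defining the global pandoc_pipeline variable.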
Update: When I originally published this episode, I assumed that the formatprg option could be set for each buffer independently. I was wrong then, but this is now possible since a patch by Sung Pae was accepted into Vim core.
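To check that your Vim is new enough, and that an HTML buffer actually picked up the buffer-local value, you could run these Ex commands from inside tea-dance.html:

```vim
" Prints 1 if this Vim includes patch 8.0.0179, 0 otherwise
:echo has('patch-8.0.0179')

" Shows this buffer's local 'formatprg' value and, when Vim
" recorded it, the script that last set it
:verbose setlocal formatprg?
```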
Further reading
- Stephen Hay’s tweets: one and two
- pandoc
- Installing pandoc
- :h filter
- :h :range!
- :h formatprg
- :h gq