I recently saw a neat trick on Twitter from Stephen Hay, who says that:
Few things clean up CMS-input HTML better than running it through Pandoc to convert to Markdown and then back to HTML again. 1 sec, big win.
Suppose we have a tea-dance.html file that contains crufty markup because it was generated by a WYSIWYG editor. We could clean it up by running this at the command line:
cat tea-dance.html | pandoc --from=html --to=markdown | pandoc --from=markdown --to=html
This emits a cleaned-up version of tea-dance.html on standard out.
We’re using pandoc as a filter, that is: a program “that accepts text at standard input, changes it in some way, and sends it to standard output”.
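To make the filter idea concrete, here is a minimal sketch that wraps the same two-step pipeline in a shell function. The name html_roundtrip is my own invention for illustration, not something pandoc provides, and the sketch assumes pandoc is installed and on your PATH.

```shell
#!/bin/sh
# html_roundtrip is a hypothetical helper name, not part of pandoc.
# It behaves as a filter: HTML in on standard input,
# cleaned-up HTML out on standard output.
html_roundtrip() {
  pandoc --from=html --to=markdown | pandoc --from=markdown --to=html
}

# Example usage (write to a new file, not back onto the input):
#   html_roundtrip < tea-dance.html > tea-dance.clean.html
```

Because the function only reads standard input and writes standard output, it can sit anywhere in a pipeline. Redirecting the output straight back onto tea-dance.html would truncate the file before pandoc reads it, which is why the usage line writes to a separate file.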
Running text from a Vim buffer through an external filter
Suppose that we open the tea-dance.html file in Vim. We can use the bang Ex command to filter the contents of the current buffer through our pandoc pipeline:
:%!pandoc --from=html --to=markdown | pandoc --from=markdown --to=html
Vim will take the output from that pipeline and use it to overwrite the original text from the buffer.
In a follow-up tweet, Stephen suggests mapping this Ex command to a key so we can run it more easily. For example, you could add a mapping for normal mode and another for visual mode. Inside a mapping the pipe has to be written as <Bar>, otherwise Vim treats | as the end of the map command:
nnoremap <leader>gq :%!pandoc -f html -t markdown <Bar> pandoc -f markdown -t html<CR>
vnoremap <leader>gq :!pandoc -f html -t markdown <Bar> pandoc -f markdown -t html<CR>
That’ll work, but I want to suggest a way of doing it without leader mappings.
Set up formatprg to filter selection through pandoc
In episode 18 of Vimcasts, I demonstrated how the external par command could be used to format hard-wrapped plain text files.
As long as we’re using Vim version 8.0.0179 (or newer), we can use a similar technique here.
if has("autocmd")
  let pandoc_pipeline  = "pandoc --from=html --to=markdown"
  let pandoc_pipeline .= " | pandoc --from=markdown --to=html"
  autocmd FileType html let &l:formatprg=pandoc_pipeline
endif
That means we can filter the current line through pandoc by pressing gqq.
Or we can filter the entire buffer by pressing gggqG.
Or we can switch to visual mode, and gq will filter only the selected lines.
Update: When I originally published this episode, I assumed that the formatprg option could be set for each buffer independently.
I was wrong then, but this is now possible since this patch by Sung Pae was accepted into Vim core.