I recently saw a neat trick on Twitter from Stephen Hay, who says that:
Few things clean up CMS-input HTML better than running it through Pandoc to convert to Markdown and then back to HTML again. 1 sec, big win.
Suppose we have a tea-dance.html file that contains crufty markup because it was generated by a WYSIWYG editor. We could clean it up by running this at the command line:
cat tea-dance.html | pandoc --from=html --to=markdown | pandoc --from=markdown --to=html
This emits a cleaned-up version of tea-dance.html on standard output.
We’re using pandoc as a filter, that is: a program “that accepts text at standard input, changes it in some way, and sends it to standard output”.
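Because each pandoc stage is a filter, the whole pipeline is a filter too, so it can be wrapped in a shell function and reused. The following is only a sketch: the function name clean_html and the output filename in the usage comment are placeholders of my own, and it assumes pandoc is installed and on the PATH.

```shell
# clean_html is a hypothetical helper name, not part of pandoc.
# It reads HTML on standard input and writes the round-tripped
# HTML to standard output, just like any other filter.
clean_html() {
  # Bail out early with a clear message if pandoc is missing.
  if ! command -v pandoc >/dev/null 2>&1; then
    echo "clean_html: pandoc not found on PATH" >&2
    return 127
  fi
  pandoc --from=html --to=markdown | pandoc --from=markdown --to=html
}

# Usage (filenames are placeholders):
#   clean_html < tea-dance.html > tea-dance.clean.html
```

Redirecting the input file into the function also avoids the extra cat process from the earlier command.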
Running text from a Vim buffer through an external filter
Suppose that we open the tea-dance.html file in Vim. We can use the bang Ex command to filter the contents of the current buffer through our pandoc pipeline:
:%!pandoc --from=html --to=markdown | pandoc --from=markdown --to=html
Vim will take the output from that pipeline and use it to overwrite the original text from the buffer.
In a followup tweet, Stephen suggests mapping this Ex command to a key so we can run it more easily. For example, you could add a mapping for normal mode and another for visual mode:
" <Bar> stands in for the pipe, because a literal | would end the mapping command.
nnoremap <leader>gq :%!pandoc -f html -t markdown <Bar> pandoc -f markdown -t html<CR>
vnoremap <leader>gq :!pandoc -f html -t markdown <Bar> pandoc -f markdown -t html<CR>
That’ll work, but I want to suggest a way of doing it without leader mappings.
Set up formatprg to filter selection through pandoc
In episode 18 of Vimcasts, I demonstrated how the external par command could be used to format hard-wrapped plain text files.
We could use a similar technique here.
if has("autocmd")
  let pandoc_pipeline  = "pandoc --from=html --to=markdown"
  let pandoc_pipeline .= " | pandoc --from=markdown --to=html"
  autocmd FileType html let &formatprg=pandoc_pipeline
endif
That means we can filter the current line through pandoc by pressing gqq.
Or we can filter the entire buffer by pressing gggqG.
Or we can switch to visual mode, and gq will filter only the selected lines.
Update: use `formatexpr` instead
As domo pointed out in the comments, the formatprg setting can only be set globally. That’s unfortunate! Kana suggested using formatexpr instead, so I came up with this alternative:
function! FormatprgLocal(filter)
  if !empty(v:char)
    " Called for automatic formatting while typing: fall back to Vim's internal formatter
    return 1
  else
    " Called by gq: filter the affected line range through the external command
    let l:command = v:lnum.','.(v:lnum+v:count-1).'!'.a:filter
    echo l:command
    execute l:command
  endif
endfunction

if has("autocmd")
  let pandoc_pipeline  = "pandoc --from=html --to=markdown"
  let pandoc_pipeline .= " | pandoc --from=markdown --to=html"
  autocmd FileType html setlocal formatexpr=FormatprgLocal(pandoc_pipeline)
endif
However, this approach has a drawback of its own. According to Sung Pae in the discussion for this patch to convert formatprg to a global-local option:
formatexpr is called on every keystroke in Insert mode, even when formatoptions is empty. This suggests to me that the primary purpose of formatexpr is for fine-grained control over automatic formatting, rather than to be a simple, on demand paragraph formatter like fmt.
Sung Pae’s patch makes formatprg global-local, which means that it could be configured independently for different filetypes. It would be great if that patch could be folded into core Vim!