Using external filter commands to reformat HTML


We can use pandoc as a filter to clean up WYSIWYG-generated HTML. Pandoc is a command-line program, but we can call it from inside Vim, either with the bang Ex command or by setting the formatprg option so that the gq operator invokes pandoc.


I recently saw a neat trick on Twitter from Stephen Hay, who says that:

Few things clean up CMS-input HTML better than running it through Pandoc to convert to Markdown and then back to HTML again. 1 sec, big win.

Pandoc is a Swiss Army knife for converting between all sorts of markup formats. You can find installation instructions on the pandoc site.

Suppose we have a tea-dance.html file that contains crufty markup, because it was generated by a WYSIWYG editor. We could clean it up by running this at the command line:

cat tea-dance.html | pandoc --from=html --to=markdown | pandoc --from=markdown --to=html

This emits a cleaned-up version of tea-dance.html on standard output.

We’re using pandoc as a filter, that is: a program “that accepts text at standard input, changes it in some way, and sends it to standard output”.

Running text from a Vim buffer through an external filter

Suppose that we open the tea-dance.html file in Vim. We can use the bang Ex command to filter the contents of the current buffer through our pandoc pipeline:

:%!pandoc --from=html --to=markdown | pandoc --from=markdown --to=html

Vim will take the output from that pipeline and use it to overwrite the original text from the buffer.
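
The same bang command accepts any range, not just %. As a rough sketch (the line numbers here are made up for illustration), these variations filter part of the buffer instead of the whole thing:

```vim
" Filter only lines 10 through 20 (illustrative line numbers)
:10,20!pandoc --from=html --to=markdown | pandoc --from=markdown --to=html

" Filter only the current line
:.!pandoc --from=html --to=markdown | pandoc --from=markdown --to=html
```

If the result isn't what you wanted, pressing u in normal mode undoes the filter as a single change.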

In a follow-up tweet, Stephen suggests mapping this Ex command to a key so we can run it more easily. For example, you could add a mapping for normal mode and another for visual mode. Inside a mapping the pipe has to be written as <Bar>, because Vim treats a literal | as the end of the map command:

nnoremap <leader>gq :%!pandoc -f html -t markdown <Bar> pandoc -f markdown -t html<CR>
vnoremap <leader>gq :!pandoc -f html -t markdown <Bar> pandoc -f markdown -t html<CR>

That’ll work, but I want to suggest a way of doing it without leader mappings.

Set up formatprg to filter selection through pandoc

In episode 18 of Vimcasts, I demonstrated how the external par command could be used for the task of formatting plain text files with hard-wrapping. We could use a similar technique here.

The gq operator runs the selected text through the filter specified by formatprg. This autocommand sets formatprg for HTML files to use our pandoc pipeline:

if has("autocmd")
  " Build the two-stage pipeline: HTML to Markdown, then back to HTML
  let pandoc_pipeline  = "pandoc --from=html --to=markdown"
  let pandoc_pipeline .= " | pandoc --from=markdown --to=html"
  autocmd FileType html let &formatprg=pandoc_pipeline
endif

That means we can filter the current line through pandoc by pressing gqq. Or we can filter the entire buffer by pressing gg then gqG. Or we can switch to visual mode, and gq will filter only the selected lines.
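
To check that the autocommand has taken effect, it can help to ask Vim for the current value of the option from inside an HTML buffer. A quick sketch using the standard :set query syntax:

```vim
" Show the current value of formatprg
:set formatprg?

" Show the value and report where it was last set
:verbose set formatprg?
```

If the value is empty, gq falls back to Vim's built-in formatting rather than calling pandoc.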

Update: use `formatexpr` instead

As domo pointed out in the comments, the formatprg setting can only be set globally. That’s unfortunate! Kana suggested using formatexpr instead, so I came up with this alternative:

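function! FormatprgLocal(filter)
  " v:char is non-empty when Vim calls formatexpr for automatic
  " formatting in Insert mode; return 1 to use the built-in formatter
  if !empty(v:char)
    return 1
  else
    " Build a ranged filter command covering the lines gq operated on
    let l:command = v:lnum.','.(v:lnum+v:count-1).'!'.a:filter
    echo l:command
    execute l:command
  endif
endfunction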

if has("autocmd")
  let pandoc_pipeline  = "pandoc --from=html --to=markdown"
  let pandoc_pipeline .= " | pandoc --from=markdown --to=html"
  autocmd FileType html setlocal formatexpr=FormatprgLocal(pandoc_pipeline)
endif

However, this approach has a drawback of its own. According to Sung Pae in the discussion for this patch to convert formatprg to a global-local option:

`formatexpr` is called on every keystroke in Insert mode, even when `formatoptions` is empty. This suggests to me that the primary purpose of `formatexpr` is for fine-grained control over automatic formatting, rather than to be a simple, on demand paragraph formatter like `fmt` or `par`

Sung Pae’s patch makes formatprg global-local, which means that it could be configured independently for different filetypes. It would be great if that patch could be folded into core Vim!
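
If your Vim does treat formatprg as global-local (I believe recent releases do, but it is worth confirming with :help 'formatprg' on your own build), a buffer-local setup might look something like the sketch below. The augroup name pandoc_html is my own invention, and &l:formatprg is the :let syntax for setting the local value of an option:

```vim
if has("autocmd")
  let g:pandoc_pipeline  = "pandoc --from=html --to=markdown"
  let g:pandoc_pipeline .= " | pandoc --from=markdown --to=html"
  " pandoc_html is a hypothetical group name; any unused name would do
  augroup pandoc_html
    autocmd!
    " Assumes formatprg is global-local, so only HTML buffers are affected
    autocmd FileType html let &l:formatprg = g:pandoc_pipeline
  augroup END
endif
```

Keeping the pipeline in a variable avoids having to escape the spaces and the pipe character, which a plain :setlocal command would require.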
