Using external filter commands to reformat HTML

#64

We can use pandoc as a filter to clean up WYSIWYG-generated HTML. Pandoc is a command-line program, but we can call it from inside Vim, either with the bang Ex command or by configuring the formatprg option so that the gq operator invokes pandoc.

Shownotes

I recently saw a neat trick on Twitter from Stephen Hay, who says that:

Few things clean up CMS-input HTML better than running it through Pandoc to convert to Markdown and then back to HTML again. 1 sec, big win.

Pandoc is a Swiss Army knife for converting between all sorts of markup formats. You can find installation instructions on the pandoc site.

Suppose we have a tea-dance.html file that contains crufty markup, because it was generated by a WYSIWYG editor. We could clean it up by running this at the command line:

cat tea-dance.html | pandoc --from=html --to=markdown | pandoc --from=markdown --to=html

This emits a cleaned-up version of tea-dance.html on standard output.

We’re using pandoc as a filter, that is, a program “that accepts text at standard input, changes it in some way, and sends it to standard output”.
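
As a rough sketch, the same pipeline can be wrapped in a shell function so that it behaves like any other filter. The function name clean_html and the output filename tea-dance-clean.html are made up for this example; the pandoc flags are the ones used in the pipeline above.

```shell
# clean_html is a hypothetical name used only for this sketch.
# Like any filter, it reads text on standard input and writes
# the converted text to standard output.
clean_html() {
  pandoc --from=html --to=markdown |
    pandoc --from=markdown --to=html
}
```

With that function defined, and assuming pandoc is installed, this would write the cleaned-up markup to a new file and leave the original untouched:

clean_html < tea-dance.html > tea-dance-clean.html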

Running text from a Vim buffer through an external filter

Suppose that we open the tea-dance.html file in Vim. We can use the bang Ex command to filter the contents of the current buffer through our pandoc pipeline:

:%!pandoc --from=html --to=markdown | pandoc --from=markdown --to=html

Vim will take the output from that pipeline and use it to overwrite the original text from the buffer.
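
Because the filter's output replaces the buffer text, it can be reassuring to preview the change from the shell first. The following is only a sketch: preview_filter is a hypothetical helper, not part of Vim or pandoc, that runs a file through any filter command and prints a diff against the original.

```shell
# preview_filter FILE COMMAND [ARGS...]
# Hypothetical helper: feed FILE to COMMAND on standard input and
# show how the filtered output differs from the file on disk.
preview_filter() {
  file=$1
  shift
  "$@" < "$file" | diff "$file" -
}
```

For example, assuming pandoc is installed:

preview_filter tea-dance.html sh -c 'pandoc -f html -t markdown | pandoc -f markdown -t html'

Inside Vim, pressing u undoes the bang command if the result is not what you wanted.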

In a follow-up tweet, Stephen suggests mapping this Ex command to a key so we can run it more easily. For example, you could add a mapping for normal mode and another for visual mode. Inside a mapping the pipe has to be written as <Bar>, because a literal | would end the map command:

nnoremap <leader>gq :%!pandoc -f html -t markdown <Bar> pandoc -f markdown -t html<CR>
vnoremap <leader>gq :!pandoc -f html -t markdown <Bar> pandoc -f markdown -t html<CR>

That’ll work, but I want to suggest a way of doing it without leader mappings.

Set up formatprg to filter selection through pandoc

In episode 18 of Vimcasts, I demonstrated how the external par command could be used to format hard-wrapped plain text files. As long as we’re using Vim version 8.0.0179 (or newer), we can use a similar technique here.

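The gq operator runs the selected text through the filter specified by formatprg. This autocommand sets formatprg for HTML files to use our pandoc pipeline. Wrapping it in an augroup (the group name here is arbitrary) stops the autocommand from being defined twice if the vimrc is sourced again:

if has("autocmd")
  " Convert HTML to Markdown and back again to strip crufty markup
  let g:pandoc_pipeline  = "pandoc --from=html --to=markdown"
  let g:pandoc_pipeline .= " | pandoc --from=markdown --to=html"
  augroup pandoc_formatprg
    autocmd!
    autocmd FileType html let &l:formatprg = g:pandoc_pipeline
  augroup END
endif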

That means we can filter the current line through pandoc by pressing gqq, or the entire buffer by pressing gg followed by gqG. In visual mode, gq filters only the selected lines.

Update: When I originally published this episode, I assumed that the formatprg option could be set for each buffer independently. I was wrong then, but this is now possible since this patch by Sung Pae was accepted into Vim core.
