Pandoc's Markdown is great!

12.07.17    markup

This post is a shout-out to the great tool that is Pandoc. It is a really nice document converter! It reads many different markup languages and document formats and can output even more. Using Pandoc, documents written in markup languages (languages often designed to be written by humans) can be transformed into the specific format a particular tool requires - generally with minimal friction.

Input

Pandoc supports many different markup languages as input. This makes it a very versatile tool. It compiles them transparently using the fact that they all share similar ideas of what constitutes a document.

Markdown

A very important input format for Pandoc is Markdown (particularly Pandoc's own extension to Markdown). Markdown is a rather simple format. It supports the typical ways of structuring and styling: headers, paragraphs, lists, etc. But not much more!

This way the actual markup can be minimal and easily readable in source form. Check out the different cheat sheets for the syntax of Markdown!

Additionally, Pandoc enhances its Markdown with various extensions. These allow, for example, creating tables, footnotes and citations with bibliographies. They work nicely with the minimal and efficient core of "standard" Markdown.

One notable extension is math mode. Pandoc recognizes text between dollar signs $like this$. It then interprets the contents as a TeX math mode formula. If the -m command line option is given when translating Markdown to HTML, the output will contain the necessary scripts to display the formula nicely in browsers. (The exact way this is achieved can be controlled using command line options. Note that newer Pandoc releases no longer provide -m and offer options such as --mathjax instead.) Display math (between $$double dollar signs$$) works accordingly.
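To make the math mode a bit more concrete, here is a minimal sketch in Python that feeds a tiny Markdown document with inline and display math to Pandoc. It assumes the --mathjax option of current Pandoc releases rather than the -m switch mentioned above; the sample formulas and the helper name math_command are made up for illustration.

```python
import shutil
import subprocess

# A small Markdown document with inline ($...$) and display ($$...$$) TeX math.
MARKDOWN = r"""Euler's identity is $e^{i\pi} + 1 = 0$.

$$\int_0^1 x^2 \, dx = \frac{1}{3}$$
"""


def math_command(math_option="--mathjax"):
    """Build a pandoc argument list for Markdown-to-HTML with math support.

    math_option selects the rendering method. --mathjax is an assumption
    made for this sketch; it is not the -m option described in the post.
    """
    return ["pandoc", "--from", "markdown", "--to", "html", math_option]


if __name__ == "__main__":
    cmd = math_command()
    print(" ".join(cmd))
    if shutil.which("pandoc"):  # only run the conversion if pandoc is installed
        result = subprocess.run(cmd, input=MARKDOWN, text=True,
                                capture_output=True)
        print(result.stdout)
```

Swapping the math_option argument is enough to try a different rendering method, which is what the remark about controlling "the exact way" refers to.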

This allows you to keep using the simple Markdown syntax and structure even if you need nice formula typesetting. It also makes creating nice formulas for the web accessible in a user-friendly way.

Other formats

Of course, other formats are supported as well. These include HTML and (La-)TeX. Necessarily, converting from a powerful format to a simpler one means losing some information!

Output

When creating the target output, Pandoc normally creates simple code which directly represents the contents of the input. This means that the output is not necessarily valid as a standalone document. For example, when emitting HTML, Pandoc will not generate <html>, <head>, and <body> tags.

The -s option (standalone) makes Pandoc generate these full documents. For this purpose a set of standard templates is employed, but you can also create your own. When generating these full documents, the -H, -B and -A options allow you to inject code into the head section, before the content and after the content, respectively. These can be used, for example, to add references to style sheets in HTML, load packages in LaTeX or add a footer text.
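As a rough sketch of how these options combine, the following Python helper assembles a Pandoc command line for a standalone document. The file names (post.md, style-link.html, footer.html) and the helper name standalone_command are hypothetical; only the -s, -o, -H, -B and -A options come from Pandoc itself.

```python
def standalone_command(source, output, header=None, before=None, after=None):
    """Build a pandoc argument list for a full (standalone) document.

    header, before and after are optional paths to files whose contents
    pandoc injects into the head section, before the body content and
    after the body content.
    """
    cmd = ["pandoc", "-s", source, "-o", output]
    if header:
        cmd += ["-H", header]   # e.g. a <link> to a style sheet
    if before:
        cmd += ["-B", before]   # e.g. a page banner
    if after:
        cmd += ["-A", after]    # e.g. a footer text
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, chosen only for this example.
    cmd = standalone_command("post.md", "post.html",
                             header="style-link.html", after="footer.html")
    print(" ".join(cmd))
```

The sketch only prints the command; passing the list to subprocess.run would perform the conversion, provided the named files exist.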

An interesting feature of Pandoc is the possibility to embed target code inside your source code. When generating HTML from Markdown, HTML tags are left as they are. This allows you to express more complicated parts of the document using HTML. This enables you to enjoy Markdown features like paragraphs split by empty lines (instead of <p></p>) and still use complicated HTML tags (like <span>s with custom classes) in one document. This feature also works with LaTeX. You can add \texcommands{} into Markdown which will stay in the LaTeX code so that your LaTeX compiler sees them!
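Here is a small sketch of that idea: one Markdown source that contains both a raw HTML <span> and a raw LaTeX command, converted once to HTML and once to LaTeX. The sample text, the class name highlight and the helper render are invented for this example, and the conversion only runs if a pandoc executable is found.

```python
import shutil
import subprocess

# Markdown mixing ordinary paragraphs with raw HTML and a raw LaTeX command.
SOURCE = r"""A paragraph with an <span class="highlight">important</span> phrase.

\newpage

A second paragraph, separated by an empty line.
"""


def render(source, target):
    """Convert Markdown text to `target` using pandoc.

    Returns the converted text, or None if pandoc is not installed
    or the conversion fails.
    """
    if shutil.which("pandoc") is None:
        return None
    result = subprocess.run(
        ["pandoc", "--from", "markdown", "--to", target],
        input=source, text=True, capture_output=True,
    )
    return result.stdout if result.returncode == 0 else None


if __name__ == "__main__":
    for target in ("html", "latex"):
        output = render(SOURCE, target)
        print(f"--- {target} ---")
        print(output if output is not None else "(pandoc not available)")
```

Comparing the two outputs shows which embedded pieces survive in which target format.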

Conclusion

Pandoc allows you to avoid the complexities of powerful markup languages when simple ones suffice. With the power of Pandoc, tools which understandably work with one specific format natively now work with all of them: pdflatex creates nicely typeset PDFs from LaTeX; by going through Pandoc it also creates them from Markdown! The same Markdown can be converted to HTML and published on the Web. Pandoc makes this easy!
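To round this off, here is a sketch of the workflow described in the conclusion: one Markdown file becomes a web page and, via LaTeX and pdflatex, a PDF. The file name post.md and the helper build_steps are assumptions for illustration; the sketch only lists the commands and does not run them.

```python
from pathlib import Path


def build_steps(source):
    """Return the commands that turn one Markdown file into HTML and PDF."""
    stem = Path(source).stem
    return [
        ["pandoc", "-s", source, "-o", f"{stem}.html"],  # Markdown -> HTML
        ["pandoc", "-s", source, "-o", f"{stem}.tex"],   # Markdown -> LaTeX
        ["pdflatex", f"{stem}.tex"],                     # LaTeX -> PDF
    ]


if __name__ == "__main__":
    # post.md is a hypothetical input file.
    for step in build_steps("post.md"):
        print(" ".join(step))
```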