Manipulating Markdown checkboxes with Regexes in Vim

Regular expressions - commonly known as Regex - are a powerful tool for searching (and replacing) text. They are based on the mathematical theory of regular languages¹. Many tools employ variants of them to automate text-based tasks. Here we want to use them to interact with checkboxes like [x] when editing Markdown in Vim.

We want to extend Vim's built-in functions with two new ones:

1. To create an empty checkbox [ ] in the current line. If there is one already, we want to replace it with the empty checkbox. If the line is a Markdown list entry (i.e. starts with * or -), we want to place the checkbox after the list marker.
2. To create a crossed checkbox [x]. This should behave like the empty one.

So if you have text like this:

- this is a text describing a task

Then you activate the first function and you get:

- [ ] this is a text describing a task

And once you are done you activate the second function and the box gets ticked:

- [x] this is a text describing a task

The power of Vim and Regexes allows us to express each of these in a single line.

A small intro about Regexes

Regular expressions are patterns - blueprints for text. A Regex describes a set of texts which match that pattern. The core syntax for Regexes is mostly the same among Unix tools: letters and digits represent themselves, . stands for any character, and * allows the character before it to occur any number of times (including zero times). So the pattern .* matches any text including the empty string, and a.* matches any text starting with a. Other characters may match themselves or have a special function (depending on the particular variant of Regex).

By prepending a backslash \ you can escape characters: if they have a special function, they will match themselves instead; otherwise they might gain a special function. For example, . matches any character, while \. matches a literal dot. The pattern s matches s, while \s matches any white space character like space or tab. And \\ matches the backslash itself. In Vim's variant of Regex many special functions of symbols are activated only with \. This is different from many other tools, which use the Regex variant PCRE².
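As a quick way to try these rules, Vim's =~# operator tests whether a string matches a pattern (case-sensitively). The lines below are only a sketch with made-up sample strings; type them on the command line one at a time or source them from a scratch file. Each one prints 1 for a match and 0 otherwise.

```vim
" Single-quoted strings keep backslashes literal, which suits patterns.
" . matches any character, including a literal dot: prints 1
echo 'a.c' =~# 'a.c'
" \. only matches a literal dot: prints 0
echo 'abc' =~# 'a\.c'
" \s matches white space: prints 1
echo 'a c' =~# 'a\sc'
" \\ matches a single backslash: prints 1
echo 'a\c' =~# 'a\\c'
" In Vim, grouping needs \( and \): prints 1
echo 'abab' =~# '^\(ab\)*$'
" Unescaped parentheses are just literal characters: prints 1
echo '(ab)' =~# '^(ab)$'
```

The comments sit on their own lines because :echo would read a trailing " as the start of a string.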

The pattern

Now we construct our pattern. Let's start by matching a box:

\[.*\]

We have to escape the [ and ] because otherwise they will express a character class. Normally [abc] will match a, b, or c. If it starts with ^ the class is inverted: [^abc] matches any one character but a, b, or c. We can use this to improve our matching of the checkbox:

\[[^\]]*\]

So we want a literal [ followed by any character that is not ] (matched by [^\]]), any number of times, followed by a literal ]. Given the text [xy]yz], the first pattern would match all of it, because .* runs on to the last ]. The new pattern stops at the first ] and matches just the [xy] part.
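The difference between the two patterns can be checked with matchstr(), which returns the part of a string that a pattern matched. This is a small sketch to run on the command line; the sample string is the one from the paragraph above.

```vim
" Greedy .* runs on to the last ] in the string.
echo matchstr('[xy]yz]', '\[.*\]')
" prints [xy]yz]

" The inverted class [^\]] cannot cross a ], so the match stops early.
echo matchstr('[xy]yz]', '\[[^\]]*\]')
" prints [xy]
```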

We eventually want to replace this with [ ] or [x], but the command should also work if there is no checkbox at the start of the line. If there is none, we want to insert it, which is equivalent to replacing an empty string with the inserted text. The construct \( ... \)\? matches zero or one occurrence of the pattern between the escaped parentheses. So in our next step:

\(\[[^\]]*\] \)\?

We match a checkbox if there is one, or otherwise the empty string. Notice the space before the closing \): by requiring a space after the box we avoid matching a Markdown link [link](url).

Now we have to specify where we want to find this box. The checkbox is at the start of a line. For this we can use ^ which matches the start of a line (in Vim and many other tools). After that there might be an arbitrary amount of white space. We match it using \s*. After the start of the actual line there might be a * or - indicating a list item. We use the \| alternative operator because we expect either one symbol or the other.

^\s*\(- \|\* \)\?\(\[[^\]]*\] \)\?.

Notice that we escape \* because we want to match a * itself. We also add a dot at the end to avoid matching a completely empty line.

So in the end this matches: the start of a line ^ followed by some white space \s*, then one of two possible list markers \(- \|\* \) which are optional \?. After that we expect an opening bracket \[ with any number of non-] characters followed by a closing bracket \] and a space, all of which is again optional. And then just some arbitrary character . to avoid matching an empty line.
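To see which part of a line each group picks up, the pattern can be tried against a few sample lines with matchlist(). This is only a sketch: the sample lines and the variable names pat and text are made up for illustration.

```vim
let pat = '^\s*\(- \|\* \)\?\(\[[^\]]*\] \)\?.'
for text in ['- task', '  * [x] done', '[ ] bare box', 'plain text']
  " Items 1 and 2 of matchlist() hold the two \( \) groups:
  " the list marker and the checkbox (an empty string when absent).
  echo string(text) '->' string(matchlist(text, pat)[1:2])
endfor

" An empty line does not match at all because of the final dot: prints 0
echo '' =~# pat
```

For the second sample line, for instance, the two groups come out as '* ' and '[x] '.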

The substitute command

Now the core pattern is complete: we can find the place for our checkbox. The next step is to insert the checkbox or replace the already existing one. For this we use the substitute command :s/pattern/text/. In the case of the empty checkbox [ ] it looks like this:

:s/^\s*\(- \|\* \)\?\zs\(\[[^\]]*\] \)\?\ze./[ ] /

This will insert or replace the checkbox in the line the cursor is currently on. Normally the :s command replaces the whole matching part of the line. This does not fit our use case: we just want to replace the potential checkbox found in the second part of our pattern. That is why we have inserted the \zs and \ze markers into our Regex. They do not influence the matching itself, but they mark the start and end of the part we want to replace. If an existing checkbox is found, it is replaced with the empty one. Otherwise the empty string at the correct location of the line is replaced with the empty checkbox, effectively inserting it there.
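The effect of \zs and \ze can be tried without touching the buffer by using the substitute() function, which applies a pattern and a replacement to a string. The following is a sketch with made-up sample lines; pat is just a local name for the pattern from the :s command above.

```vim
let pat = '^\s*\(- \|\* \)\?\zs\(\[[^\]]*\] \)\?\ze.'

" No checkbox yet: the empty string after the list marker is replaced.
echo substitute('- task', pat, '[ ] ', '')
" prints - [ ] task

" Existing checkbox: only the box between \zs and \ze is replaced.
echo substitute('- [x] task', pat, '[ ] ', '')
" prints - [ ] task

" A leading Markdown link is not mistaken for a checkbox.
echo substitute('[link](url) text', pat, '[ ] ', '')
" prints [ ] [link](url) text
```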

The mapping

Now we want to create a mapping: We want to bind a key in Vim to the substitution command we defined in the previous paragraph. Vim has the map keys1 keys2 family of commands for this purpose. It maps one combination of key presses to another. If you type the combination of the first operand Vim does what the second operand says.

First we have to select our variant of the map command. We want the mapping to apply to normal mode (n-). And we don't want the mapping engine to recursively analyze our output and instead directly invoke Vim's core functionality (-nore-). Thus we choose the nnoremap command.

Map commands do not care about the meaning of the keys at all. They operate on the level of individual key presses transparently. This is a simple yet extremely powerful approach, but it also makes the usage somewhat cumbersome. To run a command, for example, you have to send the final Enter key at the end. Special keys like this have to be encoded into the map command. This is where the escaping gets a little bit crazy. We write the spaces as <space>, add the Enter key <CR> at the end of our substitution command, and escape the | of the alternation operator with an extra backslash (\| becomes \\|), because map commands otherwise treat | as a command separator. As the left-hand side we arbitrarily choose the F2 key:³

nnoremap <F2> :s/^\s*\(-<space>\\|\*<space>\)\?\zs\(\[[^\]]*\]<space>\)\?\ze./[<space>]<space>/<CR>
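If the escaping feels too fragile, one alternative (not the approach taken in this article) is to move the substitution into a small function and map the key to a :call. The sketch below assumes a hypothetical function name SetCheckbox; the pattern itself is unchanged.

```vim
" Hypothetical helper; the name SetCheckbox is an assumption.
" The range attribute makes Vim call it once for the whole range
" and pass the bounds as a:firstline and a:lastline.
function! SetCheckbox(mark) range
  " Same pattern as in the :s command, written without map escaping.
  let pat = '^\s*\(- \|\* \)\?\zs\(\[[^\]]*\] \)\?\ze.'
  " The e flag suppresses the error when no line in the range matches.
  execute a:firstline . ',' . a:lastline . 's/' . pat . '/[' . a:mark . '] /e'
endfunction

nnoremap <F2> :call SetCheckbox(' ')<CR>
nnoremap <F3> :call SetCheckbox('x')<CR>
```

The trade-off is a few more lines in exchange for a pattern that reads exactly as it was developed above.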

We can add a simple 0t]⁴ at the end of our mapping, so the cursor ends up inside the checkbox. This way the user can edit the contents of the checkbox if they want to. We can also add the same mapping in visual mode (using vnoremap) and it will work as expected (a nice side effect of Vim's modal nature): when the user selects multiple lines in visual mode and then invokes the command, all non-empty lines are converted to checkbox lines and existing checkboxes are changed. By adding <silent> we hide the command from the user so it does not distract them. And of course we add the corresponding mapping for the crossed checkbox:

nnoremap <silent> <F2> :s/^\s*\(-<space>\\|\*<space>\)\?\zs\(\[[^\]]*\]<space>\)\?\ze./[<space>]<space>/<CR>0t]
nnoremap <silent> <F3> :s/^\s*\(-<space>\\|\*<space>\)\?\zs\(\[[^\]]*\]<space>\)\?\ze./[x]<space>/<CR>0t]
vnoremap <silent> <F2> :s/^\s*\(-<space>\\|\*<space>\)\?\zs\(\[[^\]]*\]<space>\)\?\ze./[<space>]<space>/<CR>0t]
vnoremap <silent> <F3> :s/^\s*\(-<space>\\|\*<space>\)\?\zs\(\[[^\]]*\]<space>\)\?\ze./[x]<space>/<CR>0t]

This code can be added to your .vimrc. It creates the mappings which allow the user to easily and transparently interact with checkboxes in Markdown files. All of this is made possible by the power of Vim and Regex.
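To keep the keys free in other file types, the mappings can be made buffer-local with <buffer> and placed in a file type plugin instead of the .vimrc. The sketch below assumes that file type plugins are enabled (filetype plugin on) and that your setup uses the conventional ~/.vim directory; adjust the path if it does not.

```vim
" ~/.vim/after/ftplugin/markdown.vim  (assumed location)
" <buffer> restricts each mapping to the Markdown buffer being loaded.
nnoremap <buffer> <silent> <F2> :s/^\s*\(-<space>\\|\*<space>\)\?\zs\(\[[^\]]*\]<space>\)\?\ze./[<space>]<space>/<CR>0t]
nnoremap <buffer> <silent> <F3> :s/^\s*\(-<space>\\|\*<space>\)\?\zs\(\[[^\]]*\]<space>\)\?\ze./[x]<space>/<CR>0t]
vnoremap <buffer> <silent> <F2> :s/^\s*\(-<space>\\|\*<space>\)\?\zs\(\[[^\]]*\]<space>\)\?\ze./[<space>]<space>/<CR>0t]
vnoremap <buffer> <silent> <F3> :s/^\s*\(-<space>\\|\*<space>\)\?\zs\(\[[^\]]*\]<space>\)\?\ze./[x]<space>/<CR>0t]
```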

¹ A regular language is particularly simple: a finite automaton can decide if a text belongs to such a language. All regular languages can be expressed by a set of simple building blocks and compositions. Regexes in computing are based on this idea but often have extended capabilities moving beyond the neat mathematical properties of regular languages.
² Perl-compatible regular expressions. This Regex variant is found in many programming language libraries. In PCRE most punctuation characters have a special meaning. As an example, you use ( and ) for grouping instead of \( and \) as in Vim. Apart from the need for the backslash, the special function of many symbols is the same between PCRE and Vim's Regexes.
³ Using a global mapping of F2 here is alright for .vimrc. But it is not good practice for a potential Vim plugin. Instead the mapping should be configurable and probably only active for specific file types. This way you avoid cluttering the command space.
⁴ 0t] is a combination of two commands. 0 jumps to the very start of the current line and then t] finds the next ] and jumps right before it into the box.