Converting DokuWiki to MkDocs

We’ve been talking about migrating the MLUG website to another system for a while. This would reduce our hosting burden and open up editing to more people.

One promising option was using the free static hosting features of GitLab Pages. The website would be backed by a git repository of some markup and processed into HTML through some commit hooks.

I’ve performed a few experiments on and off over a good few months and took the liberty of creating a GitLab organisation to host the various pieces at https://gitlab.com/mlug

MkDocs is a pretty good candidate for the markup processing. It converts a tree of Markdown files fairly directly to HTML. It’s well maintained, easy to run, and Markdown is quite common.
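“Fairly directly” means each Markdown file becomes one page. With MkDocs’ default `use_directory_urls: true`, a file such as `start.md` is built as `start/index.html`, which is why the trial conversion linked below lives under `/start/`. The helper below is a simplified model of that mapping, not MkDocs’ own code, and it ignores special cases such as `README.md`:

```python
from pathlib import PurePosixPath


def built_page(src: str, use_directory_urls: bool = True) -> str:
    """Approximate where MkDocs writes the HTML for a Markdown source file.

    Simplified model: `index.md` maps to `index.html` in the same directory;
    any other page gets its own directory when use_directory_urls is on.
    """
    path = PurePosixPath(src)
    if path.stem == "index":
        return str(path.parent / "index.html")
    if use_directory_urls:
        return str(path.parent / path.stem / "index.html")
    return str(path.parent / f"{path.stem}.html")


for page in ["index.md", "start.md", "wiki/syntax.md"]:
    print(page, "->", built_page(page))
```

The directory-per-page layout is what gives clean URLs without a `.html` suffix.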

The approach I took here was to convert the existing DokuWiki history into a git repository (so that we maintain a complete edit history), convert the DokuWiki markup to Markdown, then add the required MkDocs configuration.

doku2git

There are a few scripts floating around that convert a DokuWiki directory into a git history. Unfortunately they had some reliability issues with some aspects of newer DokuWiki edit types, and did not retain the complete edit history for binary files (tending to commit them in one shot at the end).

So I wrote a small Python script to do the work for us: doku2git.

This script parses the history directories of an existing DokuWiki site and generates a shell script that recreates the history in a git repository. The generated script is mostly a long sequence of gunzip, git add, and git commit commands.
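The core idea can be sketched in a few lines. This is not doku2git’s actual code; it assumes DokuWiki’s per-page changelog format (one tab-separated line per revision in `data/meta/<page>.changes`: timestamp, IP, change type, page ID, user, summary, …) and that old revisions are stored gzip-compressed as `data/attic/<page>.<timestamp>.txt.gz`. Media files, which have their own attic and changelog, are left out, and the author e-mail is a placeholder:

```python
import posixpath
import shlex
from dataclasses import dataclass


@dataclass
class Revision:
    timestamp: int  # Unix time of the edit
    kind: str       # change type, e.g. "C" create, "E" edit, "e" minor, "D" delete
    page_id: str    # DokuWiki ID, namespaces separated by ":"
    user: str       # empty for anonymous edits
    summary: str


def parse_changes(text: str) -> list[Revision]:
    """Parse tab-separated changelog lines (assumed field order, see above)."""
    revisions = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        revisions.append(
            Revision(int(fields[0]), fields[2], fields[3], fields[4], fields[5])
        )
    return revisions


def emit_script(revisions: list[Revision]) -> str:
    """Turn revisions into shell commands that replay them as git commits."""
    out = ["#!/bin/sh", "set -e"]
    for rev in sorted(revisions, key=lambda r: r.timestamp):
        base = rev.page_id.replace(":", "/")  # namespace -> directory
        target = shlex.quote(base + ".txt")
        if rev.kind == "D":
            out.append(f"git rm -q {target}")
        else:
            attic = shlex.quote(f"attic/{base}.{rev.timestamp}.txt.gz")
            out.append(f"mkdir -p {shlex.quote(posixpath.dirname(base) or '.')}")
            out.append(f"gunzip -c {attic} > {target}")
            out.append(f"git add {target}")
        who = rev.user or "anonymous"
        date = shlex.quote(f"{rev.timestamp} +0000")  # git's internal date format
        author = shlex.quote(f"{who} <{who}@example.invalid>")  # placeholder e-mail
        message = shlex.quote(rev.summary or "(no summary)")
        out.append(
            f"GIT_AUTHOR_DATE={date} GIT_COMMITTER_DATE={date} "
            f"git commit -q --allow-empty --author={author} -m {message}"
        )
    return "\n".join(out) + "\n"
```

Generating a script rather than calling git directly makes the replay easy to inspect and rerun; `--allow-empty` keeps revisions (such as reverts) that happen to produce no content change.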

At this point we still have DokuWiki-flavoured markup, but we have a copy of the complete edit history and can start updating things for MkDocs.

git2mkdocs

Now we need to convert the DokuWiki markup to Markdown. Thankfully this is almost entirely handled by a single invocation of Pandoc.
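Pandoc ships a DokuWiki reader, so the conversion is essentially one command per page. A minimal sketch of driving it over a whole tree, assuming GitHub-flavoured Markdown (`gfm`) output is close enough to what MkDocs accepts; the flags and the `pages`/`docs` directory names are assumptions, not necessarily what git2mkdocs uses:

```python
import subprocess
from pathlib import Path


def pandoc_command(src: Path, dst: Path) -> list[str]:
    # Assumed flags: DokuWiki reader in, GitHub-flavoured Markdown out.
    return ["pandoc", "--from", "dokuwiki", "--to", "gfm",
            "--output", str(dst), str(src)]


def output_path(src: Path, src_root: Path, dst_root: Path) -> Path:
    # Mirror the page tree, swapping DokuWiki's .txt suffix for .md.
    return dst_root / src.relative_to(src_root).with_suffix(".md")


def convert_tree(src_root: Path, dst_root: Path) -> None:
    """Run Pandoc once per DokuWiki page, preserving the directory layout."""
    for src in sorted(src_root.rglob("*.txt")):
        dst = output_path(src, src_root, dst_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(pandoc_command(src, dst), check=True)
```

Usage would be something like `convert_tree(Path("pages"), Path("docs"))`, with `pandoc` on the `PATH`.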

This gets us most of the way to a functional MkDocs conversion. However, there are a few idiosyncrasies in the way we’ve used DokuWiki, and in the way DokuWiki handles links to files, e.g.:

  • Sometimes we use image links instead of file links
  • DokuWiki appears to be case-insensitive
  • Some characters in file names appear to be stripped/changed by DokuWiki
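Because of the last two points, links in the converted pages don’t always match the file names on disk. One way to cope is to normalise both sides the same way before comparing them. `approx_clean_id` below is only an approximation of DokuWiki’s ID cleaning (lowercase, whitespace to underscores, most punctuation dropped); the real rules are configurable and also handle cases this ignores, such as non-ASCII characters:

```python
import re
from typing import Optional


def approx_clean_id(name: str) -> str:
    """Rough approximation of DokuWiki's page/media ID cleaning (assumed rules)."""
    name = name.strip().lower()
    name = re.sub(r"\s+", "_", name)            # whitespace becomes underscores
    name = re.sub(r"[^a-z0-9._:/-]", "", name)  # drop other punctuation
    name = re.sub(r"_+", "_", name)             # collapse repeated separators
    return name.strip("_")


def build_lookup(files: list[str]) -> dict[str, str]:
    """Map each normalised name to the file that actually exists on disk."""
    return {approx_clean_id(f): f for f in files}


def resolve(link: str, lookup: dict[str, str]) -> Optional[str]:
    """Find the on-disk file a (possibly differently cased) link refers to."""
    return lookup.get(approx_clean_id(link))
```

For example, a link to `Meeting Notes (2015).PDF` would resolve to a stored `meeting_notes_2015.pdf`.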

I wrote a (pretty terrible) shell script that runs Pandoc followed by a bunch of regexes to perform this one-off conversion: https://gitlab.com/mlug/git2mkdocs
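To give a flavour of those regex fix-ups, here is one rendered in Python rather than shell. It assumes Pandoc turns DokuWiki media embeds of non-image files into Markdown image syntax (`![alt](target)`) and demotes those back to ordinary links; the list of image extensions is an assumption:

```python
import posixpath
import re

# Assumed set of extensions that should stay as embedded images.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp"}

# Markdown image syntax: ![alt text](target), with no spaces in the target.
IMAGE_LINK = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def demote_non_image_links(markdown: str) -> str:
    """Rewrite ![alt](file) to [alt](file) when file is not an image."""

    def fix(match: re.Match) -> str:
        alt, target = match.group(1), match.group(2)
        ext = posixpath.splitext(target)[1].lower()
        if ext in IMAGE_EXTS:
            return match.group(0)  # genuine image: leave untouched
        return f"[{alt or target}]({target})"  # use the file name if alt is empty

    return IMAGE_LINK.sub(fix, markdown)


print(demote_non_image_links("See ![slides](talk.pdf) and ![](logo.PNG)"))
# prints: See [slides](talk.pdf) and ![](logo.PNG)
```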

MkDocs

Now we have a directory of Markdown (and binary files) that MkDocs can consume. At this point it’s mostly a matter of placing data in the expected locations, providing a minimal configuration file, and configuring MkDocs to run whenever a commit occurs.
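The minimal configuration amounts to two small files. As a sketch (not the contents of the actual repository), the Python below generates an `mkdocs.yml` and a `.gitlab-ci.yml`. It relies on two facts: `site_name` is the only setting MkDocs requires, and GitLab Pages publishes what a job named `pages` leaves in a `public` artifact directory, so `site_dir` is pointed there. The Docker image and the unpinned `pip install mkdocs` are assumptions:

```python
import json
from pathlib import Path


def mkdocs_yml(site_name: str) -> str:
    # json.dumps yields a double-quoted string, which YAML also accepts.
    return (
        f"site_name: {json.dumps(site_name)}\n"
        "docs_dir: docs\n"    # Markdown sources (the MkDocs default)
        "site_dir: public\n"  # build output, where GitLab Pages looks
    )


def gitlab_ci_yml(image: str = "python:3-slim") -> str:
    # A single job named "pages" that rebuilds the site in every pipeline.
    return (
        "pages:\n"
        f"  image: {image}\n"
        "  script:\n"
        "    - pip install mkdocs\n"
        "    - mkdocs build\n"
        "  artifacts:\n"
        "    paths:\n"
        "      - public\n"
    )


def write_config(repo: Path, site_name: str) -> None:
    """Write both files into the root of a repository checkout."""
    (repo / "mkdocs.yml").write_text(mkdocs_yml(site_name))
    (repo / ".gitlab-ci.yml").write_text(gitlab_ci_yml())
```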

You can find the repository at https://gitlab.com/mlug/mlug.gitlab.io and the trial conversion at https://mlug.gitlab.io/start/

The repository is pretty sizable, coming in at just under 300MB. But if you’re interested in looking at the output of each processing step, they are on separate branches:

  • dokuwiki: the doku2git output
  • markdown: the git2mkdocs output
  • pages: the minimal MkDocs changes for GitLab Pages

This won’t be the final output; I know there are a few bugs, and the history is around 9 months out of date at this point. If you’ve got some time to glance at the results and file any issues you encounter, that’d be greatly appreciated.