The Problem

Introduction

Documenting code is notoriously hard to do well.

Many years ago now, Donald Knuth introduced the idea of literate programming, where code essentially exists as marked-up snippets inside an essay that expounds the algorithms being developed by the programmer.

Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

— Donald Knuth

Knuth’s emphasis was on the essay part of the puzzle, which is very fitting for a distinguished author with numerous seminal and beautifully produced textbooks to his name.

However, most programmers aren’t all that good at writing a small number of lines of code, let alone a long form exposition of some sort!

Nevertheless, the idea of code as an essay lives on and flourishes these days in many popular environments like Jupyter notebooks, Mathematica notebooks, Maple sheets etc. There are even collaborative tools that allow a group of authors to work on a common notebook without clobbering each other’s work. However, Knuth’s idea of one source both getting tangled into efficient code for optimal execution and also woven into beautiful documentation gets lost in those systems. In the hands of good authors, notebooks can be great at exposition but they generally do not compose well into larger systems.

Code libraries of one form or another remain the main tools that are deployed to build those larger systems. And libraries are really only widely used if they are properly documented.

Most coders do learn to occasionally plop a comment line or two into their code. There have been various attempts to standardize the format of those code comments with the aim of easily extracting them into a form of readable documentation.

At first blush, this seems like a reasonable goal. The library author concentrates for the most part on writing code but she also learns a little extra syntax which is used to slightly enrich and, more importantly, standardize the way comments are written—particularly those at, or near, the start of major code blocks like classes, methods, functions, etc. The compiler ignores all comments anyway so a separate tool extracts the specially formatted comments and uses those, perhaps along with other information from the compiler, to automatically generate documentation.
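To make the extraction step concrete, here is a minimal sketch of the idea in C++. The function name extract_doc_comments is purely hypothetical, and the sketch assumes documentation lines are marked with a leading `///`. Real tools such as Doxygen do far more (parsing directives, cross-referencing, and so on), so treat this only as an illustration of the principle.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of Doxygen or any real tool): collects the
// text of every line whose first non-blank characters are "///".
std::vector<std::string> extract_doc_comments(const std::string& source) {
    std::vector<std::string> docs;
    std::istringstream in(source);
    std::string line;
    while (std::getline(in, line)) {
        // Skip leading indentation; ignore blank lines.
        const auto start = line.find_first_not_of(" \t");
        if (start == std::string::npos) continue;
        // Keep only lines that open with the special marker.
        if (line.compare(start, 3, "///") != 0) continue;
        // Drop the marker and one optional following space.
        auto text = line.substr(start + 3);
        if (!text.empty() && text.front() == ' ') text.erase(0, 1);
        docs.push_back(text);
    }
    return docs;
}
```

Ordinary `//` comments and the code itself are skipped, which is the whole point: the compiler sees everything as usual while the documentation tool sees only the specially marked lines.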

The emphasis here is the opposite of Knuth’s. You mainly write code and hopefully embed enough comments to document the intent and use of that code. The main advantage is that the comments live right in the source itself so have a meaningful chance of remaining up to date and relevant as the code evolves.

Tool Tips

One advantage of using specially marked-up comments as documentation is that modern code editors will parse them and make them available when a programmer imports a library module or header file. She can then just hover over a library-provided function call and get a nice tool tip to remind her of what the relevant arguments should be, etc.

This tool tip use of formatted comments is by itself quite useful and worth the price of admission. The library writer provides some brief standardized comments which are not too onerous to create. For example, in the class GF2::Matrix we have an identity(…) method that begins as follows:

/// @brief Factory method to generate the identity bit-matrix.
/// @param n The size of the matrix to generate.
/// @return An `n x n` bit-matrix with 1's on the diagonal and 0's everywhere else.
static constexpr Matrix identity(std::size_t n) {
    ...
}

Here we are using comments marked up in one of the formats understood by Doxygen, one of the best-known documentation generators.

Doxygen is itself a sizable system and one with a large number of possible directives. However, if you just want to provide some useful tool tips you don’t need to understand anything about Doxygen at all (or even have it installed) and really only need to know a handful of fairly obvious directives—the three used above @brief, @param, and @return already cover a lot of ground and their purposes are pretty obvious!

This works because parsing trivial Doxygen markup is built into most modern code environments, either directly or through a plugin. For example, if you are editing code that calls that method in VSCode, hovering over the appropriate spot produces a tool tip like:

Figure 1. A Sample Tool Tip

The tip is slightly wordy because GF2::Matrix is a template class (hence that <Block, Allocator> reference) but nonetheless, it does an adequate job of explaining what’s going on.

This particular method is so simple that it might be argued that any documentation at all is a bit of an overkill but we took the view it is better to be consistent and to provide this minimal level of documentation in-source for all public methods and functions in our library.

However, good documentation goes a lot further than just providing tool tips.

Introductory material may be needed, the rationale for a particular implementation given, gotchas enumerated, and fallbacks described for cases where preconditions are not met. Above all, short focused examples showing how a method might be used and what it might return are generally really useful for the end-user of a library. This is the material you expect to see in a professional library’s long-form documentation site, all nicely laid out and correctly linked together in a user-friendly format.

Doxygen was built to handle all of that and there are projects which use it. However, putting all that material into the source code itself just seems clunky and pushes code commenting a bit too far. Apart from anything else, the number of comment lines balloons to the point where it is hard to scan the code itself. Writing a lot of material as comment blocks isn’t all that pleasant an experience either, though there are plugins for editors that attempt to ease the pain.
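As a rough illustration of that ballooning, consider what happens when even a modest amount of long-form material (a note and a usage example) is pushed into the comment block. The function identity_rows below is a hypothetical stand-in invented for this sketch and is not the library's actual code. The directives used (@brief, @param, @return, @note, @code, @endcode) are standard Doxygen commands.

```cpp
#include <cstddef>
#include <vector>

/// @brief Builds an identity matrix as nested rows of bools.
///
/// Hypothetical stand-in for illustration only, not the library's real code.
///
/// @param n The number of rows and columns.
/// @return An `n x n` matrix with `true` on the diagonal and `false` elsewhere.
///
/// @note Passing `n == 0` returns an empty matrix rather than failing.
///
/// Example:
/// @code
/// auto m = identity_rows(3);
/// // m[0] is {true, false, false}
/// @endcode
std::vector<std::vector<bool>> identity_rows(std::size_t n) {
    // Start with all entries false, then set the diagonal.
    std::vector<std::vector<bool>> rows(n, std::vector<bool>(n, false));
    for (std::size_t i = 0; i < n; ++i) rows[i][i] = true;
    return rows;
}
```

The comment block is already several times longer than the function body, and that is before adding any rationale, gotchas, or cross-references.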

Of course Doxygen can link in documents that are independent of the source code but at that point you start to lose the benefit of having one source for the code and its documentation.

The C++ Standard Library

If you browse through the headers of the standard library on your computer (or more likely get sucked into looking at one of those headers when your IDE finds an error in your code that leads you down to the dungeon) you quickly realize that these pearls of computer wisdom are nigh on unreadable!

The only sizable comments in the headers are often limited to some legal gobbledygook pertaining to licensing at the start of the file. Variables are tersely named and festooned with underscores that hurt the eye (those visually unappealing decorations are there to stop you shooting yourself in the foot by some inadvertent misuse of the preprocessor). Even if you can get past those festering carbuncles you will still likely be confounded by the sheer generality of the code which is expected to handle obscure corner cases and ancient computer architectures. That is all part of the genius & generality of the library but it certainly doesn’t make for a good read!
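The preprocessor hazard behind those underscores can be sketched in a few lines. Everything here is hypothetical: the macro, the function sum_first, and the trailing-underscore naming are invented for illustration. Real standard library headers use reserved forms such as `__n` or `_Count`, which ordinary code is not permitted to declare, so a trailing underscore stands in for them in this sketch.

```cpp
// A user, perhaps unknowingly, defines a macro with a very common name.
#define total 0

// Had the function below used the plain name `total` for its local variable,
// the preprocessor would rewrite `int total = 0;` as `int 0 = 0;` and the
// build would fail with a puzzling error deep inside the "library".

// Hypothetical library function whose decorated names the macro cannot touch,
// because `total_` is a different token from `total`.
inline int sum_first(const int* data_, int count_) {
    int total_ = 0;
    for (int i_ = 0; i_ < count_; ++i_) total_ += data_[i_];
    return total_;
}

#undef total  // Tidy up so the macro does not leak into later code.
```

The decorated version compiles regardless of what macros the user has defined with ordinary names, which is the protection the standard library buys at the cost of readability.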

Of course if you are happily typing away in any reasonable IDE you will remain oblivious to the warts and instead get excellent tool tips for all standard library objects and calls. However, those do not come from comments embedded in the header files but instead are provided using some other mechanism. Your IDE will probably also provide links to really good long form documentation for the standard library complete with all the goodies we mentioned earlier.

For C++ developers, one long form documentation site really stands out, namely CPP Reference. To quote from their FAQ, the purpose of the site is as follows.

Our goal is to provide programmers with a complete online reference for the C and C++ languages and standard libraries, i.e. a more convenient version of the C and C++ standards. The primary objective is to have a good specification of C and C++. That is, things that are implicitly clear to an experienced programmer should be omitted, or at least separated from the main description of a function, constant or class. A good place to demonstrate various use cases is the “example” section of each page. Rationale, implementation notes, domain specific documentation are preferred to be included in the “notes” section of each page.

Quality is valued over quantity so the site sticks to just documenting the standard library, tempting though it might be to add things like some of the most popular Boost libraries.

To be fair, the site has a very particular look which might not be to everyone’s taste. For one thing, it somehow doesn’t look all that “modern” as there is a lot of text on every page all in a single undistinguished typeface (DejaVuSans) and, to boot, some of that text is in quite small point sizes.

Nevertheless, the quality of the documentation is excellent, and the writing style and format are consistent across the entire large project. That is quite the feat given that the whole thing is a community effort (it makes me suspect that a huge amount of the material is actually written or at least heavily edited by a relatively small number of contributors).

In many ways CPP Reference sets the gold standard for documentation of a C++ library. So emulating it would seem like a reasonable approach to documenting a new library. Achieving a comparable writing style is of course up to the author but unfortunately mirroring the page format turns out to be quite difficult.

CPP Reference is a wiki built using the same MediaWiki software that powers Wikipedia. While the basic markup used isn’t all that hard, everything in the site is wrapped in “quite complex templates”. The editors basically suggest that a contributor just concentrates on writing some content and then lets the experts take over to put the text into the correct format.

Moreover, the MediaWiki software was designed to be used in a browser, which is very clunky and basic compared to using any fully featured modern editor.