MediaWiki Hack: Automatic API links in source code

For the last few days I’ve been working on setting up a MediaWiki installation for the libxml2 project. One thing I wanted to do was add a tag extension that would let editors write e.g. <api>xmlNode</api> and get a link to the API documentation for the xmlNode structure on the libxml2 web site. This turned out to be trivial, and I soon had it working.
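For reference, a tag extension of this kind fits in a few lines of PHP. The following is a minimal sketch, not the code running on the libxml2 wiki: the function names, the `$wgLibxml2ApiBase` URL and the small `$wgLibxml2ApiFiles` table are hypothetical, and the `libxml-<file>.html#<symbol>` link layout is borrowed from the generator script further down. Registration goes through MediaWiki's `ParserFirstCallInit` hook and `Parser::setHook()`.

```php
<?php
// Minimal sketch of an <api> tag extension; all names here are hypothetical.
// Assumed base URL for the libxml2 API docs (placeholder, not the real one).
$wgLibxml2ApiBase = 'http://example.org/html';

// Assumed symbol => header-file table; a real setup would use a generated one.
$wgLibxml2ApiFiles = array(
    'xmlNode'    => 'tree',
    'xmlNewNode' => 'tree',
);

// Ask MediaWiki to call our setup function when the parser is initialized.
$wgHooks['ParserFirstCallInit'][] = 'wfLibxml2ApiTagSetup';

function wfLibxml2ApiTagSetup($parser) {
    // Route <api>...</api> to the render function below.
    $parser->setHook('api', 'wfLibxml2ApiTagRender');
    return true;
}

// Receives the raw text between <api> and </api>; returns HTML.
function wfLibxml2ApiTagRender($input, $args = array(), $parser = null) {
    global $wgLibxml2ApiBase, $wgLibxml2ApiFiles;
    $symbol = trim($input);
    $text = htmlspecialchars($symbol);
    if (!isset($wgLibxml2ApiFiles[$symbol])) {
        return $text; // Unknown symbol: render it as plain, escaped text.
    }
    $url = "$wgLibxml2ApiBase/libxml-{$wgLibxml2ApiFiles[$symbol]}.html#$symbol";
    return '<a href="' . htmlspecialchars($url) . '">' . $text . '</a>';
}
```

Escaping both the link text and the URL keeps arbitrary editor input from injecting markup through the tag.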

One small problem arose, though. I had installed the SyntaxHighlight_GeSHi extension so that source code could be marked up and highlighted using <source />, and my <api /> tag extension would not work inside a <source /> block, because another extension was already in effect there. Not only that: even if it had worked, having to mark up every occurrence of a libxml2 API symbol inside source code would be cumbersome, and with all those <api /> tags littered about, the source code would be hard to make out when editing the wiki markup.

So I figured I’d just patch the SyntaxHighlight_GeSHi extension to identify and linkify libxml2 API symbols automatically instead. It turned out to be easier than I thought. The API symbols for libxml2 can be obtained as an XML file by running the doc/ script in the libxml2 source tree. This file has the following format (condensed example excerpt):
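Judging from what the generator script below reads, the parts of libxml2-api.xml that matter are `<file name='...'>` elements containing `<exports symbol='...' type='...'/>` children. The excerpt embedded in this PHP sketch is a hypothetical stand-in shaped after those elements, not a copy of the real file, and the helper `listExports()` is made up for illustration. SimpleXML is used here only to show the structure; the script itself streams the file with XMLReader.

```php
<?php
// Hypothetical excerpt shaped after the elements the generator script reads;
// the real libxml2-api.xml contains much more.
$excerpt = <<<XML
<api name='libxml2'>
  <files>
    <file name='tree'>
      <exports symbol='xmlNode' type='typedef'/>
      <exports symbol='xmlNewNode' type='function'/>
    </file>
    <file name='parser'>
      <exports symbol='xmlReadFile' type='function'/>
    </file>
  </files>
</api>
XML;

// Walk every <file> and collect the symbols it exports.
function listExports($xmlText) {
    $api = simplexml_load_string($xmlText);
    $rows = array();
    foreach ($api->files->file as $file) {
        foreach ($file->exports as $export) {
            $rows[] = array(
                'file'   => (string) $file['name'],
                'symbol' => (string) $export['symbol'],
                'type'   => (string) $export['type'],
            );
        }
    }
    return $rows;
}

foreach (listExports($excerpt) as $row) {
    echo "{$row['file']}: {$row['symbol']} ({$row['type']})\n";
}
```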


First, I wrote the following PHP script, which takes this XML and generates a PHP associative array definition with [symbol] => [url] mappings.

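<?php
/*
 * Generates a PHP associative array definition from libxml2-api.xml,
 * of the form $libxml2ApiSymbols = array( symbol => url, ... ).
 * Usage: php
 */

$apibase = ''; // The base for the API URLs.

// Open input
$xml = new XMLReader();

if (!$xml->open('libxml2-api.xml')) {
  echo "Unable to open libxml2-api.xml\n";
  exit(1);
}

$file = NULL;
$symbol = NULL;
$type = NULL;

// Write the header of the generated PHP file, then one entry per exported symbol
echo "<?php\n\n\$libxml2ApiSymbols = array(\n";
while ($xml->read()) {
  // <file name='...'>: remember which header file we are in
  if ($xml->nodeType == XMLReader::ELEMENT && $xml->name == 'file')
    while ($xml->moveToNextAttribute())
      if ($xml->name == 'name')
        $file = $xml->value;
  // <exports symbol='...' type='...'/>: a symbol exported by that file
  if ($xml->nodeType == XMLReader::ELEMENT && $xml->name == 'exports')
    while ($xml->moveToNextAttribute())
      if ($xml->name == 'symbol')
        $symbol = $xml->value;
      elseif ($xml->name == 'type')
        $type = $xml->value;
  if ($file != NULL && $symbol != NULL && $type != NULL) {
    echo "  \"$symbol\" => \"$apibase/libxml-$file.html#$symbol\",\n";
    $symbol = NULL;
    $type = NULL;
  }
}
$xml->close();
echo ");\n\n?>";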
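One possible refinement, sketched here rather than taken from the original setup: collect the mappings into an array first and let PHP's `var_export()` write the literal, so the generated file is always valid PHP regardless of quoting. The function names `buildSymbolTable()` and `renderSymbolFile()`, the placeholder base URL and the in-memory XML input are assumptions for illustration; the URL layout mirrors the script above.

```php
<?php
// Build symbol => URL mappings from libxml2-api.xml content (hypothetical helper).
function buildSymbolTable($xmlText, $apibase) {
    $reader = new XMLReader();
    $reader->XML($xmlText); // Read from a string; open() would read a file instead.
    $table = array();
    $file = null;
    while ($reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT) {
            continue;
        }
        if ($reader->name === 'file') {
            // Remember the header file the following <exports> belong to.
            $file = $reader->getAttribute('name');
        } elseif ($reader->name === 'exports' && $file !== null) {
            $symbol = $reader->getAttribute('symbol');
            $table[$symbol] = "$apibase/libxml-$file.html#$symbol";
        }
    }
    $reader->close();
    return $table;
}

// Emit the table as a loadable PHP file; var_export() handles all escaping.
function renderSymbolFile(array $table) {
    return "<?php\n\n\$libxml2ApiSymbols = " . var_export($table, true) . ";\n";
}

$xmlText = "<api><files><file name='tree'>"
         . "<exports symbol='xmlNode' type='typedef'/>"
         . "</file></files></api>";
echo renderSymbolFile(buildSymbolTable($xmlText, 'http://example.org/html'));
```

Separating parsing from output also makes the table easy to check before it is written to extensions/Libxml2ApiSymbols.php.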


I ran this script against the libxml2-api.xml from libxml2 2.7.2 and put the output into extensions/Libxml2ApiSymbols.php. Then it was just a matter of slightly patching SyntaxHighlight_GeSHi/SyntaxHighlight_GeSHi.class.php to make it detect and linkify API symbols automatically. I also took the opportunity to make it default to C when highlighting, as that will be the most common language on the libxml2 wiki. Below is the full diff.
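A default language can be expressed as a small fallback on the tag's attributes. This is a hedged sketch, not the actual patch: it assumes the attributes arrive as an associative array such as `array('lang' => 'php')`, which is how MediaWiki hands them to tag hooks, and the helper name `resolveSourceLang()` is hypothetical.

```php
<?php
// Hypothetical helper: pick the highlighting language for a <source> tag,
// falling back to C when the editor gave no (or an empty) lang attribute.
function resolveSourceLang(array $args, $default = 'c') {
    if (isset($args['lang']) && trim($args['lang']) !== '') {
        // Normalize case so checks like $lang == 'c' also match "C".
        return strtolower(trim($args['lang']));
    }
    return $default;
}

echo resolveSourceLang(array()), "\n";
echo resolveSourceLang(array('lang' => 'Python')), "\n";
```

Lowercasing here is a design choice of the sketch: it keeps a later `$lang == 'c'` comparison from silently skipping blocks tagged `lang="C"`.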

--- SyntaxHighlight_GeSHi.class.php.orig        2008-11-11 17:13:09.000000000 +0100
+++ SyntaxHighlight_GeSHi.class.php     2008-11-11 17:56:36.000000000 +0100
@@ -1,5 +1,7 @@
 <?php
...
                         $parser->mOutput->addHeadItem( self::buildHeadItem( $geshi ), "source-{$lang}" );
+      // Add libxml2 API links
+      if ($lang == 'c')
+        foreach ($libxml2ApiSymbols as $symbol => $url) {
+          $out = preg_replace("/$symbol([^a-zA-Z0-9_])/", "<a href=\"$url\" class=\"libxml2-symbol\">$symbol</a>$1", $out);
+        }
                        if ( $enclose === GESHI_HEADER_NONE ) {
                                return ' '.$out . '';
                        } else {
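The loop in the diff runs one full `preg_replace()` pass per symbol over the output, and because its pattern only checks the character after a symbol, it can also fire in the middle of a longer identifier that merely ends with a symbol name, including inside an href inserted by an earlier iteration. A hedged alternative, not what the patch above does: scan the highlighted HTML once, pass tags through untouched, and look each whole identifier up in the generated table. The function name and the sample input are made up for illustration.

```php
<?php
// Hypothetical single-pass linkifier for GeSHi-highlighted HTML.
// Tags are matched first and passed through untouched, so identifiers that
// occur inside markup (class names, inserted hrefs) are never rewritten.
function linkifyApiSymbols($html, array $symbols) {
    return preg_replace_callback(
        // Either a whole tag, or a whole C identifier (never part of one).
        '/<[^>]*>|\b[A-Za-z_][A-Za-z0-9_]*\b/',
        function ($m) use ($symbols) {
            $token = $m[0];
            if ($token[0] === '<' || !isset($symbols[$token])) {
                return $token; // Tag or unknown identifier: leave as-is.
            }
            return '<a href="' . htmlspecialchars($symbols[$token]) . '"'
                 . ' class="libxml2-symbol">' . $token . '</a>';
        },
        $html
    );
}

$table = array('xmlNode' => 'http://example.org/html/libxml-tree.html#xmlNode');
echo linkifyApiSymbols('<span class="kw4">xmlNode</span> *node;', $table), "\n";
```

In the patched class this would replace the foreach loop with a single call such as `$out = linkifyApiSymbols($out, $libxml2ApiSymbols);` under the same `$lang == 'c'` check, and the emitted `libxml2-symbol` class keeps the styling below working unchanged.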

The last thing I did was style the links a bit in MediaWiki:Geshi.css:

a.libxml2-symbol {
  color: #a06060;
  text-decoration: none;
}
a.libxml2-symbol:visited {
  color: #a06060;
}

The result can be seen here. I hope someone finds this useful. By the way, if anyone feels like helping out with the wiki, please register and edit away!