mediawiki libxml2


The last few days I’ve been working on setting up a MediaWiki installation for the libxml2 project. One thing that I wanted to do was add a tag extension that would allow editors to write e.g. <api>xmlNode</api> and get a link to the API documentation for the xmlNode structure at the libxml2 web site. This turned out to be trivial enough and I soon had it working.
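
For readers who have not written a tag extension before, here is a minimal sketch of what such a callback could look like. It is not the actual extension from the wiki: the function names `apiSymbolTable` and `renderApiTag` and the one-entry lookup table are made up for illustration, and the MediaWiki registration call is only shown in a comment so that the snippet runs on its own.

```php
<?php
// Hypothetical lookup table (symbol => url), reduced to one entry.
function apiSymbolTable() {
  return array(
    'xmlNode' => 'http://xmlsoft.org/html/libxml-tree.html#xmlNode',
  );
}

// Tag callback: $input is the text between <api> and </api>.
// $args and $parser are what MediaWiki passes to tag hooks; they are
// unused in this sketch.
function renderApiTag($input, $args = array(), $parser = null) {
  $table = apiSymbolTable();
  $symbol = trim($input);
  $label = htmlspecialchars($symbol);
  if (!isset($table[$symbol])) {
    return $label; // Unknown symbol: emit plain, escaped text.
  }
  return '<a href="' . htmlspecialchars($table[$symbol]) . '">' . $label . '</a>';
}

// Inside MediaWiki the callback would be registered with the parser,
// along the lines of:
//   $wgParser->setHook('api', 'renderApiTag');
echo renderApiTag('xmlNode'), "\n";
```

Unknown symbols are returned as escaped text rather than as a broken link, which seemed like the least surprising behaviour for editors.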

One small problem arose though. I installed the SyntaxHighlight_GeSHi extension to allow source code to be marked up and highlighted using <source />, and my <api /> tag extension would not work inside a <source /> because another extension was already in effect there. Not only that: even if it had worked, having to mark up every occurrence of a libxml2 API symbol inside source code would be cumbersome, and with all those <api /> tags littered about it would be hard to read the source code when editing the wiki markup.

So I figured I’d just patch the SyntaxHighlight_GeSHi extension to do automatic identification and linkifying of libxml2 API symbols instead. It turned out to be easier than I thought. The API symbols for libxml2 can be obtained as an XML file by running the doc/apibuild.py script in the libxml2 source tree. This file has the following format (condensed example excerpt):

<?xml version="1.0" encoding="ISO-8859-1"?>
<api name='libxml2'>
  <files>
    <file name='DOCBparser'>
      <exports symbol='docbParserInputPtr' type='typedef'/>
      <exports symbol='docbParserCtxt' type='typedef'/>
      ...
    </file>
    <file name='HTMLparser'>
      <exports symbol='htmlDefaultSubelement' type='macro'/>
      <exports symbol='htmlElementAllowedHereDesc' type='macro'/>
      ...
    </file>
    ...
  </files>
</api>

So I first made the following PHP script that takes this XML and generates a PHP associative array definition with [symbol] => [url] mappings.

<?php
/*
 * This script reads 'libxml2-api.xml' from the current directory and
 * generates a PHP array definition on standard output.
 *
 * The array will have the format (symbol => url, ... ).
 *
 * Usage: php
 */
 
$apibase = 'http://xmlsoft.org/html'; // The base for the API URLs.
 
// Open input. XMLReader::open() returns FALSE on failure, so test its
// return value rather than the reader object itself.
$xml = new XMLReader();
if (!$xml->open('libxml2-api.xml')) {
  echo "Unable to open libxml2-api.xml\n";
  exit(1);
}
 
$file = NULL;
$symbol = NULL;
$type = NULL;
 
echo "<?php\n\n\$libxml2ApiSymbols = array (\n";
while ($xml->read()) {
  if ($xml->nodeType == XMLReader::ELEMENT && $xml->name == 'file')
    while($xml->moveToNextAttribute())
      if ($xml->name == 'name')
        $file = $xml->value;
  if ($xml->nodeType == XMLReader::ELEMENT && $xml->name == 'exports')
    while($xml->moveToNextAttribute())
      if ($xml->name == 'symbol')
        $symbol = $xml->value;
      elseif ($xml->name == 'type')
        $type = $xml->value;
  if ($file != NULL && $symbol != NULL && $type != NULL) {
    echo "  \"$symbol\" => \"$apibase/libxml-$file.html#$symbol\",\n";
    $symbol = NULL;
  }
}
echo ");\n\n?>";
 
$xml->close();
?>
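
To make the output format concrete: for the four exports visible in the condensed XML excerpt above, the script would print something like the following. This is worked out by hand from the script and the excerpt, not copied from a real run, and the real file contains one line for every symbol in libxml2-api.xml.

```php
<?php

$libxml2ApiSymbols = array (
  "docbParserInputPtr" => "http://xmlsoft.org/html/libxml-DOCBparser.html#docbParserInputPtr",
  "docbParserCtxt" => "http://xmlsoft.org/html/libxml-DOCBparser.html#docbParserCtxt",
  "htmlDefaultSubelement" => "http://xmlsoft.org/html/libxml-HTMLparser.html#htmlDefaultSubelement",
  "htmlElementAllowedHereDesc" => "http://xmlsoft.org/html/libxml-HTMLparser.html#htmlElementAllowedHereDesc",
);

?>
```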

I ran this script against the libxml2-api.xml from libxml2 2.7.2 and put the output into extensions/Libxml2ApiSymbols.php. Then it was just a matter of patching the SyntaxHighlight_GeSHi/SyntaxHighlight_GeSHi.class.php slightly to make it do automatic detection and linkifying of API symbols. I also took the opportunity to patch it to default to the C language when highlighting, as it will be the most common one at the libxml2 wiki. Below is the full diff.

--- SyntaxHighlight_GeSHi.class.php.orig        2008-11-11 17:13:09.000000000 +0100
+++ SyntaxHighlight_GeSHi.class.php     2008-11-11 17:56:36.000000000 +0100
@@ -1,5 +1,7 @@
 <?php
 
+require_once('extensions/Libxml2ApiSymbols.php');
+
 class SyntaxHighlight_GeSHi {
 
        /**
@@ -21,6 +23,7 @@
         * @return string
         */
        public static function parserHook( $text, $args = array(), $parser ) {
+    global $libxml2ApiSymbols;
                self::initialise();
                $text = rtrim( $text );
                // Don't trim leading spaces away, just the linefeeds
@@ -29,7 +32,8 @@
                if( isset( $args['lang'] ) ) {
                        $lang = strtolower( $args['lang'] );
                } else {
-                       return self::formatError( htmlspecialchars( wfMsgForContent( 'syntaxhighlight-err-language' ) ) );
+      $lang = 'c'; // Default to C language
+                       //return self::formatError( htmlspecialchars( wfMsgForContent( 'syntaxhighlight-err-language' ) ) );
                }
                if( !preg_match( '/^[a-z_0-9-]*$/', $lang ) )
                        return self::formatError( htmlspecialchars( wfMsgForContent( 'syntaxhighlight-err-language' ) ) );
@@ -67,6 +71,11 @@
                                $out = str_replace( "\n", '', $out );
                        // Register CSS
                        $parser->mOutput->addHeadItem( self::buildHeadItem( $geshi ), "source-{$lang}" );
+      // Add libxml2 API links
+      if ($lang == 'c')
+        foreach ($libxml2ApiSymbols as $symbol => $url) {
+          $out = preg_replace("/$symbol([^a-zA-Z0-9_])/", "<a class=\"libxml2-symbol\" href=\"$url\">$symbol</a>$1", $out);
+        }
                        if ( $enclose === GESHI_HEADER_NONE ) {
                                return '<span class="'.$lang.' source-'.$lang.'"> '.$out . '</span>';
                        } else {
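
The loop in the patch runs one regular expression per API symbol over the whole highlighted output, and it can also match a symbol that happens to appear inside a tag attribute. A possible alternative, shown here only as a sketch and not as what runs on the wiki, is to split the output into tags and text and look up each identifier in the array in a single pass. The function name `linkifyApiSymbols` is made up, and the closure syntax assumes PHP 5.3 or later.

```php
<?php
// Sketch: link known API symbols in highlighted HTML in a single pass.
// $symbols is a (symbol => url) array like $libxml2ApiSymbols.
function linkifyApiSymbols($html, $symbols) {
  // With one capture group, preg_split() returns text segments at even
  // indices and the captured tags at odd indices.
  $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
  foreach ($parts as $i => $part) {
    if ($i % 2 == 1) {
      continue; // Leave tags and their attributes untouched.
    }
    // Match whole C identifiers and replace only those found in the table.
    $parts[$i] = preg_replace_callback(
      '/\b[A-Za-z_][A-Za-z0-9_]*\b/',
      function ($m) use ($symbols) {
        $name = $m[0];
        if (!isset($symbols[$name])) {
          return $name;
        }
        return '<a class="libxml2-symbol" href="' . $symbols[$name] . '">' . $name . '</a>';
      },
      $part
    );
  }
  return implode('', $parts);
}

// Usage with a one-entry table and made-up input markup.
$symbols = array('xmlNode' => 'http://xmlsoft.org/html/libxml-tree.html#xmlNode');
echo linkifyApiSymbols('<span class="xmlNode">xmlNode</span> *n; xmlNodePtr p;', $symbols), "\n";
```

In the usage line only the xmlNode between the tags becomes a link. The class attribute is skipped because it sits inside a tag, and xmlNodePtr is skipped because whole identifiers are looked up rather than prefixes. Text that is already inside a link generated by GeSHi would still get a nested link, so this is a starting point rather than a finished replacement.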

The last thing I did was a bit of styling of the links in the MediaWiki:Geshi.css:

a.libxml2-symbol {
  color: #a06060;
  text-decoration: none;
}
a.libxml2-symbol:visited {
  color: #a06060;
}

The result can be seen here. Hope someone finds this useful. By the way, if anyone feels like helping out with the wiki, please register and edit away!