An experiment in creating a portable, extensible system for formatting bibliographic citations and references

Peter Flynn

Silmaril Consultants

Version 0.3

April 2004

  1. Overview
  2. Requirements
  3. Implementation note
    1. Acknowledgements
    2. Terminology
  4. The MODS Schema
  5. The BiblioX files
    1. The BiblioX DTD
    2. The BiblioX stylesheet
    3. Driver files
    4. The formatting specification file
    5. Key files
    6. The Tools file
    7. CSS files
  6. The BiblioX user interface
    1. The formatting specification file
    2. Use with DocBook
    3. Use with TEI
    4. Samples
  7. Further work

BiblioX is an attempt to create an XML-based system for formatting bibliographic citations and references using XSLT. The recommended Document Type for storing bibliographic information is MODS, the Metadata Object Description Schema.¹ Development is centered around this format, although examples in other formats are included for comparison. The Document Type used for storing this manual is DocBook.

BiblioX is an experimental or proof-of-concept system. It is not finished software, as it has bugs and missing bits, and should not be used for production processing.

  1. Library of Congress' Network Development and MARC Standards Office, Metadata Object Description Schema (December 2003)

ToC1 Overview

BiblioX is a set of XSLT templates and a file format for storing formatting information for bibliographic citations and references. It is similar in effect—although different in operation—to the BIBTeX system for use with LaTeX.

The advantage of BiblioX over other XML-based systems is that it works with your documents in any DTD or Schema: you do not have to hard-code the element type names of your document markup vocabulary into the XSLT used by BiblioX for formatting. If you have existing bibliographic material in different formats (for example, DocBook or TEI), it can be processed just as easily as in MODS.

BiblioX uses a simple, visually-oriented formatting specification file with its own XML vocabulary. In this file you can set out exactly which components of a citation or a reference are printed or displayed, along with details of formatting (fonts, styles, sizes, etc), without any reference to any particular source document DTD or Schema (see Figure 1, ‘BiblioX formatting specification file’ below). All you need to provide is an indexing file which contains the equivalences between the components in the formatting specification file and the specific DTD or Schema you are using, in the form of XSL keys.ᵃ

Figure 1: BiblioX formatting specification file
    <inline style="author-year">
      <author aftersep=" ">
        <name><substitute shape="italic" beforesep=" ">et al.</substitute></name>
      </author>
      <date beforesep=" (" aftersep=")"><year></year></date>
    </inline>

    <reftype class="book" aftersep=".">
      <author aftersep=": ">
        <name>
          <surname weight="bold"></surname>
          <forename beforesep=", "></forename>
        </name>
        <name beforesep="; " aftersep=";">
          <surname beforesep=" "></surname>
        </name>
        <name beforesep=" and ">
          <surname beforesep=" "></surname>
        </name>
      </author>
      <title shape="italic" aftersep="."></title>
      <publisher beforesep=" "></publisher>
      <place beforesep=", "></place>
      <date beforesep=", "></date>
      <isn type="ISBN" beforesep=", "></isn>
      <uri fonttype="monospace" beforesep=" [" aftersep="]"></uri>
    </reftype>

Two examples from the generic.xml example formatting specification file: one shows a common author-year inline citation format, eg Walsh (1999), and the other shows a reference format for books [6] (see ‘References’ below for details).

When BiblioX processes your main document, it looks up the formatting details for each citation or reference in the formatting specification file. It uses the components it finds there to output the required formatting, and uses the index keys to look back in the main document for the textual values needed for each component (see Figure 2, ‘Schematic of BiblioX processing’ below).

Figure 2: Schematic of BiblioX processing

The BiblioX main stylesheet, a ‘driver’ file for your DTD, and a tools file of common routines contain all the XSLT code needed to do this. Output at the moment is restricted to HTML, but a LaTeX version is in preparation.

Normally, only one type of citation format is used in any given document, determined by the house style of the publisher or journal. The exception is in some documents in the Humanities, where the two alternatives, Walsh (1999) and (Walsh, 1999), are used according to the grammar of the surrounding text. In fact, BiblioX currently supports the unrestricted intermixing of different formats in any document.

  1. In practice it is hoped that publishers will provide formatting specification files which express their house style for journals, books, etc, and sets of key index files for popular DTDs, much in the same way that they currently provide BIBTeX files and LaTeX document classes for their formatting.

ToC2 Requirements

Apart from the BiblioX files, the only specific piece of software needed is an XSLT processor. These are freely available for download from many sources. The development environment uses Saxon. It is assumed that the user will already have an XML editor with which to write the main document.

Accurate data is critical to the working of any XML-based transformation. If the citation or reference information in your source document is inaccurate or poorly marked, you will almost certainly get poor results. In particular the essential components of a reference (author, title, date, and source) must be adequately marked up in a machine-interpretable form:

It is also recommended that you link your citations to the references via XML's ID/IDREF feature. This mechanism is built into XML explicitly for cross-references such as bibliographic citations, and it is checked automatically by all validating parsers and editors to make sure you don't accidentally refer to something that doesn't exist.ᵈ Unfortunately some DTDs fail to provide ID or IDREF (or both) on the critical element types used for citation and reference. In this case, CDATA values can be used, but it becomes the author's responsibility to check manually that every citation bears a value which matches one in the references.
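Where only CDATA values are available, the check that a validating parser would otherwise perform has to be done some other way. The following is a minimal sketch in Python, not part of BiblioX. The element and attribute names (`cite`, `target`, `ref`, `id`) and the sample document are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <cite target="..."/> points at <ref id="..."/>.
# Neither name comes from BiblioX, DocBook, TEI, or MODS.
SAMPLE = """
<doc>
  <p>See <cite target="walsh99"/> and <cite target="flynn04"/>.</p>
  <refs>
    <ref id="walsh99"/>
    <ref id="berger02"/>
  </refs>
</doc>
"""

def dangling_citations(xml_text, cite_tag="cite", cite_attr="target",
                       ref_tag="ref", ref_attr="id"):
    """Return citation targets that match no reference, in document order."""
    root = ET.fromstring(xml_text)
    known = {r.get(ref_attr) for r in root.iter(ref_tag)}
    return [c.get(cite_attr) for c in root.iter(cite_tag)
            if c.get(cite_attr) not in known]

missing = dangling_citations(SAMPLE)
print(missing)
```

Here `flynn04` is reported because nothing in the reference list carries that value. The uncited `berger02` is not reported, since a reference which is never cited is not an error.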

  1. Saxon is written in Java, so you must have Java installed to use it (see
  2. In the DocBook example, a CDATA attribute called YYYY-MM-DD has been added to the <date> element type for this purpose.
  3. Having a reference which is never cited, on the other hand, is not an XML error.

ToC3 Implementation note

The current release of this software is 0.3, which means it is experimental and incomplete. In particular the documentation is still being written. BiblioX was formerly known as Bibliofile: I changed the name to avoid conflicting with an entirely separate and unconnected project already called Bibliofile.

ToC3.1 Acknowledgements

This is a co-operative effort. The driving force behind the project has been Bruce D'Arcus, and a lot of the content relies on the work of Markus Hoenicka, Taco Hoekwater, Torsten Bronger, Karen Coyle, David Wilson, Eve Maler, Norm Walsh, and others whose document classes, DTDs, Schemas, style files, processing formats, and contributions to the discussions have improved my understanding of the problem.

There have been several ongoing and previous projects in the bibliographic storage and formatting field whose work has been invaluable in identifying problems, proposing solutions, and testing theories:

I must acknowledge the help of members of the XSL mailing list who kindly investigated some of my broken code, especially Jeni Tennison and Bob DuCharme. Thanks also go to David Carlisle and Markus Abt for identifying my ignorance of the effects of <xsl:for-each> and suggesting the use of <exsl:node-set>.

Ongoing discussions on the project take place on the BiblioX mailing list, which you can join at

ToC3.2 Terminology

Some terms used here may be unfamiliar to some readers.


Citation

A mention of a document or other object (including a direct quotation from a document) in support of an argument, made during discourse. The following are approximate guidelines: there are many variants.

In the Humanities

A citation is typically shown either as a superscripted number immediately after the mention or quotation,⁶ referring the user to a footnote which gives the author and date, possibly with a short title and page or section number; or as an inline citation (Flynn, 2004). The full list of References is printed in alphabetical or chronological order at the end of the document, without any numbering or labelling.

In the Natural Sciences

A citation is typically shown either as a number [2] or mnemonic abbreviated label in [square brackets] immediately after the mention or quotation [Fly04] (sometimes superscripted); extended inline citations as in the Humanities are also sometimes used. The number or the abbreviated label in square brackets refers the user directly to the numbered or labelled list of References at the end of the document, which may be in citation order or alphabetic order.

Special needs

In Law and some other disciplines there are some highly specific and very rigorous rules for citation. These are not addressed directly by BiblioX, and may require the addition of extra features in a later version. The reader is referred to the documentation in Berger (2002) and the associated mailing list for an extended discussion.


Reference

A sequence of details or metadata about a document or other object referred to in a Citation, sufficient for the reader to identify it uniquely, look it up in a library or database, or otherwise locate it. Typical details include:

This last item identifies the major distinction between monographs (free-standing works like books, whole journals, reports, Proceedings, etc) and individual contributions to them (articles, chapters, papers, etc). The latter are always referenced with the details of the greater enclosing work as well as the details of the individual contribution. (Hägglund, 1983)

Reference type

A class of References sharing the same general features and requiring the same typographic formatting.

The conventional monographic reference types include books, technical reports, journals, theses, plays, musical compositions, etc. Individual contribution types include articles, papers at a conference, chapters in a book, sections in a report, maps in a series, etc.

Two well-known sets of reference types are those used by the BIBTeX system[5] and those used by the DocBook DTD[6].


Component

A BiblioX formatting specification for the lowest level of detail in a Reference.

This is roughly equivalent to a field in a database or an element in a DTD. A component names the type of data generically (eg ‘title’, ‘editor’), and provides typographic information about how it is to appear. A component can have subcomponents (eg the ‘name’ component is made up of ‘forename’ and ‘surname’ subcomponents). In a BiblioX formatting specification file, components are represented as element types from the BiblioX DTD. A component with no subcomponents is assumed to have a key entry in the keys file which identifies where to get the data from.

Main document

The user's XML document containing the text with citations and the list of bibliographic references which need formatting.

Formatting specification file

An XML document containing an instance of the BiblioX DTD which gives the formatting components for at least one group of reference types.

  1. Flynn, BiblioX (2004), p.4

ToC4 The MODS Schema

[Bruce to describe.]

ToC5 The BiblioX files

The BiblioX software is implemented entirely in XSLT. No additional software is needed apart from an XSLT processor. The one which has been used in testing is Saxon.

There is one possible dependency: the formatter may use the <exsl:node-set> function, which is included with Saxon but may not be in other processors. It can apparently be downloaded, but the copy acquired during testing was found to be defective: it was missing the actual XSLT file required to implement it. This version (0.3) does not use it at the moment.

ToC5.1 The BiblioX DTD

This is an XML Document Type Definition for the styling or formatting specification components of BiblioX. Currently, each component of a formatted citation or reference is an element type, named using commonly-found English-language terms in the typesetting field. These names may present librarians, cataloguers, and archivists with some difficulty because they are intended for use by typesetting or publishing people. However, provided it is understood that they are merely convenient labels, and are not intended to impose a semantic, there should be no problem in getting used to them.

Figure 3: Structural representation of the BiblioX DTD

To keep this diagram to a reasonable size, some detail has been omitted:

(In fact, there is little to prevent these element types being named differently: all that would change would be the <xsl:key> elements in the key files for each user-document DTD. However, three element type names are privileged and should not be changed, namely <name>, <inline>, and <substitute>, because they are referred to explicitly in the XSLT code.)

ToC5.1.1 Identity

The current filename of this DTD in the distribution is bibliox-0.3.dtd (use this as the System Identifier in a DocType Declaration). A suitable Formal Public Identifier is +//Silmaril//DTD BiblioX v0.3 reference formatting components//EN//XML. A URI for use as the canonical System Identifier is

ToC5.1.2 The element types

Note that all formatting-level components have two sets of attributes in common: separators (mostly punctuation to precede or follow the data) and typographic details. These are discussed separately after the list of element types.


<bibliostyle>

The root element type is <bibliostyle>. This has two #FIXED attributes: <version> (currently set to 0.3) and <xmlns:bib> (which defines a namespace bib, currently set to the URI of the mailing list).

The <bibliostyle> element must contain at least one <style> element but may contain many.


<style>

A <style> element holds a set of specifications for formatting reference types related by their typographic style. Suitable groups would be (for example) all reference types defined by a publisher for a specific journal as its ‘house style’. This element contains one or more <inline> elements followed by one or more <reftype> elements.

It has a compulsory ID attribute <name> which names the group. Typically this could be a standard or well-known code for the owner or organiser (a publisher or journal), such as ieeetr or harvard.

Two token-list attributes, <order> and <format>, define the sort order and labelling of a list of References. The order must be one of author (default), date, or as-is (document order). The format specifies the default formatting for citations and must be one of the citation styles:

number | abbrev | 
default |
footnote | endnote |
author | title | year | yearonly |
author-year | author-title | author-title-year | title-year 

See the 3rd item (‘inline’) in the list below for details of what these mean.

There is an optional <owner> CDATA attribute for a human-readable name for the owner.


<style name="ieeetr" owner="Transactions of the Institute of Electrical and Electronics Engineers" order="author" format="abbrev">...</style>
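The effect of the three <order> values can be pictured with a short Python sketch. The record layout and the sample entries are illustrative assumptions, not BiblioX data structures; in BiblioX itself the ordering is done in XSLT.

```python
# Hypothetical reference records, listed in document order.
REFS = [
    {"id": "walsh99", "author": "Walsh", "year": 1999},
    {"id": "flynn04", "author": "Flynn", "year": 2004},
    {"id": "berger02", "author": "Berger", "year": 2002},
]

def order_references(refs, order="author"):
    """Return the references sorted according to the <order> attribute value."""
    if order == "as-is":
        return list(refs)  # keep document order
    if order == "author":
        return sorted(refs, key=lambda r: r["author"].casefold())
    if order == "date":
        return sorted(refs, key=lambda r: r["year"])
    raise ValueError(f"unsupported order: {order!r}")

by_author = [r["id"] for r in order_references(REFS)]
by_date = [r["id"] for r in order_references(REFS, "date")]
as_is = [r["id"] for r in order_references(REFS, "as-is")]
print(by_author, by_date, as_is)
```

Author order is the default, as in the DTD. Any other value is rejected rather than guessed at.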


<inline>

Specifies the formatting components for an inline citation. The values supported for the <style> attribute are stored externally in bibliocitestyle.ent:

number | abbrev | 
default |
footnote | endnote |
author | title | year | yearonly |
author-year | author-title | author-title-year | title-year 

number

Numbered citations [2] used in the Natural Sciences

abbrev

The abbreviated (compressed) author-year format [Fly04] using the first three letters of the author's surname

default

The fully parenthetic author-year format (Flynn, 2004). The alternate format with only the year in parentheses is called author-year (see below)

footnote

Footnoted citations¹⁷ common in Law and the Humanities, often using the author-title-year format

endnote

The same as footnoted citations¹⁸ but the note is deferred to the end of the chapter

author

Just the surname of the author: Flynn

title

Just the title of the work: BiblioX

year

The date of the work, in parentheses: (2004)

yearonly

The date of the work, without parentheses: 2004

author-year

The alternate format for a standard inline citation: Flynn (2004)

author-title

The author's name and the title of the work: Flynn, BiblioX

author-title-year

An extended citation with the author, title, and date: Flynn, BiblioX (2004)

title-year

A shorter form for use when the author's name has already been used earlier in the sentence: BiblioX (2004)

These names and forms are extensible and easily modified to suit specific purposes.


<reftype>

The container within a <style> for the formatting components of a specific reference type. The compulsory <class> ID attribute names the class of the reference type. Its values form a token list stored in the file biblioreftypes.ent, which currently contains the following values:

article | book | booklet | conference | inbook | incollection |
inproceedings | manual | mastersthesis | misc | phdthesis |
proceedings | techreport | unpublished |

chapter | journal | manuscript | part | refentry | report | review |
section | set | thesis | unknown |

ballet | dance | play |

map | series |

news | software | website

This includes all the types defined in BIBTeX and DocBook, plus some others I have seen used. It is by no means exhaustive (which is probably impossible).

<option> and <group>

Currently unused. Intended respectively to enclose pairs of components of which only one is required, and pairs of components both of which are required.

<prefix> and <suffix>

Components to specify static textual material to precede or follow another component, such as the word ‘in’ before the name of a book or journal, or the abbreviation ‘pp’ before a page range. Because of the need for these strings to have their own formatting (eg italics), they are instantiated as element types, not attributes.


<substitute>

Holds a textual value for use when a matching real value needs to be suppressed, such as ‘et al.’ when multiple authors are omitted in an inline citation. The value to be displayed is supplied in character data content.

<author> and <editor>

These element types have the same content model: optional prefix and suffix, but three compulsory <name> components (for author read author/editor): a) one for the first or sole author; b) one for the last author of a multi-author work (which means the second author of a two-author work); and c) one in the middle for all other authors of a multi-author work (ie neither the first nor the last). This enables the correct formatting of styles which require a different surname/forename order or different typographic formatting in the first and subsequent author components.
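The allocation of authors to the three <name> components can be stated compactly. The following Python sketch is only an illustration of the rule just described; the function name is hypothetical and the real selection happens in the XSLT templates.

```python
def name_positions(author_count):
    """Say which <name> specification formats each author, in author order."""
    if author_count <= 0:
        return []
    if author_count == 1:
        return ["first"]  # a sole author uses the first specification
    middle = ["middle"] * (author_count - 2)
    return ["first"] + middle + ["last"]

print(name_positions(1))
print(name_positions(2))
print(name_positions(4))
```

A two-author work therefore never uses the middle specification, and every author between the first and the last shares it.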


<name>

Holds the specification of the sequence of name components within an <author> or <editor>. Currently the possible components are based on those found in the TEI DTD, which means any mix of:

surname | forename | 
genname | namelink | 
addname | rolename | 
orgname | placename

The <surname> and <forename> should be obvious; the others are <namelink> (honorific or tribal prefixes like ‘von’, ‘van’, ‘de’, ‘Ó’, ‘mac’, ‘ben’, ‘bin’, etc, where it is felt necessary to identify them separately); <genname> (generational suffixes like I, II, III, etc); <rolename> (occupations or social roles, eg ‘Mother of her People’); <addname> (additional names like epithets or nicknames, eg ‘The Liberator’); <orgname> (organisation names, especially corporate names); and <placename> (geographic names).

In version 0.3 only the forename and surname have been tested.

<title> and <subtitle>

The title and optional subtitle of the work. In many styles, these are separated for presentation by a colon (this should be done by the stylesheet, not included in the text).

<journal> and <book>

Containers for what MODS calls the ‘host’, that is, the enclosing ‘In’ work within which the actual document is published.

The <journal> element has optional attributes (not implemented in version 0.3): <abbrev> (yes/no) which signals whether the journal name should be printed in its official abbreviation or not; and <authority>, which is a URI which software could query to locate the full or abbreviated form of a journal name.

No attempt has been made at this stage to restrict the content model of these two element types. In print, books always require the publisher, for example, whereas journals almost never do. MODS has ample scope for storing this and other information, such as a journal's frequency of appearance, which is not relevant for a book and which never appears in a normal list of References unless a fully-annotated Bibliography is being produced. More discussion is needed on this.

<publisher> and <organisation>

Respectively: the publisher of a book; and the issuer, distributor, host, or sponsor of a technical report.


<place>

Place of publication. No attempt has been made to impose an address-style structure on this element type. Possibly it should be divided into city and state/country, so that typographic formatting such as small capitals can be used on the abbreviations for US and other states.

<volume> and <number>

Volume and issue number of a journal or other periodical.


<pages>

Container for components referring to page references.

<start>, <finish>, and <pagecount>

No attempt has been made to impose a requirement on whether a style should require both start and finish, or start only, or just the number of pages. The logic in the BiblioX stylesheets does not currently detect which of these element types is present: perhaps it should do so and insert the relevant en-rule.


<isn>

Index or serial number such as an ISSN, ISBN, Internet RFC, or ISO, IEC, or UN standards number. The compulsory <type> attribute shows which of these applies.


<uri>

A URI for the referenced document. As it seems unlikely that URIs or URNs will be implemented during the lifetime of our solar system, the more commonly used URL can be used.

<date>, <day>, <month>, and <year>

Date of publication or issue. The <date> element may contain optional <day>, <month>, and <year> components (this has not been tested in version 0.3). The <day> element has a token list attribute <style> for ordinal or cardinal (defaulting to cardinal); the <month> element has a token list attribute <style> for numeric, short, or long (defaulting to short, meaning abbreviated month names); and the <year> element has a token list attribute <calendar> with values for BC, AD, CE, AUC, and AM, plus Jewish, Arabic, and Mayan (jw, ar, my). None of these has been implemented in version 0.3.

  1. Flynn, BiblioX (2004)

ToC5.1.3 Punctuation

All component-level elements can also have two attributes <beforesep> and <aftersep> which store any string required for output immediately before or after the formatted content, such as punctuation or copulae.

A selection of common punctuation character entities is included, taken from the ISO files for use when literal spaces or other characters are not desirable, or when characters cannot be generated from the terminal.

ToC5.1.4 Typographical attributes

Common formatting attributes exist for all component-level elements. Most of these should be obvious from their names, but three CDATA attributes need to be restricted in the following ways (SGML or a Schema can enforce this more easily):


for HTML output, this must be a string matching a font name in CSS recognised by browsers (eg Times, Helvetica, Courier). For typeset output, especially for portability, it should be a string matching the Regular Expressions [A-Za-z][A-Za-z0-9\.\_\-]+ (representing an Adobe /FontName) or [a-z][a-z][a-z][rbi][78][rbi][78][c]? (representing a Karl Berry [LaTeX] \fontname); but it could be the name of a locally-installed TrueType or other font.


<size>

a number (only): use the <units> attribute to specify units separately.

<scale>

a number, a percentage for scaling.

Using both <size> and <scale> should be taken to mean ‘use the font at this design-size but scaled to this percentage’.
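Because a DTD cannot enforce these restrictions on CDATA values, a check outside the parser may be useful. The sketch below tests a font name against the two Regular Expressions quoted above, copied exactly as given in the text; the function name is hypothetical and the check is not part of BiblioX.

```python
import re

# The two expressions quoted in the text, used unchanged.
ADOBE_FONTNAME = re.compile(r"[A-Za-z][A-Za-z0-9\.\_\-]+")
BERRY_FONTNAME = re.compile(r"[a-z][a-z][a-z][rbi][78][rbi][78][c]?")

def looks_portable(font):
    """True if the whole string matches either expression."""
    return bool(ADOBE_FONTNAME.fullmatch(font)
                or BERRY_FONTNAME.fullmatch(font))

for candidate in ("Times-Roman", "Helvetica", "Times New Roman", "9pt"):
    print(candidate, looks_portable(candidate))
```

A name containing spaces, or one starting with a digit, fails both tests. Such a name may still be valid as a locally-installed font, so a failure here is a warning about portability rather than an error.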

The <units> attribute values (a token list) include all those specified by TeX, plus the duplicate abbreviation ‘pi’ for pica ems (pc), and the value ‘px’ for pixels. Note that the default is traditional Anglo-American printers' points (pt: 72.27 to the inch), not Adobe's ‘big points’ (bp: exactly 72 to the inch).
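The arithmetic implied by <size>, <units>, and <scale> can be sketched as follows. This is an illustration only: the function name is hypothetical, only a few of the TeX units are included, and ‘px’ is left out because it has no fixed physical size.

```python
POINTS_PER_INCH = 72.27      # traditional printers' points (pt)
BIG_POINTS_PER_INCH = 72.0   # Adobe 'big points' (bp)

# Conversion factors from each unit to pt; 'pi' duplicates 'pc'.
TO_PT = {
    "pt": 1.0,
    "bp": POINTS_PER_INCH / BIG_POINTS_PER_INCH,
    "pc": 12.0,
    "pi": 12.0,
    "in": POINTS_PER_INCH,
}

def effective_size_pt(size, units="pt", scale=100.0):
    """Design size converted to pt, then scaled by a percentage."""
    return size * TO_PT[units] * scale / 100.0

print(effective_size_pt(10, scale=120))   # 10pt design size at 120%
print(effective_size_pt(72, "bp"))        # one inch expressed in pt
```

The default unit is pt, matching the default stated above, so a bare size of 10 with a scale of 120 means twelve printers' points.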

The remaining attributes are <fonttype> (use CSS tokens: serif, sans-serif, monospace, or decorative); <weight> (normal, bold, light, extrabold); <shape> (normal, italic, oblique, upright, smallcaps); and <width> (normal, condensed, expanded, extended, compressed). Most of these values are common to any typographic system, and may be found in the documentation for CSS, LaTeX, QuarkXPress, FrameMaker, etc. Most of them are unimplemented in HTML/CSS browsers.

ToC5.2 The BiblioX stylesheet

The file bibliox.xsl is the stylesheet which is processed with the main document. That is, this is the name which must be given to the XSLT processor as the main stylesheet. It must be edited to show:

ToC5.3 Driver files

There must be a driver file for each main document DTD you use. Samples are provided for MODS, DocBook, and TEI in files modsdriver.xsl, docbookdriver.xsl, and teidriver.xsl.

The driver file contains templates which match the element types in the main document DTD which either a) contain citations; or b) contain the list of references. In DocBook the citation containers are <citation>, <citetitle>, and <citerefentry>, and the <biblioref> element which holds the citation IDREF itself: the reference list is contained in a <bibliography> element; in TEI the citation element type is <ref> and the reference container is <listBibl>; in MODS there is no citation element type, as MODS addresses only the storage of references: the container is <modsCollection>.

It also contains a template which matches the element type of the individual references in the container. In MODS this is <mods>; in DocBook it is <biblioentry>; in TEI it is variously <bibl>, <biblFull>, or <biblStruct>, depending on the reference type.

At the end of the file the user can add further templates to handle any subelement markup of elements used in the references which may require separate formatting. An example would be inline markup such as emphasis or foreign words, or (see the MODS driver file) the separation of a subtitle from the title by a colon and space.

ToC5.3.1 Method

These templates record in variables the style class (publisher, journal), reference type, element type name, and ID value of the current citation or reference element, for passing as parameters to the detection and formatting templates in the Tools file.

They then look up the relevant reference type in the formatting specification file and process the child elements (components) in document order.

For each end node (a component with no child components), BiblioX constructs a name string for use in key lookup (see §5.5, ‘Key files’ below), and tests for the presence of a key index.

If there is one, it looks up the key value[s] for the component and processes the data for output to HTML.

If there is no key index, or if there is one but it returns no values, the entire component is silently omitted.
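The method can be modelled outside XSLT to make the flow easier to follow. The Python sketch below is a simplification under stated assumptions: the `KEYS` dictionary stands in for the XSL keys, the sample values and the reference ID `flynn04` are invented, components with children are skipped rather than recursed into, and the separator attributes are taken from the book reference type in Figure 1.

```python
import xml.etree.ElementTree as ET

# A cut-down reference type with end-node components only.
SPEC = ET.fromstring(
    '<reftype class="book" aftersep=".">'
    '<title aftersep="."></title>'
    '<publisher beforesep=" "></publisher>'
    '<place beforesep=", "></place>'
    '<date beforesep=", "></date>'
    '<isn type="ISBN" beforesep=", "></isn>'
    '</reftype>'
)

# Stand-in for the key files: key name -> {reference ID -> values}.
KEYS = {
    "book_title": {"flynn04": ["Example Title"]},
    "book_publisher": {"flynn04": ["Example Press"]},
    "book_date": {"flynn04": ["2004"]},
    "book_isn": {"flynn04": []},  # key exists but returns no values
    # there is deliberately no "book_place" key
}

def format_reference(spec, ref_id):
    """Format one reference from the components of its reference type."""
    reftype = spec.get("class")
    out = []
    for comp in spec:  # components in document order
        if len(comp):
            continue  # not an end node; recursion omitted in this sketch
        key_name = f"{reftype}_{comp.tag}"
        values = KEYS.get(key_name, {}).get(ref_id, [])
        if not values:
            continue  # no key, or no values: omit silently
        out.append(comp.get("beforesep", "") + "".join(values)
                   + comp.get("aftersep", ""))
    return "".join(out) + spec.get("aftersep", "")

formatted = format_reference(SPEC, "flynn04")
print(formatted)
```

The place and the ISBN disappear from the output along with their separators, which is the silent omission described in the last step.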

ToC5.4 The formatting specification file

The one file supplied with version 0.3 is called generic.xml. This implements a format of citation and reference which is generally acceptable as a default.

ToC5.5 Key files

These files (one per DTD) instantiate the binding between the components in the formatting specification file and the element type names of the main document.

There must be exactly one key for every end node component in each reference type in the formatting specification file, with the exception of name components (see below). The keys are named with a multi-part string shown below. A utility stylesheet genkeyfile.xsl can be used on a new formatting specification file to generate skeleton <xsl:key> elements with the required names.

Because all references have authors or editors (creators), and because they typically share a common structure within one DTD, it is only necessary to provide one set of keys for the subcomponents of name components. The names of these keys are formed as below, but with the reference type string omitted.

The format of key construction is very precise and allows of no error: the three attributes of <xsl:key> are:


The <name> attribute is a pseudo-path representing two things:

  1. the reference type (document class) of the reference;

  2. the element path of the format components used, starting at each element child of the <reftype> in the formatting specification file (ie the ‘top-level’ components):

    author | editor |
    title |
    book | journal |
    volume | number | pages | uri |
    publisher | organisation | place | edition | date | isn

Mnemonically this can be described as:


Note carefully the underscore between the reference type and the path, and the hyphen used in the remainder of the path.

The .pos notation is reserved for <editor> and <author> element types where it is essential to be able to distinguish between single-author, dual-author, and multi-author works for typographical purposes. The value of .pos is .first, .middle, or .last.








The <match> is the XPath to the element type in the main document containing the data to be indexed, starting at the element type containing the references in the main document.


The <use> attribute identifies the ID of the reference being formatted (in the main document). If you don't use an ID here (many DTDs unwisely forgo this safety check by making the attribute optional) then it is up to some external processing to make sure the names are unique.

The sample files contain working examples of key construction.
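As a reading aid, the naming rules described above can be written as a small Python function. This is one interpretation of the prose, not code from BiblioX: the function name is hypothetical, and it is an assumption that the keys for name subcomponents, which omit the reference type, also omit the underscore.

```python
def key_name(path, reftype=None, pos=None):
    """Build a key name from a component path.

    path    -- component names from the top-level component downwards
    reftype -- reference type, or None for the shared name keys
    pos     -- 'first', 'middle', or 'last' for the <name> component
    """
    parts = list(path)
    if pos is not None:
        # The position is attached to the 'name' token itself.
        parts = [p + "." + pos if p == "name" else p for p in parts]
    joined = "-".join(parts)  # hyphens within the path
    if reftype:
        return f"{reftype}_{joined}"  # underscore after the reference type
    return joined

print(key_name(["publisher"], "book"))
print(key_name(["date", "year"], "book"))
print(key_name(["author", "name", "surname"], pos="first"))
```

The working examples in the sample key files remain the authority on the exact form of each name.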

ToC5.6 The Tools file

The file bibliotools.xsl contains the named templates which make the lookup and formatting work:

ToC5.6.1 makepath

This returns an XPath-like string representing the path to a formatting specification component which is used as the name of a key to look up the data. This is done by recursively prefixing the element type name of the component with the element type name of its parent, separated by a hyphen instead of a slash (it doesn't really matter if an element type name contains a real hyphen, since this string is used as a QName, not an element type name).

The ‘ceiling’ above which makepath will not recurse is passed as the ancestor parameter and defaults to reftype in the format template. The only other value passed for this is style, used when referencing inline styles, which the BiblioX DTD places before the <reftype>s in a formatting specification instance.

The component element type <name> is treated as special: its position is detected (first, middle, last: recall that all formatting specifications must have three name specifications for this purpose), and the suffix .first, .middle, or .last appended to the name token of the output string. This is used in key lookups to extract the relevant names of multi-authored documents, and in the format template to detect middle-grouped names for special handling (see item 4 in the list below).
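The recursion can be sketched in Python to show the role of the ceiling and the treatment of <name>. This is a model of the behaviour described above, not a translation of the XSLT. The specification fragment is cut down from Figure 1, the helper names are hypothetical, and the inline result simply shows what recursion up to a style ceiling yields in this sketch.

```python
import xml.etree.ElementTree as ET

SPEC = ET.fromstring(
    '<style name="generic">'
    '<inline style="author-year"><date><year/></date></inline>'
    '<reftype class="book">'
    '<author>'
    '<name><surname/><forename/></name>'
    '<name><surname/></name>'
    '<name><surname/></name>'
    '</author>'
    '<title/>'
    '</reftype>'
    '</style>'
)

# ElementTree has no parent pointers, so build a child -> parent map.
PARENT = {child: parent for parent in SPEC.iter() for child in parent}

def token(node):
    """Element type name, with a position suffix for <name>."""
    if node.tag != "name":
        return node.tag
    siblings = [n for n in PARENT[node] if n.tag == "name"]
    index = siblings.index(node)
    if index == 0:
        pos = "first"
    elif index == len(siblings) - 1:
        pos = "last"
    else:
        pos = "middle"
    return f"name.{pos}"

def makepath(node, ancestor="reftype"):
    """Prefix each token with its parent's path, stopping at the ceiling."""
    parent = PARENT.get(node)
    if parent is None or parent.tag == ancestor:
        return token(node)
    return makepath(parent, ancestor) + "-" + token(node)

names = SPEC.findall("reftype/author/name")
first_path = makepath(names[0].find("surname"))
middle_path = makepath(names[1].find("surname"))
inline_path = makepath(SPEC.find("inline/date/year"), ancestor="style")
print(first_path, middle_path, inline_path)
```

The position test relies on every specification carrying three <name> components, as required by the DTD.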

ToC5.6.2 format

This takes three parameters, as outlined in §5.3, ‘Driver files’ above:


the reference type (eg book, article, report, etc);


the ID of the bibliographic reference element, used to identify the key entries.


The limit of recursion for the makepath named template.

The sequence of processing is heavily documented in the comments in the file. In outline:

  1. Output any preceding separation text if a) it is specified; b) there is some text from the main document to output; and c) the component content is non-blank (eg a default value). This procedure is repeated at the end for any trailing separation text.

  2. The main choice is between components which have child elements (which therefore require recursing into until a terminal is reached) and those which do not (which can be handed straight to the output formatter, dealt with in §5.6.3, ‘output’ below).

    1. For each child element, construct the key match string using the makepath named template as described in §5.6.1, ‘makepath’ above. If this is an inline citation, pass style as the ceiling value for recursion, otherwise pass reftype.

    2. Locate the data value(s) for this component by trying to dereference the key name as a test of existence, accessing the Keys file as XML. If there is a matching key, use it to look up the data and store the result in a variable context. If there is no match, ignore it and pass to the next component.

    3. Count how many values were retrieved. If greater than zero (or zero but the name of the component is <prefix> or <suffix>), then convert the values to a node-set and recursively call this template again to process the children, unless…

    4. …the exception is that if the current key name ends with .middle (added by makepath) then we have a special case: the container component <name> for non-first, non-last authors or editors of a multi-authored or multi-edited document. If the child elements of this component were processed as they stand, we would get all forenames together followed by all surnames together, because that's what the keys store. Instead, we want the first forename and the first surname, then the second forename and the second surname, etc.

      1. Record the before and after separator values for later output (these would otherwise be bypassed because the element will not pass through the normal recursion of this template).

      2. For each middle author identified by this key, output the before separator, record the node and its numeric position in the sequence, and pass them to the template in the same way as normal, but retrieving the node from the key separately and extracting the nth instance, rather than letting the normal child-handling code above do it, which would have resulted in retrieving all forenames or all surnames together.

      3. This method is predicated on there being no further child components in a name subcomponent.

5.6.3 output

After drilling down to the point where there are no more child components, only terminal components, the format template calls this one. The output tree is populated via a vast nested mass of code to cope with all possible commutative combinations of the typographic attributes described in §5.1.4, ‘Typographical attributes’ above.

It would have been preferable to output multiple <class> values in the manner of an IDREFS attribute, but only very recent browsers seem to handle this. At a later stage it can be rewritten that way.
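The difference between the two forms of class output can be sketched in Python. This is an illustration only: the function names are invented, the use of span elements is an assumption, and nothing here is taken from the BiblioX code. The first function shows one way of emitting a single class value per element; the second shows the space-separated, IDREFS-like form.

```python
from xml.sax.saxutils import escape


def one_class_per_element(text, classes):
    """Wrap the text in one nested <span> per class (hypothetical form)."""
    markup = escape(text)
    for name in reversed(classes):
        markup = f'<span class="{name}">{markup}</span>'
    return markup


def combined_classes(text, classes):
    """Emit a single <span> whose class attribute lists every class."""
    if not classes:
        return escape(text)
    joined = " ".join(classes)
    return f'<span class="{joined}">{escape(text)}</span>'


print(one_class_per_element("SGML & XML Tools", ["bold", "italics"]))
print(combined_classes("SGML & XML Tools", ["bold", "italics"]))
```

With the combined form, the number of output cases no longer grows with every combination of attributes, since the class list is simply joined.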

5.6.4 dig

This is a template to count the amount of text (both character data content and retrieved keys) in a component and its children. It is called from within a variable called volume within the format template, to see if there is any content to process.

It takes the same three parameters as format plus a recursive counter called keysfound to count the number of key values.

It processes all element children of the current node, making a pseudo-path using makepath, and uses it to test in the Keys file if there are any keys available. If so, they are counted, otherwise the count is zeroed.

Each child is then recursively processed in the same way and the number of keys added to the accumulated total, returned when all descendants have been exhausted.

At the moment, it fails to add the number of keys, outputting instead the catenation of the string values of the numbers. As the test for the presence of text in the volume variable, this doesn't matter, as it's only a test for something-or-nothing, but it's a bug that needs fixing.
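The difference between adding the counts and catenating their string values can be shown with a small Python model. It is a sketch under assumptions: the component tree and the KEYS dictionary (standing in for lookups in the Keys file) are invented, the pseudo-path is built in a simplified way, and only key values are counted, not character data.

```python
import xml.etree.ElementTree as ET

# Hypothetical component tree and key table, invented for illustration.
SPEC = ET.fromstring("<book><author><forename/><surname/></author><title/></book>")
KEYS = {
    "author-forename": ["Peter", "Paula"],
    "author-surname": ["Flynn", "Murphy"],
    "title": ["Understanding SGML and XML Tools"],
}


def child_path(prefix, child):
    """Build a simplified hyphen-joined pseudo-path for a child component."""
    return f"{prefix}-{child.tag}" if prefix else child.tag


def dig(node, prefix=""):
    """Intended behaviour: add up the key counts of all descendants."""
    total = 0
    for child in node:
        path = child_path(prefix, child)
        total += len(KEYS.get(path, []))
        total += dig(child, path)
    return total


def dig_catenating(node, prefix=""):
    """Behaviour described as the bug: the counts are joined as strings."""
    out = ""
    for child in node:
        path = child_path(prefix, child)
        out += str(len(KEYS.get(path, [])))
        out += dig_catenating(child, path)
    return out


print(dig(SPEC))             # 5
print(dig_catenating(SPEC))  # 0221
```

In this example both results are non-zero when read as numbers, which is all that a something-or-nothing test needs, but only the first is a usable count.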

5.6.5 getparents

This template is called from that part of format which handles .middle names. It recursively locates all components passed in the parameter children, and finds the parents of the indexed values from the main document.

In processing .first and .last names, we take the values retrieved from indexes as they come. But for (potentially multiple) .middle names, stepping through the children and outputting all values for the first, then all values for the second, etc is precisely what we don't want to do because it results (for example) in an author called PeterPaulaMary FlynnMurphyAxford.

In processing .middle names, we need to call a template which visits all the key-indexed values and returns a node-set of their OR'd parents, then goes through these in order, and within each parent, outputs the values identified by the components, thereby separating the retrieved values into their person-by-person order.

Actually, it probably ought to return a node-set of the Lowest Common Ancestors rather than the parents, but this is rather intractable.
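The regrouping by parent can be illustrated with a short Python sketch. The source fragment below is invented (DocBook-like element names are assumed), and the function is a model of the idea described for getparents, not its XSLT implementation: collect the distinct parents of the retrieved values in order, then output the values parent by parent.

```python
import xml.etree.ElementTree as ET

# Hypothetical source fragment, invented for illustration.
SOURCE = ET.fromstring(
    "<authorgroup>"
    "<author><firstname>Peter</firstname><surname>Flynn</surname></author>"
    "<author><firstname>Paula</firstname><surname>Murphy</surname></author>"
    "<author><firstname>Mary</firstname><surname>Axford</surname></author>"
    "</authorgroup>"
)
PARENT_OF = {child: parent for parent in SOURCE.iter() for child in parent}

# What the keys would return: all forenames together, all surnames together.
forenames = SOURCE.findall(".//firstname")
surnames = SOURCE.findall(".//surname")

naive = "".join(n.text for n in forenames) + " " + "".join(n.text for n in surnames)


def get_parents(nodes):
    """Return the distinct parents of `nodes`, in order of first appearance."""
    parents = []
    for node in nodes:
        parent = PARENT_OF[node]
        if not any(parent is seen for seen in parents):
            parents.append(parent)
    return parents


wanted = ("firstname", "surname")
by_person = [
    " ".join(child.text for child in person if child.tag in wanted)
    for person in get_parents(forenames + surnames)
]

print(naive)      # PeterPaulaMary FlynnMurphyAxford
print(by_person)  # ['Peter Flynn', 'Paula Murphy', 'Mary Axford']
```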

5.6.6 docitesbefore

This template should be called immediately at the start of all processing templates for structural divisions in document type transformations where footnoted citations are used, before any other output. There is one parameter: citname, which specifies the name of the citation-bearing element type.

It outputs footnoted citations that are descendants of the parent of the current structural division but which are not descendants of any instance of the current structural division. This limits it to those citations made in the body of the parent (eg the open part of a chapter before any subsections).

Figure 4: Placement of footnoted citation template calls
  <xsl:template match="section|sect1|sect2|sect3">
    <xsl:call-template name="docitesbefore">
      <xsl:with-param name="citname">…</xsl:with-param>
    </xsl:call-template>
    <!-- normal processing of the division goes here -->
    <xsl:call-template name="docitesafter">
      <xsl:with-param name="citeexclude" select="count(section|sect1|sect2)"/>
      <xsl:with-param name="citname">…</xsl:with-param>
    </xsl:call-template>
  </xsl:template>

5.6.7 docitesafter

This is the logical complement of docitesbefore: it outputs footnoted citations that are descendants of the current structural element but which are not descendants of any structural subdivisions.

It must be placed last in the template(s) which handle structural divisions. There are two parameters: citname (as for §5.6.6, ‘docitesbefore’ above) and citeexclude, which specifies the name of the next lower structural subdivision.

Figure 5: Diagram of placement of footnoted citation template calls and their effect
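The selection rules of the two templates can be modelled in Python. This is a sketch under assumptions: the sample document, the set of division names, and the function signatures are invented for illustration, and the functions only return the citations that each template is described as selecting.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for illustration.
DOC = ET.fromstring(
    "<chapter>"
    "<para>Opening text <citation id='c1'/></para>"
    "<sect1>"
    "<para>Section text <citation id='c2'/></para>"
    "<sect2><para><citation id='c3'/></para></sect2>"
    "</sect1>"
    "<sect1><para><citation id='c4'/></para></sect1>"
    "</chapter>"
)
DIVISIONS = {"section", "sect1", "sect2", "sect3"}


def own_citations(element, citname, exclude):
    """Citations below `element` that are not inside any element in `exclude`."""
    found = []
    for child in element:
        if child.tag in exclude:
            continue  # do not descend into excluded divisions
        if child.tag == citname:
            found.append(child)
        found.extend(own_citations(child, citname, exclude))
    return found


def docitesbefore(parent, division_tag, citname="citation"):
    """Citations in the parent's own body, outside every `division_tag`."""
    return own_citations(parent, citname, {division_tag})


def docitesafter(division, citname="citation"):
    """Citations in this division, outside all of its subdivisions."""
    return own_citations(division, citname, DIVISIONS)


first_sect1 = DOC.find("sect1")
print([c.get("id") for c in docitesbefore(DOC, "sect1")])  # ['c1']
print([c.get("id") for c in docitesafter(first_sect1)])    # ['c2']
```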

5.7 CSS files

Currently one CSS file is supplied, bibliofile.css, which implements the bold, italics, etc classes output by the format named template.

6 The BiblioX user interface

This section describes what an author or editor should expect to be able to do in DocBook and TEI documents to make citation work properly. I am assuming that the reader knows what citation and reference is about, and understands the need for accuracy and consistency in the markup of citations and the references they refer to. The section §3.2, ‘Terminology’ above explains some of this, and the examples below should be enough for the average XML user to understand what is happening. I do also assume that the reader is competent in using an XML editor with the DocBook or TEI DTDs, and in the use of XSLT with an XSLT processor to produce HTML files from their XML.

To use the default setup, the only things you need to do are:

  1. Add this line to the top of your existing XSLT stylesheet:

    <xsl:include href="bibliox.xsl"/>

  2. Edit the file bibliox.xsl to make sure

    1. that the variable keys is set to the correct Keys file name (docbookkeys.xsl, teikeys.xsl, or modskeys.xsl), according to which DTD you are using to store your References;

    2. that the same Keys filename is used in the <xsl:include> element immediately below the declaration of the keys variable;

    3. that the correct matching Driver filename is specified in the next <xsl:include> element.

  3. Make sure your citations are in the format (for DocBook):

    <citation><biblioref linkend="abc123"/></citation>

    or (for TEI):

    <cit><ref target="abc123"></ref></cit>

    where the value abc123 is replaced by the ID value of the relevant reference entry.

  4. Make sure your references are accurately marked up in the appropriate place in the document (usually at the end). The IDs used in the citations must all exist among your references, otherwise a validating XML parser will report errors. It is not, however, an XML error to have unused references (ones which are never cited).

    Here is an example of a reference in DocBook format:

      <biblioentry id="abc123" type="book">
        <title>Understanding SGML and XML Tools</title>
        <titleabbrev>SGML &amp; XML Tools</titleabbrev>
        <date YYYY-MM-DD="1998">1998</date>
      </biblioentry>

    All <biblioentry> elements must be contained within a <bibliography> element in DocBook.

    Here is the same reference done in TEI format:

      <biblFull id="abc123" rend="book">
          <title>Understanding SGML and XML Tools</title>
          <publisher>Kluwer Academic Publishers</publisher>
          <idno type="isbn">0-7923-8169-6</idno>
          <date value="1998">1998</date>
      </biblFull>

    In TEI, all <biblFull>, <biblStruct>, or <bibl> elements must be within a <listBibl> element.

  5. Edit the file generic.xml and change the value of the format attribute on the <style> element to whatever you want the default to be, one of:

    number | abbrev | 
    default |
    footnote | endnote |
    author | title | year | yearonly |
    author-year | author-title | author-title-year | title-year 

    Normally it will be number.
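The requirement in step 4, that every cited ID exists among the references, can be checked before running the transformation. The following Python sketch assumes the DocBook citation and reference markup shown in steps 3 and 4; the sample document and the IDs missing1 and unused9 are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical DocBook document, invented for illustration.
DOC = ET.fromstring(
    "<article>"
    "<para>See <citation><biblioref linkend='abc123'/></citation> and "
    "<citation><biblioref linkend='missing1'/></citation>.</para>"
    "<bibliography>"
    "<biblioentry id='abc123' type='book'>"
    "<title>Understanding SGML and XML Tools</title>"
    "</biblioentry>"
    "<biblioentry id='unused9' type='book'><title>Never cited</title></biblioentry>"
    "</bibliography>"
    "</article>"
)

# IDs that citations point to, and IDs that references declare.
cited = {ref.get("linkend") for ref in DOC.iter("biblioref")}
defined = {entry.get("id") for entry in DOC.iter("biblioentry")}

dangling = sorted(cited - defined)  # cited but not defined: must be fixed
unused = sorted(defined - cited)    # defined but never cited: harmless

print("Citations with no matching reference:", dangling)
print("References never cited:", unused)
```

For TEI documents the same check would use the target attribute of <ref> and the id attribute of the bibliographic elements instead.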

6.1 The formatting specification file

There is only one example of this file supplied with BiblioX, generic.xml. This implements a very basic and unexciting but usable set of formats for common everyday citations, such as you will find in the work of most academics, researchers, scientists, and documenters. It does not cater for the special needs of the professional bibliographer or librarian—although it could do so very easily: I just haven't written that much of it yet—nor does it provide some of the more outré formats required by smaller fields of research.

The structure of this file is shown in Figure 3, ‘Structural representation of the BiblioX DTD’ above. If you want to copy it to another filename and then take that version to pieces and reconstruct it to implement a publisher's or journal's house style, feel free to do so, but do not destroy or overwrite the actual file generic.xml. If you do create another version, you will need to change the value of the variable bibstyles in the bibliox.xsl file.

6.2 Use with DocBook

The recent (March 2004) changes to DocBook to allow the <biblioref> element in <citation>, <citerefentry>, and <citetitle> (and elsewhere) mean that the full scope of citation and reference is now possible within DocBook documents for the first time. This method has been used in this version in preference to the previous method, which was to add an IDREF attribute to the three citation element types, as well as other metadata.

The documentation for DocBook (Walsh, 1999) says that these three element types have the following applications:


<citation>: An inline bibliographic reference to another published work


<citerefentry>: A citation to a reference page


<citetitle>: The title of a cited work

These reflect the very limited range of citations made in computer documentation, which is restricted very much to the occasional citation of external standards or a few articles or books; a lot of man pages; and the less formal passing mentions of titles of books. However, people increasingly want to use the rich markup available elsewhere in DocBook for

6.3 Use with TEI

6.4 Samples

The three XML sample files included are for MODS, DocBook, and TEI. They have been processed with Saxon 6.5.2 and the BiblioX XSL to produce HTML.

They are not intended to be exhaustive, merely to demonstrate that it is possible to do this kind of formatting successfully using XSLT.

7 Further work

A huge amount remains to be done:


Many more examples are needed, preferably using real-life documents with their owners' permission.


We need some usability tests. MODS in particular concerns me because its terminology, while wholly accurate, is alien to non-librarians and non-archivists. By the same token I am aware that the terms I have used will be regarded with the same suspicion by librarians and archivists for being too concrete and printer/publisher oriented.


There is vast scope for the code to be tightened and improved. The objective of this pass was to get a working system, and that meant finding problems along the way and dealing with them as best could be done at the time.

There are also undoubtedly things I have done which will make XSLT experts shudder in horror, as there must be easier and cleaner ways of performing some of the actions required within the side-effect-free environment of XSLT.


  1. Flynn, BiblioX (2004)


Berger, Jens: ‘Extended BibTeX citation support for the humanities and legal texts’. [published by the author], Potsdam, Germany, 2002.

Flynn, Peter: ‘BiblioX’. Silmaril Consultants, Cork, Ireland, 2004.

Hägglund, Sture and Roland Tibell: ‘Multi-style dialogues and control independence in interactive software’. In TRG Green, SJ Payne and GC van der Veer, The Psychology of Computer Use, Academic Press, London, 1983, pp 171–189, ISBN 0122974204.

Library of Congress' Network Development and MARC Standards Office: ‘Metadata Object Description Schema’. v.3, Library of Congress, Washington, DC, December 2003.

Patashnik, Oren: ‘BIBTeX’. TeX Users Group, Portland, OR, 1988.

Walsh, Norman and Leonard Muellner: DocBook: The Definitive Guide. O'Reilly & Associates, Sebastopol, CA, 1999, ISBN 1-56592-580-7.

GNU Free Documentation License

Version 1.2, November 2002

Copyright (C) 2000,2001,2002 Free Software Foundation, Inc.
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

0. PREAMBLE


The purpose of this License is to make a manual, textbook, or other functional and useful document "free" in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others.

This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.

We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference.

1. APPLICABILITY AND DEFINITIONS


This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. Such a notice grants a world-wide, royalty-free license, unlimited in duration, to use that work under the conditions stated herein. The "Document", below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as "you". You accept the license if you copy, modify or distribute the work in a way requiring permission under copyright law.

A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language.

A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (Thus, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.

The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License. If a section does not fit the above definition of Secondary then it is not allowed to be designated as Invariant. The Document may contain zero Invariant Sections. If the Document does not identify any Invariant Sections then there are none.

The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may be at most 25 words.

A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup, or absence of markup, has been arranged to thwart or discourage subsequent modification by readers is not Transparent. An image format is not Transparent if used for any substantial amount of text. A copy that is not "Transparent" is called "Opaque".

Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML, PostScript™ or PDF designed for human modification. Examples of transparent image formats include PNG, XCF and JPG. Opaque formats include proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML, PostScript™ or PDF produced by some word processors for output purposes only.

The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text near the most prominent appearance of the work's title, preceding the beginning of the body of the text.

A section "Entitled XYZ" means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses following text that translates XYZ in another language. (Here XYZ stands for a specific section name mentioned below, such as "Acknowledgements", "Dedications", "Endorsements", or "History".) To "Preserve the Title" of such a section when you modify the Document means that it remains a section "Entitled XYZ" according to this definition.

The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. These Warranty Disclaimers are considered to be included by reference in this License, but only as regards disclaiming warranties: any other implication that these Warranty Disclaimers may have is void and has no effect on the meaning of this License.

2. VERBATIM COPYING


You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and you may publicly display copies.

3. COPYING IN QUANTITY


If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and the Document's license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.

If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.

It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document.

4. MODIFICATIONS


You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:

  1. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version gives permission.
  2. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has fewer than five), unless they release you from this requirement.
  3. State on the Title page the name of the publisher of the Modified Version, as the publisher.
  4. Preserve all the copyright notices of the Document.
  5. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices.
  6. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below.
  7. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document's license notice.
  8. Include an unaltered copy of this License.
  9. Preserve the section Entitled "History", Preserve its Title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section Entitled "History" in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence.
  10. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the "History" section. You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission.
  11. For any section Entitled "Acknowledgements" or "Dedications", Preserve the Title of the section, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein.
  12. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the section titles.
  13. Delete any section Entitled "Endorsements". Such a section may not be included in the Modified Version.
  14. Do not retitle any existing section to be Entitled "Endorsements" or to conflict in title with any Invariant Section.
  15. Preserve any Warranty Disclaimers.

If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version's license notice. These titles must be distinct from any other section titles.

You may add a section Entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties--for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.

You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version.

5. COMBINING DOCUMENTS


You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers.

The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.

In the combination, you must combine any sections Entitled "History" in the various original documents, forming one section Entitled "History"; likewise combine any sections Entitled "Acknowledgements", and any sections Entitled "Dedications". You must delete all sections Entitled "Endorsements".

6. COLLECTIONS OF DOCUMENTS


You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects.

You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.

7. AGGREGATION WITH INDEPENDENT WORKS


A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, is called an "aggregate" if the copyright resulting from the compilation is not used to limit the legal rights of the compilation's users beyond what the individual works permit. When the Document is included in an aggregate, this License does not apply to the other works in the aggregate which are not themselves derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one half of the entire aggregate, the Document's Cover Texts may be placed on covers that bracket the Document within the aggregate, or the electronic equivalent of covers if the Document is in electronic form. Otherwise they must appear on printed covers that bracket the whole aggregate.

8. TRANSLATION


Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License, and all the license notices in the Document, and any Warranty Disclaimers, provided that you also include the original English version of this License and the original versions of those notices and disclaimers. In case of a disagreement between the translation and the original version of this License or a notice or disclaimer, the original version will prevail.

If a section in the Document is Entitled "Acknowledgements", "Dedications", or "History", the requirement (section 4) to Preserve its Title (section 1) will typically require changing the actual title.

9. TERMINATION


You may not copy, modify, sublicense, or distribute the Document except as expressly provided for under this License. Any other attempt to copy, modify, sublicense or distribute the Document is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.

10. FUTURE REVISIONS OF THIS LICENSE


The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See

Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License "or any later version" applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation.

ADDENDUM: How to use this License for your documents


To use this License in a document you have written, include a copy of the License in the document and put the following copyright and license notices just after the title page:

Copyright (c) YEAR YOUR NAME. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the "with...Texts." line with this:

with the Invariant Sections being LIST THEIR TITLES, with the Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST.

If you have Invariant Sections without Cover Texts, or some other combination of the three, merge those two alternatives to suit the situation.

If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software license, such as the GNU General Public License, to permit their use in free software.