An experiment in creating a portable, extensible system for formatting bibliographic citations and references
BiblioX is an attempt to create an XML-based system for formatting bibliographic citations and references using XSLT. The recommended Document Type for storing bibliographic information is MODS, the Metadata Object Description Schema. Development is centered around this format, although examples in other formats are included for comparison. The Document Type used for storing this manual is DocBook.
BiblioX is an experimental or proof-of-concept system. It is not finished software, as it has bugs and missing bits, and should not be used for production processing.
BiblioX is a set of XSLT templates and a file format for storing formatting information for bibliographic citations and references. It is similar in effect, although different in operation, to the BibTeX system for use with LaTeX.
The advantage of BiblioX over other XML-based systems is that it works with your documents in any DTD or Schema: you do not have to hard-code the element type names of your document markup vocabulary into the XSLT used by BiblioX for formatting. If you have existing bibliographic material in different formats (for example, DocBook or TEI), it can be processed just as easily as in MODS.
BiblioX uses a simple, visually-oriented formatting specification file with its own XML vocabulary. In this file you can set out exactly which components of a citation or a reference are printed or displayed, along with details of formatting (fonts, styles, sizes, etc), without any reference to any particular source document DTD or Schema (see Figure 1, ‘BiblioX formatting specification file’ below). All you need to provide is an indexing file which contains the equivalences between the components in the formatting specification file and the specific DTD or Schema you are using, in the form of XSL keys.
<inline style="author-year">
  <author aftersep=" ">
    <name><surname></surname></name>
    <name></name>
    <name><substitute shape="italic" beforesep=" ">et al.</substitute></name>
  </author>
  <date beforesep=" (" aftersep=")"><year></year></date>
</inline>

<reftype class="book" aftersep=".">
  <author aftersep=": ">
    <name>
      <surname weight="bold"></surname>
      <forename beforesep=", "></forename>
    </name>
    <name beforesep="; " aftersep=";">
      <forename></forename>
      <surname beforesep=" "></surname>
    </name>
    <name beforesep=" and ">
      <forename></forename>
      <surname beforesep=" "></surname>
    </name>
  </author>
  <title shape="italic" aftersep="."></title>
  <publisher beforesep=" "></publisher>
  <place beforesep=", "></place>
  <date beforesep=", "></date>
  <isn type="ISBN" beforesep=", "></isn>
  <uri fonttype="monospace" beforesep=" [" aftersep="]"></uri>
</reftype>
Two examples from the generic.xml example formatting specification file, one showing a common author-year inline citation format, eg Walsh (1999), and the other showing a reference format for books; see ‘References’ below for details.
When BiblioX processes your main document, it looks up the formatting details for each citation or reference in the formatting specification file. It uses the components it finds there to produce the required formatting, using the index keys to look back in the main document for the textual values needed for each component (see Figure 2, ‘Schematic of BiblioX processing’ below).
The BiblioX main stylesheet, a ‘driver’ file for your DTD, and a tools file of common routines contain all the XSLT code needed to do this. Output at the moment is restricted to HTML, but a LaTeX version is in preparation.
Normally, only one type of citation format is used in any given document, determined by the house style of the publisher or journal. The exception is in some documents in the Humanities, where the two alternatives, Walsh (1999) and (Walsh, 1999), are used according to the grammar of the surrounding text. In fact, BiblioX currently supports the unrestricted intermixing of different formats in any document.
Apart from the BiblioX files, the only specific piece of software needed is an XSLT processor. These are freely available for download from many sources. The development environment uses Saxon from http://saxon.sourceforge.net. It is assumed that the user will already have an XML editor with which to write the main document.
Accurate data is critical to the working of any XML-based transformation. If the citation or reference information in your source document is inaccurate or poorly marked, you will almost certainly get poor results. In particular, the essential components of a reference (author, title, date, and source) must be adequately marked up in a machine-interpretable form:
The author element must have subelements for at least the surname, for sorting purposes, and preferably the other parts of the name (forename, etc);
The title should be stored separately from the subtitle where the markup allows this. If not, a colon (:) and white-space should separate the title from the subtitle.
The date must have a machine-sortable identifier if the textual form is not sortable. It is strongly recommended that an attribute is used to hold the ISO 8601 date (eg 2004-04-05), because culturally-based date expressions are not usually amenable to machine sorting.
The source (either the publisher or the title of the journal or other monograph in which the work appears) must be identifiable in markup.
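To see why the ISO 8601 attribute matters, compare sorting references on a culturally formatted date with sorting them on the ISO form. This is a minimal Python sketch; the three sample dates are made up for illustration.

```python
# Sample references: (display text, ISO 8601 value held in an attribute).
refs = [
    ("5 April 2004", "2004-04-05"),
    ("12 March 1999", "1999-03-12"),
    ("1 December 2003", "2003-12-01"),
]

# Sorting on the display text compares characters left to right, so the
# result follows the leading day digits, not the chronology.
by_text = [text for text, _ in sorted(refs, key=lambda r: r[0])]

# ISO 8601 strings (YYYY-MM-DD) sort lexically into chronological order.
by_iso = [text for text, _ in sorted(refs, key=lambda r: r[1])]

print(by_text)  # ['1 December 2003', '12 March 1999', '5 April 2004']
print(by_iso)   # ['12 March 1999', '1 December 2003', '5 April 2004']
```

No date parsing is needed: the fixed-width, most-significant-first layout of the ISO form is what makes a plain string sort work.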
It is also recommended that you link your citations to the references via XML's ID/IDREF feature. This mechanism is built into XML explicitly for cross-references such as bibliographic citations, and it is checked automatically by all validating parsers and editors to make sure you don't accidentally refer to something that doesn't exist. Unfortunately some DTDs fail to provide ID or IDREF (or both) on the critical element types used for citation and reference. In this case, CDATA values can be used, but it then becomes the author's responsibility to check manually that every citation bears a value which matches a reference.
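Where the DTD offers only CDATA, a small external check can stand in for the parser's IDREF validation. The sketch below uses Python's standard xml.etree.ElementTree module; the element and attribute names (cite/@ref for citations, entry/@id for references) are hypothetical defaults, not taken from any real DTD, and would be replaced by your own.

```python
import xml.etree.ElementTree as ET

def unresolved_citations(xml_text, cite_tag="cite", ref_attr="ref",
                         entry_tag="entry", id_attr="id"):
    """Return citation values that match no reference identifier.

    The tag and attribute names are hypothetical defaults; pass the
    names used by your own DTD. A citation with no value at all is
    reported as None.
    """
    root = ET.fromstring(xml_text)
    # Collect every identifier carried by a reference entry.
    known = {e.get(id_attr) for e in root.iter(entry_tag) if e.get(id_attr)}
    # Report each citation whose value is not in that set, in document order.
    return [c.get(ref_attr) for c in root.iter(cite_tag)
            if c.get(ref_attr) not in known]

doc = """<article>
  <p>As shown by <cite ref="walsh99"/> and <cite ref="flynn04"/>.</p>
  <bibliography>
    <entry id="walsh99">Walsh, Norman: DocBook</entry>
  </bibliography>
</article>"""

print(unresolved_citations(doc))  # ['flynn04']
```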
The current release of this software is 0.3, which means it is experimental and incomplete. In particular the documentation is still being written. BiblioX was formerly known as Bibliofile: I changed the name to avoid conflicting with an entirely separate and unconnected project already called Bibliofile.
It has been tested with Saxon and Cocoon.
Sample documents are provided in TEI, DocBook, and MODS format.
Output is currently HTML only.
This is a co-operative effort. The driving force behind the project has been Bruce D'Arcus, and a lot of the content relies on the work of Markus Hoenicka, Taco Hoekwater, Torsten Bronger, Karen Coyle, David Wilson, Eve Maler, Norm Walsh, and others whose document classes, DTDs, Schemas, style files, processing formats, and contributions to the discussions have improved my understanding of the problem.
There have been several ongoing and previous projects in the bibliographic storage and formatting field whose work has been invaluable in identifying problems, proposing solutions, and testing theories:
I must acknowledge the help of members of the XSL mailing list who kindly investigated some of my broken code, especially Jeni Tennison and Bob DuCharme. Thanks also go to David Carlisle and Markus Abt for identifying my ignorance of the effects of <xsl:for-each> and suggesting the use of
Ongoing discussions on the project take place on the BiblioX mailing list, which you can join at http://lists.ucc.ie/lists/archives/bibliofile.hmtl.
Some terms used here may be unfamiliar to some readers.
A mention of a document or other object (including a direct quotation from a document) in support of an argument, made during discourse. The following are approximate guidelines: there are many variants.
In the Humanities, a citation is typically shown either as a superscripted number immediately after the mention or quotation,⁶ referring the user to a footnote which gives the author and date, possibly with a short title and page or section number; or as an inline citation (Flynn, 2004). The full list of References is printed in alphabetical or chronological order at the end of the document, without any numbering or labelling.
In the Sciences, a citation is typically shown either as a number or as a mnemonic abbreviated label in [square brackets] immediately after the mention or quotation [Fly04] (sometimes superscripted); extended inline citations as in the Humanities are also sometimes used. The number or the abbreviated label refers the user directly to the numbered or labelled list of References at the end of the document, which may be in citation order or alphabetic order.
In Law and some other disciplines there are some highly specific and very rigorous rules for citation. These are not addressed directly by BiblioX, and may require the addition of extra features in a later version. The reader is referred to the documentation in Berger (2002) and the associated mailing list for an extended discussion.
A sequence of details or metadata about a document or other object referred to in a Citation, sufficient for the reader to identify it uniquely, look it up in a library or database, or otherwise locate it. Typical details include:
The title or name of the work;
The names of the authors, creators, or editors;
The date of publication or creation;
The agency by which it appeared or the source from which it is available (publisher, institution, sponsor, patron, etc);
The location or address of the agency or source;
Details of scope, size, quantity, shape, range, or other physical attributes;
Similar details of a greater enclosing work (book, journal, collection), if the document appeared as part of something else.
This last item identifies the major distinction between monographs (free-standing works like books, whole journals, reports, Proceedings, etc) and individual contributions to them (articles, chapters, papers, etc). The latter are always referenced with the details of the greater enclosing work as well as the details of the individual contribution. (Hägglund, 1983)
A class of References sharing the same general features and requiring the same typographic formatting.
The conventional monographic reference types include books, technical reports, journals, theses, plays, musical compositions, etc. Individual contribution types include articles, papers at a conference, chapters in a book, sections in a report, maps in a series, etc.
Two well-known sets of reference types are those used by the BibTeX system and those used by the DocBook DTD.
A BiblioX formatting specification for the lowest level of detail in a Reference.
This is roughly equivalent to a field in a database or an element in a DTD. A component names the type of data generically (eg ‘title’, ‘editor’), and provides typographic information about how it is to appear. A component can have subcomponents (eg the ‘name’ component is made up of ‘forename’ and ‘surname’ subcomponents). In a BiblioX formatting specification file, components are represented as element types from the BiblioX DTD. A component with no subcomponents is assumed to have a key entry in the keys file which identifies where to get the data from.
The user's XML document containing the text with citations and the list of bibliographic references which need formatting.
An XML document containing an instance of the BiblioX DTD which gives the formatting components for at least one group of reference types.
[Bruce to describe.]
The BiblioX software is implemented entirely in XSLT. No additional software is needed apart from an XSLT processor. The one which has been used in testing is Saxon.
There is one possible dependency: the formatter may use the exsl:node-set() function, which is included with Saxon but may not be in other processors. It can apparently be downloaded from exslt.org, but the copy acquired during testing was found to be defective, as it was missing the actual XSLT file required to implement it. This version (0.3) does not use it at the moment.
This is an XML Document Type Definition for the styling or formatting specification components of BiblioX. Currently, each component of a formatted citation or reference is an element type, named using commonly-found English-language terms in the typesetting field. These names may present librarians, cataloguers, and archivists with some difficulty, because they are intended for use by typesetting or publishing people. However, provided it is understood that they are merely convenient labels, and are not intended to impose a semantic, there should be no problem in getting used to them.
To keep this diagram to a reasonable size, some detail has been omitted:
In this version, the <group> elements are not implemented.
Element symbols ending with a white rectangle at the right-hand end are exposed elsewhere in the diagram.
Element symbols ending with a tilde have only textual content (plus <suffix>) and are not exposed.
The exceptions are the host containers, such as <journal>, which share a similar content model.
(In fact, there is little to prevent these element types being named differently: all that would change would be the <xsl:key> elements in the keyfiles for each user-document DTD. However, two element type names are privileged and should not be changed, because they are referred to explicitly in the XSLT.)
The current filename of this DTD in the distribution is bibliox-0.3.dtd (use this as the System Identifier in a DocType Declaration). A suitable Formal Public Identifier is +//Silmaril//DTD BiblioX v0.3 reference formatting components//EN//XML. A URI for use as the canonical System Identifier is http://www.silmaril.ie/bibliox/bibliox-0.3.dtd.
Note that all formatting-level components have two sets of attributes in common: separators (mostly punctuation to precede or follow the data) and typographic details. These are discussed separately after the list of element types.
The root element type is <bibliostyle>. This has two attributes: <version>, and <xmlns:bib>, which defines the bib namespace prefix (currently set to the URI of the mailing list). The <bibliostyle> element must contain at least one <style> element, but may contain many.
The <style> element holds a set of specifications for formatting reference types related by their typographic style. Suitable groups would be (for example) all reference types defined by a publisher for a specific journal as its ‘house style’. This element contains one or more <inline> elements followed by one or more <reftype> elements. It has a compulsory ID attribute <name> which names the group. Typically this could be a standard or well-known code for the owner or organiser, such as a publisher.
Two token-list attributes, <order> and <format>, define the sort order and labelling of a list of References. The order sets how the References are sorted (eg author, as in the example below), and may be as-is (document order). The format specifies the default formatting for citations and must be one of the citation styles:
number | abbrev | default | footnote | endnote | author | title | year | yearonly | author-year | author-title | author-title-year | title-year
See the 3rd item (‘inline’) in the list below for details of what these mean.
There is also an optional CDATA attribute, <owner>, for a human-readable name of the owner, as in this example:
<style name="ieeetr" owner="Transactions of the Institute of Electrical and Electronics Engineers" order="author" format="abbrev">...</style>
Specifies the formatting components for an inline citation. The values supported for the <style> attribute are:
number | abbrev | default | footnote | endnote | author | title | year | yearonly | author-year | author-title | author-title-year | title-year
number: Numbered citations, as used in the Natural Sciences
abbrev: The abbreviated (compressed) author-year format [Fly04], using the first three letters of the author's surname
default: The fully parenthetic author-year format (Flynn, 2004). The alternate format with only the year in parentheses is called author-year (see below)
footnote: Footnoted citations,¹⁷ common in Law and the Humanities, often using the author-title-year format
endnote: The same as footnoted citations,¹⁸ but the note is deferred to the end of the chapter
author: Just the surname of the author: Flynn
title: Just the title of the work: BiblioX
year: The date of the work, in parentheses: (2004)
yearonly: The date of the work, without parentheses: 2004
author-year: The alternate format for a standard inline citation: Flynn (2004)
author-title: The author's name and the title of the work: Flynn, BiblioX
author-title-year: An extended citation with the author, title, and date: Flynn, BiblioX (2004)
title-year: A shorter form for use when the author's name has already been used earlier in the sentence: BiblioX (2004)
These names and forms are extensible and easily modified to suit specific purposes.
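To make the token list concrete, the Python sketch below renders the context-free inline styles for one work and reproduces the sample outputs listed above. It is only a model with a hypothetical function name: BiblioX itself builds these forms from the <inline> components in the formatting specification file, not from hard-coded rules. The number, footnote, and endnote forms depend on document context and are left out, and italics are ignored.

```python
def inline_citation(style, surname, title, year):
    """Render one hypothetical inline citation in the named style."""
    forms = {
        "abbrev": f"[{surname[:3]}{str(year)[-2:]}]",   # first three letters + two-digit year
        "default": f"({surname}, {year})",
        "author": surname,
        "title": title,
        "year": f"({year})",
        "yearonly": str(year),
        "author-year": f"{surname} ({year})",
        "author-title": f"{surname}, {title}",
        "author-title-year": f"{surname}, {title} ({year})",
        "title-year": f"{title} ({year})",
    }
    if style not in forms:
        raise ValueError(f"unsupported or context-dependent style: {style}")
    return forms[style]

for s in ("abbrev", "default", "author-year", "title-year"):
    print(s, "->", inline_citation(s, "Flynn", "BiblioX", 2004))
```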
The container within a <style> for the formatting components of a specific reference type. The compulsory <class> attribute names the class of the reference type; it is currently a token list containing the following values:
article | book | booklet | conference | inbook | incollection | inproceedings | manual | mastersthesis | misc | phdthesis | proceedings | techreport | unpublished | chapter | journal | manuscript | part | refentry | report | review | section | set | thesis | unknown | ballet | dance | play | map | series | news | software | website
This includes all the types defined in BibTeX and DocBook, plus some others I have seen used. It is by no means exhaustive (which is probably impossible).
Currently unused. Intended respectively to enclose pairs of components of which only one is required, and pairs of components both of which are required.
Components to specify static textual material to precede or follow another component, such as the word ‘in’ before the name of a book or journal, or the abbreviation ‘pp’ before a page range. Because of the need for these strings to have their own formatting (eg italics), they are instantiated as element types, not attributes.
Holds a textual value for use when a matching real value needs to be suppressed, such as ‘et al.’ when multiple authors are omitted in inline citation. The value to be displayed is supplied in character data content.
These element types have the same content model: an optional prefix and suffix, but exactly three <name> components (for author read author/editor):
a) one for the first or sole author;
b) one for the last author of a multi-author work (which means the second author of a two-author work); and
c) one in the middle for all other authors of a multi-author work (ie neither the first nor the last).
This enables the correct formatting of styles which require a different surname/forename order or different typographic formatting in the first and subsequent author components.
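The three-specification rule can be illustrated with a small Python model that picks the first, middle, or last specification for each author and applies the separators of the book example in Figure 1. The function names are hypothetical, typography (the bold surname) is ignored, and emitting the middle group's after-separator once, after all middle names, is our assumption based on the description of the format template.

```python
def name_position(index, count):
    """Say which <name> specification formats author number `index` (0-based)."""
    if index == 0:
        return "first"    # first or sole author
    if index == count - 1:
        return "last"     # final author; the second of a two-author work
    return "middle"       # everyone in between

def format_book_authors(authors):
    """Format (forename, surname) pairs with the book example's separators."""
    parts = []
    count = len(authors)
    for i, (forename, surname) in enumerate(authors):
        position = name_position(i, count)
        if position == "first":
            parts.append(f"{surname}, {forename}")      # forename beforesep ", "
        elif position == "middle":
            parts.append(f"; {forename} {surname}")     # name beforesep "; "
            if i == count - 2:
                parts.append(";")                       # assumed: aftersep once, after the group
        else:
            parts.append(f" and {forename} {surname}")  # name beforesep " and "
    return "".join(parts)

print(format_book_authors([("Norman", "Walsh"), ("Leonard", "Muellner")]))
```

With two authors the middle specification is never used, which is why a style can give the second author of a two-author work its own treatment.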
Holds the specification of the sequence of name components within an <author> or <editor>. Currently the possible components are based on those found in the TEI DTD, which means any mix of:
surname | forename | genname | namelink | addname | rolename | orgname | placename
The meaning of <surname> and <forename> should be obvious, but the others are: <namelink> (honorific or tribal prefixes like ‘von’, ‘bin’, etc, where it is felt necessary to identify them); <genname> (generational suffixes like I, II, III, etc); <rolename> (occupations or social roles); <addname> (additional names like epithets or nicknames); <orgname> (organisational names, especially corporate names); and <placename>.
In version 0.3 only the forename and surname have been tested.
The title and optional subtitle of the work. In many styles, these are separated for presentation by a colon (this should be done by the stylesheet, not included in the text).
Containers for what MODS calls the ‘host’, that is, the enclosing ‘In’ work within which the actual document is published.
The <journal> element has two optional attributes (not implemented in this version): <abbrev> (yes/no), which signals whether the journal name should be printed in its official abbreviation or not; and <authority>, a URI which software could query to locate the full or abbreviated form of a journal name.
No attempt has been made at this stage to restrict the content model of these two element types. In reality, books always require the publisher, for example, whereas journals in print almost never do. MODS, however, has ample scope for storing this and other information, such as the frequency of appearance of a journal, which is not relevant for a book and which never appears in a normal list of References in a publication unless a fully-annotated Bibliography is being produced. More discussion is needed on this.
Respectively: the publisher of a book; and the issuer, distributor, host, or sponsor of a technical report.
Place of publication. No attempt has been made to impose an address-style structure on this element type. Possibly it should be divided into city and state/country, so that typographic formatting such as small capitals can be used on the abbreviations for US and other states.
Volume and issue number of a journal or other periodical.
Container for components referring to page references.
No attempt has been made to impose a requirement on whether a style should require both start and finish, or start only, or just the number of pages. The logic in the BiblioX stylesheets does not currently detect which of these element types is present: perhaps it should do so and insert the relevant en-rule.
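As a sketch of the detection suggested above, the hypothetical Python function below chooses a layout according to which page components are present, and inserts an en-rule (U+2013) only between a start and a finish. Nothing like it exists in BiblioX 0.3; the parameter names stand in for the start, finish, and page-count components, and the ‘p’ and ‘pp’ wording is an assumed house style.

```python
EN_RULE = "\u2013"  # the en-rule used between page numbers

def format_pages(start=None, finish=None, count=None):
    """Hypothetical: lay out page data according to which parts are present."""
    if start and finish:
        return f"pp {start}{EN_RULE}{finish}"   # full range
    if start:
        return f"p {start}"                     # start only
    if count:
        return f"{count} pp"                    # number of pages only
    return ""                                   # no page data: omit the component

print(format_pages("123", "145"))
print(format_pages(count=240))
```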
Index or serial number such as an ISSN, ISBN, Internet RFC, or ISO, IEC, or UN standards number. The <type> attribute shows which of these applies.
A URI for the referenced document. As it seems unlikely that URIs or URNs will be implemented during the lifetime of our solar system, the more commonly used URL can be used.
Date of publication or issue. The <date> element may contain optional day, month, and <year> components (this has not been tested in version 0.3). The day element has a token list attribute <style> for ordinal or cardinal (defaulting to cardinal); the month element has a token list attribute <style> for numeric, short, or long (defaulting to short, meaning abbreviated month names); and the <year> element has a token list attribute with values for BC, AD, CE, AUC, and AM, plus Jewish, Arabic, and Mayan (jw, ar, my). None of these has been implemented in version 0.3.
All component-level elements can also have two attributes, <beforesep> and <aftersep>, which store any string required for output immediately before or after the formatted content, such as punctuation or copulae.
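The effect of these two attributes, together with the rule that a component with no data is omitted entirely, can be sketched in a few lines of Python. The helper name and the sample values are illustrative only; the separators are those of the book example in Figure 1, with the place deliberately left empty.

```python
def with_separators(content, beforesep="", aftersep=""):
    """Wrap content in its separators, or return '' when there is no content.

    Separators are only output when the component actually has something
    to print, so an empty component leaves no stray punctuation behind.
    """
    if not content or not content.strip():
        return ""
    return f"{beforesep}{content}{aftersep}"

# (content, beforesep, aftersep) for author, title, publisher, place, date.
fields = [("Walsh, Norman", "", ": "), ("DocBook", "", "."),
          ("O'Reilly", " ", ""), ("", ", ", ""), ("1999", ", ", "")]
print("".join(with_separators(*f) for f in fields))  # Walsh, Norman: DocBook. O'Reilly, 1999
```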
A selection of common punctuation character entities is included, taken from the ISO files for use when literal spaces or other characters are not desirable, or when characters cannot be generated from the terminal.
Common formatting attributes exist for all component-level elements. Most of these should be obvious from their names, but three CDATA attributes need to be restricted in the following ways (SGML or a Schema can enforce this more easily):
font name: for HTML output, this must be a string matching a font name in CSS recognised by browsers (eg Times, Helvetica, Courier). For typeset output, especially for portability, it should be a string matching either a name representing an Adobe /FontName or one representing a Karl Berry [LaTeX] \fontname; but it could be the name of locally-installed TrueType or other fonts.
size: a number (only): use the <units> attribute to specify units separately.
<scale>: a number, a percentage for scaling. <scale> should be taken to mean ‘use the font at this design-size but scaled to this percentage’.
The <units> attribute values (a token list) include all those specified by TeX, plus the duplicate abbreviation ‘pi’ for pica ems (pc), and the value ‘px’ for pixels. Note that the default is traditional Anglo-American printers' points (pt: 72.27 to the inch), not Adobe's ‘big points’ (bp: exactly 72 to the inch).
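To make the difference between the default TeX point and Adobe's big point concrete, the Python sketch below converts a size in any of the listed units to TeX points. The factors for the TeX units are the standard TeX definitions; the value used for px (96 to the inch, as in CSS) is an assumption, since the text does not define it, and the font-relative units em and ex are left out because they depend on the current font.

```python
# Size of one unit, expressed in TeX points (pt: 72.27 to the inch).
PT_PER_UNIT = {
    "pt": 1.0,
    "pc": 12.0,                # pica: 12 pt
    "pi": 12.0,                # BiblioX's duplicate abbreviation for the pica
    "in": 72.27,
    "bp": 72.27 / 72,          # Adobe big point: exactly 72 to the inch
    "cm": 72.27 / 2.54,
    "mm": 72.27 / 25.4,
    "dd": 1238 / 1157,         # Didot point: 1157 dd = 1238 pt
    "cc": 12 * 1238 / 1157,    # cicero: 12 dd
    "sp": 1 / 65536,           # scaled point: 65536 sp = 1 pt
    "px": 72.27 / 96,          # ASSUMPTION: CSS reference pixel, 96 to the inch
}

def to_points(size, units="pt"):
    """Convert a size in the given units to TeX points, the default unit."""
    if units not in PT_PER_UNIT:
        raise ValueError(f"unsupported or font-relative unit: {units}")
    return size * PT_PER_UNIT[units]

# 72 big points make one inch, which is 72.27 TeX points, not 72.
print(to_points(72, "bp"))
print(to_points(72, "pt"))
```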
The remaining attributes are <fonttype> (use CSS tokens: serif, sans-serif, monospace, etc); <weight> (normal, bold, light, etc); <shape> (normal, italic, oblique, upright, smallcaps); and a width attribute (normal, condensed, expanded, extended, compressed). Most of these values are common to any typographic system, and may be found in the documentation for CSS, LaTeX, QuarkXPress, FrameMaker, etc. Most of them are unimplemented in this version.
The file bibliox.xsl is the stylesheet which is processed with the main document. That is, this is the name which must be given to the XSLT processor as the main stylesheet. It must be edited to show:
the name of the formatting specification file to be used. This goes in the variable bibstyles;
the names of the driver and keys files, as <xsl:include> elements.
The name of the keys file must also be given as the value of the variable keys.
There must be a driver file for each main document DTD you use. Samples are provided for MODS, DocBook, and TEI in files modsdriver.xsl, docbookdriver.xsl, and teidriver.xsl.
The driver file contains templates which match the element types in the main document DTD which either a) contain citations; or b) contain the list of references. In DocBook the citation containers include <citerefentry>, and the <biblioref> element holds the citation IDREF itself; the reference list is contained in a <bibliography> element. In TEI the citation element type is <ref>, and the reference list is contained in <listBibl>. In MODS there is no citation element type, as MODS addresses only the storage of references.
It also contains a template which matches the element type of the individual references in the container. In MODS this is <mods>; in DocBook it is <biblioentry>; in TEI it is variously <biblStruct> or another element type, depending on how the reference is marked up.
At the end of the file the user can add further templates to handle any subelement markup of elements used in the references which may require separate formatting. An example would be inline markup such as emphasis or foreign words, or (see the MODS driver file) the separation of a subtitle from the title by a colon and space.
These templates record in variables the style class (publisher, journal), reference type, element type name, and ID value of the current citation or reference element, for passing as parameters to the detection and formatting templates in the Tools file.
They then look up the relevant reference type in the formatting specification file and process the child elements (components) in document order.
For each end node (a component with no child components), BiblioX constructs a name string for use in key lookup (see §5.5, ‘Key files’ below), and tests for the presence of a key index.
If there is one, it looks up the key value[s] for the component and processes the data for output to HTML.
If there is no key index, or if there is one but it returns no values, the entire component is silently omitted.
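The lookup-or-omit behaviour in these steps can be modelled in Python. This is not the XSLT in the driver or Tools files: a plain dictionary stands in for the key indexes, the key name is assumed to be the reference type, an underscore, and the component path (our reading of §5.5), multiple values are simply joined with a space, and the function name and data layout are hypothetical.

```python
def render_reference(reftype, ref_id, components, keys):
    """Model of the per-reference loop over end-node components.

    components: (path, beforesep, aftersep) tuples in specification order.
    keys: stand-in for the key indexes, {key name: {reference ID: [values]}}.
    """
    out = []
    for path, beforesep, aftersep in components:
        index = keys.get(f"{reftype}_{path}")   # assumed key name: reftype_path
        if index is None:
            continue                            # no key index: omit the component
        values = index.get(ref_id, [])
        if not values:
            continue                            # index returns nothing: omit as well
        out.append(beforesep + " ".join(values) + aftersep)
    return "".join(out)

keys = {
    "book_title": {"walsh99": ["DocBook"]},
    "book_publisher": {"walsh99": ["O'Reilly"]},
    "book_place": {},                           # an index with no entry for this reference
}
components = [("title", "", "."), ("publisher", " ", ""),
              ("place", ", ", ""), ("isn", ", ", "")]
print(render_reference("book", "walsh99", components, keys))  # DocBook. O'Reilly
```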
The one file supplied with version 0.3 is called generic.xml. This implements a format of citation and reference which is generally acceptable as a default.
These files (one per DTD) instantiate the binding between the components in the formatting specification file and the element type names of the main document.
There must be exactly one key for every end node component in each reference type in the formatting specification file, with the exception of name components (see below). The keys are named with a multi-part string shown below. A utility stylesheet, genkeyfile.xsl, can be used on a new formatting specification file to generate skeleton <xsl:key> elements with the required names.
Because all references have authors or editors (creators), and because they typically share a common structure within one DTD, it is only necessary to provide one set of keys for the subcomponents of name components. The names of these keys are formed as below, but with the reference type string omitted.
The format of key construction is very precise and allows no error. Of the three attributes of <xsl:key>, the <name> attribute is a pseudo-path representing two things:
the reference type (document class) of the reference;
the element path of the format components used, starting at each element child of the <reftype> in the formatting specification file (ie one of the following components):
author | editor | title | book | journal | volume | number | pages | uri | publisher | organisation | place | edition | date | isn
Mnemonically this can be described as:
Note carefully the underscore between the reference type and the path, and the hyphen used in the remainder of the path.
The .pos notation is reserved for element types where it is essential to be able to distinguish between single-author, dual-author, and multi-author works for typographical purposes. The value of .pos is first, middle, or last, according to the position of the name.
The <match> attribute is the XPath to the element type in the main document containing the data to be indexed, starting at the element type containing the references in the main document.
The <use> attribute identifies the ID of the reference being formatted (in the main document). If you don't use an ID here (many DTDs unwisely forgo this safety check by making the attribute optional), then it is up to some external processing to make sure the values are unique.
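To illustrate what a key index holds, the Python sketch below emulates a key whose use value is the reference ID, using the standard xml.etree.ElementTree module on a small sample MODS record. It is an approximation: ElementTree supports only a subset of XPath, so the match path here is relative to each reference element rather than a full XSLT pattern, and the key names book_title and book_date follow our reading of the underscore-and-hyphen naming rule.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
NS = {"m": MODS_NS}
MODS_TAG = "{%s}mods" % MODS_NS   # Clark notation for the namespaced <mods> element

def build_key(root, match, ref_tag=MODS_TAG, id_attr="ID"):
    """Approximate one xsl:key: map each reference ID to the text of the
    nodes selected by `match`, a path relative to the reference element."""
    index = {}
    for ref in root.iter(ref_tag):
        ref_id = ref.get(id_attr)
        if ref_id is None:        # no ID: nothing to key the values on
            continue
        index[ref_id] = [(node.text or "").strip()
                         for node in ref.findall(match, NS)]
    return index

sample = f"""<modsCollection xmlns="{MODS_NS}">
  <mods ID="walsh99">
    <titleInfo><title>DocBook</title></titleInfo>
    <originInfo><dateIssued>1999</dateIssued></originInfo>
  </mods>
</modsCollection>"""

root = ET.fromstring(sample)
keys = {
    "book_title": build_key(root, "m:titleInfo/m:title"),
    "book_date": build_key(root, "m:originInfo/m:dateIssued"),
}
print(keys)
```

In XSLT the same index would be declared with <xsl:key> and queried with the key() function; the dictionary only shows the shape of the data.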
The sample files contain working examples of key construction.
The file bibliotools.xsl contains the named templates which make the lookup and formatting work:
makepath constructs a path-like string representing a portion of the XPath to the current component in the formatting specification file, as a match for the names of the key indexes;
format does the work of interpreting and recursively processing child elements of components and passing the final result to the output named template for output;
output outputs the component values in the appropriate typographic style;
dig recursively investigates the content of the current component and any keys or subcomponents and returns a value representing how much information there is in there;
getparents recursively retrieves a node-set of the parent elements of all the key values that will be returned from keyed components of .middle names.
docitesbefore handles footnoted citations occurring in a structural division (chapter, section, subsection, etc) before the first occurrence of the relevant subdivision, eg citations in the text of a chapter before the start of the first section.
docitesafter handles footnoted citations occurring in a structural division which contains no further subdivisions.
This returns an XPath-like string representing the path to a formatting specification component which is used as the name of a key to look up the data. This is done by recursively prefixing the element type name of the component with the element type name of its parent, separated by a hyphen instead of a slash (it doesn't really matter if an element type name contains a real hyphen, since this string is used as a QName, not an element type name).
The ‘ceiling’ above which makepath will not recurse is passed as the ancestor parameter and defaults to reftype in the format template. The only other value passed for this is style, used when referencing inline styles, which the BiblioX DTD places before the <reftype>s in a formatting specification file.
The component element type <name> is treated as special: its position is detected (first, middle, last: recall that all formatting specifications must have three name specifications for this purpose), and the suffix .first, .middle, or .last is appended to the name token of the output string. This is used in key lookups to extract the relevant names of multi-authored documents, and in the format template to detect middle-grouped names for special handling (see item 4 in the list below).
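The path construction can be modelled in Python with xml.etree.ElementTree. This sketch walks up the tree iteratively instead of recursing as the XSLT does, builds a parent map because ElementTree keeps no parent pointers, and applies the .first/.middle/.last suffix as described above; the exact suffix handling in bibliotools.xsl may differ, so treat it as an illustration rather than a port.

```python
import xml.etree.ElementTree as ET

def makepath(component, parents, ancestor="reftype"):
    """Return the hyphen-joined path from just below `ancestor` down to `component`.

    `parents` maps each element to its parent. A <name> step gets a
    .first, .middle, or .last suffix from its position among its
    <name> siblings.
    """
    steps = []
    node = component
    while node is not None and node.tag != ancestor:
        step = node.tag
        if node.tag == "name":
            names = [n for n in parents[node] if n.tag == "name"]
            i = names.index(node)
            if i == 0:
                step += ".first"
            elif i == len(names) - 1:
                step += ".last"
            else:
                step += ".middle"
        steps.append(step)
        node = parents.get(node)
    return "-".join(reversed(steps))

spec = ET.fromstring("""<reftype class="book">
  <author>
    <name><surname/><forename/></name>
    <name><forename/><surname/></name>
    <name><forename/><surname/></name>
  </author>
  <title/>
</reftype>""")
parents = {child: parent for parent in spec.iter() for child in parent}

print([makepath(s, parents) for s in spec.iter("surname")])
print(makepath(spec.find("title"), parents))
```

Passing ancestor="style" for a component inside an <inline> would likewise stop the walk at the <style> element.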
This takes three parameters, as outlined in §5.3, ‘Driver files’ above:
the reference type (eg book, article, report, etc);
the ID of the bibliographic reference element, used to identify the key entries.
the limit of recursion for the makepath named template.
The sequence of processing is heavily documented in the comments in the file. In outline:
Output any preceding separation text if a) it is specified; b) there is some text from the main document to output; and c) the component content is non-blank (eg a default value). This procedure is repeated at the end for any trailing separation text.
The main choice is between components which have child elements (which therefore require recursing into until a terminal is reached) and those which do not (which can be handed straight to the output formatter, described in §5.6.3, ‘output’ below).
For each child element, construct the key match string using the makepath named template as described in §5.6.1, ‘makepath’ above. If this is an inline citation, pass style as the ceiling value for recursion, otherwise pass reftype.
Locate the data value(s) for this component by trying to dereference the key name as a test of existence, accessing the Keys file as XML. If there is a matching key, use it to look up the data and store the result in a variable context. If there is no match, ignore it and pass to the next component.
Count how many values were retrieved. If greater
than zero (or zero but the name of the component is
<suffix>), then convert the values
to a node-set and recursively call this template
again to process the children, unless…
…the exception is that if the current key name ends with .middle (added by makepath), then we have a special case: the container component <name> for non-first, non-last authors or editors of a multi-authored or multi-edited document. If the child elements of this component were processed as they stand, we would get all forenames together followed by all surnames together, because that's what the keys store. Instead, we want the first forename and the first surname, then the second forename and the second surname, and so on.
Record the before and after separator values for later output (these would otherwise be bypassed because the element will not pass through the normal recursion of this template).
For each middle author identified by this key, output the before separator, record the node and its numeric position in the sequence, and pass them to the template in the same way as normal, but retrieving the node from the key separately and extracting the nth instance, rather than letting the normal child-handling code above do it, which would have resulted in retrieving all forenames or all surnames together.
This method is predicated on there being no further child components in a name subcomponent.
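The recursion and separator rules above can be illustrated with a rough Python sketch. A plain dictionary stands in for the Keys file. The sketch ignores the .middle special case, name-position suffixes, the <suffix> exception and typographic attributes. The function name format_component, the dictionary layout, and the treatment of a component's own text as its default value are assumptions.

```python
import xml.etree.ElementTree as ET


def format_component(comp, path, keys):
    """Hypothetical sketch of the format recursion.

    `keys` stands in for the Keys file: a dict from key names to lists of
    retrieved strings. Separators are added only when the component (or
    its descendants) actually produced some text.
    """
    name = f"{path}-{comp.tag}" if path else comp.tag
    children = list(comp)
    if children:
        # Non-terminal component: recurse into each child in order.
        body = "".join(format_component(c, name, keys) for c in children)
    else:
        # Terminal component: use the key's values, else any default text.
        body = "".join(keys.get(name, [])) or (comp.text or "")
    if not body:
        return ""  # nothing to show, so no separators either
    return comp.get("beforesep", "") + body + comp.get("aftersep", "")


spec = ET.fromstring(
    '<reftype class="book" aftersep=".">'
    '<author aftersep=": "><name><surname/></name></author>'
    '<title aftersep="."/>'
    '<publisher beforesep=" "/>'
    '<place beforesep=", "/>'
    '<date beforesep=", "/>'
    '<isn type="ISBN" beforesep=", "/>'
    '</reftype>'
)
keys = {
    "author-name-surname": ["Flynn"],
    "title": ["Understanding SGML and XML Tools"],
    "publisher": ["Kluwer"],
    "place": ["Boston"],
    "date": ["1998"],
}

# reftype is the ceiling, so paths start with its children.
text = "".join(format_component(c, "", keys) for c in spec) + spec.get("aftersep", "")
print(text)  # Flynn: Understanding SGML and XML Tools. Kluwer, Boston, 1998.
```

Because no ISBN is present in keys, its ", " separator is suppressed along with it.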
After drilling down to the point where there are no more child components, only terminal components, the format template calls this one. The output tree is populated via a vast nested mass of code to cope with all possible commutative combinations of the typographic attributes described in §5.1.4, ‘Typographical attributes’ above.
It would have been preferable to output multiple class attribute values together, space-separated in the manner of an IDREFS attribute, but only very recent browsers seem to handle this. At a later stage it can be rewritten that way.
This is a template to count the amount of text (both character data content and retrieved keys) in a component and its children. It is called from within a variable called volume within the format template, to see if there is any content to process.
It takes the same three parameters as format plus a recursive counter called keysfound to count the number of key values.
It processes all element children of the current node, making a pseudo-path using makepath, and uses it to test in the Keys file if there are any keys available. If so, they are counted, otherwise the count is zeroed.
Each child is then recursively processed in the same way and the number of keys added to the accumulated total, returned when all descendants have been exhausted.
At the moment, it fails to add the numbers of keys, outputting instead the catenation of their string values. Because the volume variable is only used as a test for something-or-nothing, this doesn't matter, but it is a bug that needs fixing.
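The difference between the current catenating behaviour and the intended sum can be sketched in Python. Both functions are hypothetical stand-ins for the XSLT template, again using a dictionary in place of the Keys file. The last check shows why the bug is harmless, assuming the result is only converted to a number and compared with zero.

```python
import xml.etree.ElementTree as ET


def count_keys_buggy(comp, path, keys):
    """Current behaviour: each level emits its count as text, so the
    counts are catenated instead of added."""
    out = ""
    for child in comp:
        name = f"{path}-{child.tag}" if path else child.tag
        out += str(len(keys.get(name, [])))
        out += count_keys_buggy(child, name, keys)
    return out


def count_keys(comp, path, keys):
    """Intended behaviour: a numeric total over all descendants."""
    total = 0
    for child in comp:
        name = f"{path}-{child.tag}" if path else child.tag
        total += len(keys.get(name, []))
        total += count_keys(child, name, keys)
    return total


spec = ET.fromstring(
    "<reftype><author><name><surname/></name></author><title/></reftype>")
keys = {"author-name-surname": ["Flynn", "Murphy"], "title": ["BiblioX"]}

print(count_keys_buggy(spec, "", keys))  # 0021
print(count_keys(spec, "", keys))        # 3
# Converted to a number, the catenated string is non-zero exactly when the
# true total is, so a something-or-nothing test still gives the right answer.
print(int(count_keys_buggy(spec, "", keys)) > 0, int(count_keys_buggy(spec, "", {})) > 0)
```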
This template is called from that part of format which handles .middle names. It recursively locates all components passed in the parameter children, and finds the parents of the indexed values from the main document.
In processing .first and .last names, we take the values retrieved from indexes as they come. But for (potentially multiple) .middle names, stepping through the children and outputting all values for the first, then all values for the second, etc is precisely what we don't want to do because it results (for example) in an author called PeterPaulaMary FlynnMurphyAxford.
In processing .middle names, we need to call a template which visits all the key-indexed values and returns a node-set of their OR'd parents, then goes through these in order, and within each parent, outputs the values identified by the components, thereby separating the retrieved values into their person-by-person order.
Actually, it probably ought to return a node-set of the Lowest Common Ancestors rather than the parents, but this is rather intractable.
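The grouping step can be sketched in Python using the example names from the text. The markup, the function name people_in_order, and the use of the immediate parent (rather than the Lowest Common Ancestor) are illustrative assumptions. The first print reproduces the unwanted result, and the function produces the intended one.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<authorgroup>"
    "<author><firstname>Peter</firstname><surname>Flynn</surname></author>"
    "<author><firstname>Paula</firstname><surname>Murphy</surname></author>"
    "<author><firstname>Mary</firstname><surname>Axford</surname></author>"
    "</authorgroup>"
)

# What the keys hand back: all forenames together, then all surnames.
forenames = doc.findall("author/firstname")
surnames = doc.findall("author/surname")
naive = "".join(e.text for e in forenames) + " " + "".join(e.text for e in surnames)
print(naive)  # PeterPaulaMary FlynnMurphyAxford


def people_in_order(components, doc):
    """Regroup retrieved values person by person via their parents."""
    parents = {child: parent for parent in doc.iter() for child in parent}
    position = {elem: i for i, elem in enumerate(doc.iter())}
    # OR the parents of every retrieved node together (a set union)...
    owners = {parents[node] for nodes in components for node in nodes}
    people = []
    # ...then visit them in document order and output each one's values
    # in the order the components were given.
    for owner in sorted(owners, key=position.__getitem__):
        values = [n.text for nodes in components for n in nodes if parents[n] is owner]
        people.append(" ".join(values))
    return people


print(people_in_order([forenames, surnames], doc))
# ['Peter Flynn', 'Paula Murphy', 'Mary Axford']
```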
This template should be called immediately at the start of all processing templates for structural divisions in document type transformations where footnoted citations are used, before any other output. There is one parameter: citname, which specifies the name of the citation-bearing element type.
It outputs footnoted citations that are descendants of the parent of the current structural division but which are not descendants of any instance of the current structural division. This limits it to those citations made in the body of the parent (eg the opening part of a chapter before any subsections).
<xsl:template match="section|sect1|sect2|sect3">
  <xsl:call-template name="docitesbefore">
    <xsl:with-param name="citname">
      <xsl:text>citation</xsl:text>
    </xsl:with-param>
  </xsl:call-template>
  <xsl:apply-templates/>
  <xsl:call-template name="docitesafter">
    <xsl:with-param name="citeexclude" select="count(section|sect1|sect2)"/>
    <xsl:with-param name="citname">
      <xsl:text>citation</xsl:text>
    </xsl:with-param>
  </xsl:call-template>
</xsl:template>
This is the logical complement of docitesbefore: it outputs footnoted citations that are descendants of the current structural element but which are not descendants of any structural subdivisions.
It must be placed last in the template[s] which handle structural divisions. There are two parameters: citname (as for §5.6.6, ‘docitesbefore’ above) and citeexclude, which specifies the name of the next lower structural subdivision.
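Both templates reduce to the same selection: citations below some node that are not inside any of a given set of structural divisions. The minimal Python sketch below assumes DocBook-style citation and section element names and uses a hypothetical function name. It applies the selection once as docitesbefore would, starting from the parent of the current section, and once as docitesafter would, starting from the section itself. It shows only which citations are selected, not how they are formatted.

```python
import xml.etree.ElementTree as ET

DIVISIONS = {"section", "sect1", "sect2", "sect3"}


def citations_outside(node, divisions, citname="citation"):
    """Citations below `node` that are not inside any of `divisions`."""
    found = []
    for child in node:
        if child.tag in divisions:
            continue  # leave subdivisions to their own templates
        if child.tag == citname:
            found.append(child)
        found.extend(citations_outside(child, divisions, citname))
    return found


chapter = ET.fromstring(
    "<chapter>"
    "<para>Opening text <citation>A</citation></para>"
    "<section><para><citation>B</citation></para>"
    "<section><para><citation>C</citation></para></section>"
    "</section>"
    "</chapter>"
)
first_section = chapter.find("section")

# docitesbefore, called from the first section: the chapter's opening part.
print([c.text for c in citations_outside(chapter, DIVISIONS)])        # ['A']
# docitesafter, called from the same section: its body, not its subsections.
print([c.text for c in citations_outside(first_section, DIVISIONS)])  # ['B']
```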
Currently one CSS file is supplied, bibliofile.css, which implements the bold, italics, etc classes output by the format named template.
This section describes what an author or editor should expect to be able to do in DocBook and TEI documents to make citation work properly. I am assuming that the reader knows what citation and reference is about, and understands the need for accuracy and consistency in the markup of citations and the references they refer to. The section §3.2, ‘Terminology’ above explains some of this, and the examples below should be enough for the average XML user to understand what is happening. I do also assume that the reader is competent in using an XML editor with the DocBook or TEI DTDs, and in the use of XSLT with an XSLT processor to produce HTML files from their XML.
To use the default setup, the only things you need to do are:
Add this line to the top of your existing XSLT stylesheet:
Edit the file bibliox.xsl to make sure
that the variable keys is set to the correct Keys file name (docbookkeys.xsl, teikeys.xsl, or modskeys.xsl), according to which DTD you are using to store your References;
that the same Keys filename is used in the <xsl:include> element immediately below the declaration of the keys variable;
that the correct matching Driver filename is
specified in the next
Make sure your citations are in the format (for DocBook):
or (for TEI):
where the value
abc123 is replaced by
the ID value of the relevant reference entry.
Make sure your references are accurately marked up in the appropriate place in the document (usually at the end). The IDs used in the citations must all exist among your references, otherwise a validating XML parser will report errors. It is not, however, an XML error to have unused references (ones which are never cited).
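A quick pre-flight check along these lines can be sketched in Python. It assumes DocBook citations made with <biblioref> elements whose linkend attribute points at the id of a <biblioentry>. The function name check_references and the sample document are hypothetical. Missing targets are the errors a validating parser would report, and unused entries are listed only for information.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<article>"
    "<para>See <citetitle><biblioref linkend='abc123'/></citetitle>"
    " and <citetitle><biblioref linkend='xyz999'/></citetitle>.</para>"
    "<bibliography>"
    "<biblioentry id='abc123'/><biblioentry id='def456'/>"
    "</bibliography>"
    "</article>"
)


def check_references(doc):
    """Return (cited but undefined IDs, defined but uncited IDs)."""
    cited = {e.get("linkend") for e in doc.iter("biblioref")}
    defined = {e.get("id") for e in doc.iter("biblioentry")}
    missing = sorted(cited - defined)  # errors: dangling IDREFs
    unused = sorted(defined - cited)   # legal: references never cited
    return missing, unused


print(check_references(doc))  # (['xyz999'], ['def456'])
```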
Here is an example of a reference in DocBook format:
<biblioentry id="abc123" type="book">
  <author>
    <surname>Flynn</surname>
    <firstname>Peter</firstname>
  </author>
  <title>Understanding SGML and XML Tools</title>
  <titleabbrev>SGML &amp; XML Tools</titleabbrev>
  <publisher>
    <publishername>Kluwer</publishername>
    <address>Boston</address>
  </publisher>
  <isbn>0-7923-8169-6</isbn>
  <date YYYY-MM-DD="1998">1998</date>
</biblioentry>
<biblioentry> elements must be
contained within a
Here is the same reference done in TEI format:
<biblFull id="abc123" rend="book">
  <titleStmt>
    <title>Understanding SGML and XML Tools</title>
    <author>
      <persName>
        <foreName>Peter</foreName>
        <surname>Flynn</surname>
      </persName>
    </author>
    <respStmt>
      <name>http://imbolc.ucc.ie/~pflynn/books</name>
    </respStmt>
  </titleStmt>
  <extent>432</extent>
  <publicationStmt>
    <publisher>Kluwer Academic Publishers</publisher>
    <pubPlace>Boston</pubPlace>
    <idno type="isbn">0-7923-8169-6</idno>
    <date value="1998">1998</date>
  </publicationStmt>
</biblFull>
In TEI, all
elements must be within a
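The purpose of the Keys file is that both of the markups above end up indexed under the same component names. The Python below is only a sketch: dictionaries of ElementTree paths stand in for xsl:key declarations. The table names, the paths chosen, and the (key name, ID) index layout are assumptions and do not reproduce the contents of docbookkeys.xsl or teikeys.xsl.

```python
import xml.etree.ElementTree as ET

DOCBOOK = """<biblioentry id="abc123" type="book">
  <title>Understanding SGML and XML Tools</title>
  <publisher><publishername>Kluwer</publishername><address>Boston</address></publisher>
  <isbn>0-7923-8169-6</isbn>
</biblioentry>"""

TEI = """<biblFull id="abc123" rend="book">
  <titleStmt><title>Understanding SGML and XML Tools</title></titleStmt>
  <publicationStmt><publisher>Kluwer Academic Publishers</publisher>
    <pubPlace>Boston</pubPlace><idno type="isbn">0-7923-8169-6</idno></publicationStmt>
</biblFull>"""

# Hypothetical equivalence tables: formatting-spec key name -> element path.
DOCBOOK_KEYS = {"title": "title", "publisher": "publisher/publishername",
                "place": "publisher/address", "isn": "isbn"}
TEI_KEYS = {"title": "titleStmt/title", "publisher": "publicationStmt/publisher",
            "place": "publicationStmt/pubPlace", "isn": "publicationStmt/idno"}


def build_index(entry, table):
    """Index one bibliographic entry by (key name, entry ID)."""
    ref_id = entry.get("id")
    return {(key, ref_id): [e.text for e in entry.findall(path)]
            for key, path in table.items()}


db = build_index(ET.fromstring(DOCBOOK), DOCBOOK_KEYS)
tei = build_index(ET.fromstring(TEI), TEI_KEYS)
print(db[("title", "abc123")] == tei[("title", "abc123")])  # True
print(db[("publisher", "abc123")], tei[("publisher", "abc123")])
# ['Kluwer'] ['Kluwer Academic Publishers']
```

The formatting code only ever asks for ("title", "abc123") and similar lookups, so it does not need to know which markup the reference came from.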
Edit the file generic.xml and change the value of the format attribute of the <style> element to whatever you want the default to be. It must be one of:
number | abbrev | default | footnote | endnote | author | title | year | yearonly | author-year | author-title | author-title-year | title-year
Normally it will be
There is only one example of this file supplied with BiblioX, generic.xml. This implements a very basic and unexciting but usable set of formats for common everyday citations, such as you will find in the work of most academics, researchers, scientists, and documenters. It does not cater for the special needs of the professional bibliographer or librarian—although it could do so very easily: I just haven't written that much of it yet—nor does it provide some of the more outré formats required by smaller fields of research.
The structure of this file is shown in Figure 3, ‘Structural representation of the BiblioX DTD’ above. If you want to copy it to another filename and then take that version to pieces and reconstruct it to implement a publisher's or journal's house style, feel free to do so; just don't destroy or overwrite the actual file generic.xml. If you do create another version, you will need to change the value of the variable bibstyles in the bibliox.xsl file.
The recent (March 2004) changes to DocBook to allow the
<biblioref> element in
<citetitle> (and elsewhere) mean that the
full scope of citation and reference is now possible within
DocBook documents for the first time. This method has been
used in this version in preference to the previous method,
which was to add an IDREF attribute to the three citation
element types, as well as other metadata.
The documentation for DocBook (Walsh and Muellner, 1999) says that these three element types have the following applications:
An inline bibliographic reference to another published work
A citation to a reference page
The title of a cited work
These reflect the very limited range of citations made in computer documentation, which is restricted very much to the occasional citation of external standards or a few articles or books; a lot of man pages; and the less formal passing mentions of titles of books. However, people increasingly want to use the rich markup available elsewhere in DocBook for
The three XML sample files included are for MODS, DocBook, and TEI. They have been processed with Saxon 6.5.2 and the BiblioX XSL to produce HTML.
They are not intended to be exhaustive, merely to demonstrate that it is possible to do this kind of formatting successfully using XSLT.
A huge amount remains to be done:
Many more examples are needed, preferably using real-life documents with their owners' permission.
We need some usability tests. MODS in particular concerns me because its terminology, while wholly accurate, is alien to non-librarians and non-archivists. By the same token I am aware that the terms I have used will be regarded with the same suspicion by librarians and archivists for being too concrete and printer/publisher oriented.
There is vast scope for the code to be tightened and improved. The objective of this pass was to get a working system, and that meant finding problems along the way and dealing with them as best could be done at the time.
There are also undoubtedly things I have done which will make XSLT experts shudder in horror, as there must be easier and cleaner ways of performing some of the actions required within the side-effect-free environment of XSLT.
Berger, Jens: ‘Extended BibTeX citation support for the humanities and legal texts’. [published by the author], Potsdam, Germany, 2002, [http://www.jurabib.org/].
Flynn, Peter: ‘BiblioX’. Silmaril Consultants, Cork, Ireland, 2004.
Hägglund, Sture and Roland Tibell: ‘Multi-style dialogues and control independence in interactive software’. In TRG Green, SJ Payne and GC van der Veer, The Psychology of Computer Use, Academic Press, London, 1983, pp 171–189, ISBN 0122974204.
Library of Congress' Network Development and MARC Standards Office: ‘Metadata Object Description Schema’. v.3, Library of Congress, Washington, DC, December 2003, [http://www.loc.gov/standards/mods/].
Patashnik, Oren: ‘BIBTeX’. TeX Users Group, Portland, OR, 1988, [http://www.ctan.org/tex-archive/biblio/bibtex/].
Walsh, Norman and Leonard Muellner: ‘DocBook: The Definitive Guide’. O'Reilly & Associates, Sebastopol, CA, 1999, ISBN 1-56592-580-7.
Version 1.2, November 2002
Copyright (C) 2000,2001,2002 Free Software Foundation, Inc. 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.
The purpose of this License is to make a manual, textbook, or other functional and useful document "free" in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others.
This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.
We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference.
This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. Such a notice grants a world-wide, royalty-free license, unlimited in duration, to use that work under the conditions stated herein. The "Document", below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as "you". You accept the license if you copy, modify or distribute the work in a way requiring permission under copyright law.
A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language.
A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (Thus, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.
The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License. If a section does not fit the above definition of Secondary then it is not allowed to be designated as Invariant. The Document may contain zero Invariant Sections. If the Document does not identify any Invariant Sections then there are none.
The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may be at most 25 words.
A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup, or absence of markup, has been arranged to thwart or discourage subsequent modification by readers is not Transparent. An image format is not Transparent if used for any substantial amount of text. A copy that is not "Transparent" is called "Opaque".
Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML, PostScript™ or PDF designed for human modification. Examples of transparent image formats include PNG, XCF and JPG. Opaque formats include proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML, PostScript™ or PDF produced by some word processors for output purposes only.
The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text near the most prominent appearance of the work's title, preceding the beginning of the body of the text.
A section "Entitled XYZ" means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses following text that translates XYZ in another language. (Here XYZ stands for a specific section name mentioned below, such as "Acknowledgements", "Dedications", "Endorsements", or "History".) To "Preserve the Title" of such a section when you modify the Document means that it remains a section "Entitled XYZ" according to this definition.
The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. These Warranty Disclaimers are considered to be included by reference in this License, but only as regards disclaiming warranties: any other implication that these Warranty Disclaimers may have is void and has no effect on the meaning of this License.
You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3.
You may also lend copies, under the same conditions stated above, and you may publicly display copies.
If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and the Document's license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects.
If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.
If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.
It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document.
You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:
If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version's license notice. These titles must be distinct from any other section titles.
You may add a section Entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties--for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.
You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.
The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version.
You may combine the Document with other documents released under this License, under the terms defined in section 4 in the list above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers.
The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.
In the combination, you must combine any sections Entitled "History" in the various original documents, forming one section Entitled "History"; likewise combine any sections Entitled "Acknowledgements", and any sections Entitled "Dedications". You must delete all sections Entitled "Endorsements".
You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects.
You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.
A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, is called an "aggregate" if the copyright resulting from the compilation is not used to limit the legal rights of the compilation's users beyond what the individual works permit. When the Document is included in an aggregate, this License does not apply to the other works in the aggregate which are not themselves derivative works of the Document.
If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one half of the entire aggregate, the Document's Cover Texts may be placed on covers that bracket the Document within the aggregate, or the electronic equivalent of covers if the Document is in electronic form. Otherwise they must appear on printed covers that bracket the whole aggregate.
Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License, and all the license notices in the Document, and any Warranty Disclaimers, provided that you also include the original English version of this License and the original versions of those notices and disclaimers. In case of a disagreement between the translation and the original version of this License or a notice or disclaimer, the original version will prevail.
If a section in the Document is Entitled "Acknowledgements", "Dedications", or "History", the requirement (section 4) to Preserve its Title (section 1) will typically require changing the actual title.
You may not copy, modify, sublicense, or distribute the Document except as expressly provided for under this License. Any other attempt to copy, modify, sublicense or distribute the Document is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.
The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/.
Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License "or any later version" applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation.
To use this License in a document you have written, include a copy of the License in the document and put the following copyright and license notices just after the title page:
Copyright (c) YEAR YOUR NAME. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".
If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the "with...Texts." line with this:
with the Invariant Sections being LIST THEIR TITLES, with the Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST.
If you have Invariant Sections without Cover Texts, or some other combination of the three, merge those two alternatives to suit the situation.
If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software license, such as the GNU General Public License, to permit their use in free software.