Concurrent markup
Whenever we deal with multiple annotations, the problem of overlapping markup may arise. There are already a couple of approaches, such as TEI's milestones and fragments, LMNL, TexMECS, or XConcur (see the references page for further details). This page deals with the XStandoff approach.
XStandoff at a glance
- Notation
- XStandoff uses the XML notation, that is, all XStandoff instances are well-formed in the sense of the XML spec.
- Model
- The formal model of XStandoff ranges from a multi-rooted tree up to GODDAG (general ordered-descendant directed acyclic graph, see Sperberg-McQueen and Huitfeldt 1999) and supports discontinuous elements, multiple parenthood and differentiation between dominance and containment.
- Validation
- All XStandoff instances are valid XML instances. Each annotation layer contained in an XStandoff instance may be validated against an XSD document grammar (note that only XSD 1.0 and 1.1 are supported, since neither DTD nor RELAX NG supports all features used in XStandoff). Cross-layer validation is possible as well. Metadata that is part of an XStandoff instance may be validated according to the same rules.
A very brief overview of XStandoff 1
The XStandoff format (formerly known as Sekimo Generic Format, SGF) is an XML-based (meta) markup language developed during the Sekimo project at Bielefeld University. It allows for the storage and analysis of multi-dimensional (and possibly overlapping) annotations. The X in XStandoff stands both for eXtended and eXtensible, since the format is an extension of the classic standoff approach and is modularly designed. XStandoff is based on the formal model of a multi-rooted tree (i.e. a set of trees spanning the same primary data), at least if one restricts oneself to not using discontinuous segments; in that case the formal model tends to be an rGODDAG. It is capable of using tree- and graph-based annotation models.
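To make the notion of overlapping annotations concrete, here is a minimal Python sketch. It is not part of XStandoff or its toolkit; the layer names, the function name and the example spans are assumptions chosen for illustration. Two annotation layers (morphemes and syllables) refer to the same primary data by character offsets, and the sketch reports the pairs of spans that overlap without one containing the other, which is exactly what a single XML tree cannot express.

```python
from itertools import product


def properly_overlaps(a, b):
    """Return True if spans a and b share characters but neither contains the other."""
    (a_start, a_end), (b_start, b_end) = a, b
    return a_start < b_start < a_end < b_end or b_start < a_start < b_end < a_end


# Hypothetical primary data and two hypothetical annotation layers over it.
primary_data = "The sun shines brighter."
morphemes = {"bright": (15, 21), "er": (21, 23)}
syllables = {"brigh": (15, 20), "ter": (20, 23)}

# Compare every morpheme span with every syllable span.
conflicts = [
    (m_name, s_name)
    for (m_name, m_span), (s_name, s_span) in product(morphemes.items(), syllables.items())
    if properly_overlaps(m_span, s_span)
]

print(conflicts)
```

In this example only the morpheme "bright" and the syllable "ter" overlap properly; the remaining pairs are either nested or disjoint.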
The following image shows an overview of an XStandoff instance. Note that the root element of an XStandoff instance is either the corpus or the corpusData element.

XStandoff (like other standoff formats) uses the character positions of the primary data to depict the positions where annotation elements occur, cf. the following listing:
  T  h  e     s  u  n     s  h  i  n  e  s     b  r  i  g  h  t  e  r  .
00|01|02|03|04|05|06|07|08|09|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24
The character 'T' ranges from position 0 to 1, the character 'h' from 1 to 2, and so on. This information is used when constructing an XStandoff instance.
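The offset scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name token_spans and the simple word/punctuation tokenization are hypothetical and not taken from the XStandoff toolkit. Offsets count the boundaries between characters, so a span that starts at 0 and ends at 3 covers the first three characters.

```python
import re


def token_spans(primary_data):
    """Return (start, end, text) triples; offsets are boundaries between characters."""
    # Words and single punctuation marks are treated as tokens (an assumption).
    return [
        (match.start(), match.end(), match.group())
        for match in re.finditer(r"\w+|[^\w\s]", primary_data)
    ]


primary_data = "The sun shines brighter."
spans = token_spans(primary_data)

for start, end, text in spans:
    print(start, end, text)
```

Because the end offset is exclusive, slicing the primary data with a span (for example primary_data[15:23]) returns exactly the annotated text.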
In contrast to other serialization formats for multi-dimensional and possibly overlapping markup, XStandoff tries to preserve as much of the original annotation format as possible. For example instances, visit the Examples page; for a detailed overview, visit the Description page.
A very brief overview of XStandoff 2
XStandoff 2.0 and 2.1 introduce some changes compared to version 1.0. Apart from supporting spatial markables over multimodal documents, some elements and attributes have been renamed, removed or added (e.g., the corpus element has been removed; instead, corpusData elements can be nested recursively). Note that version 2.0 was never released as a final version; use version 2.1 instead.

For establishing spatial markables, the segment element's attributes have been extended by adding the shape and coords attributes.