Concurrent markup

Whenever we deal with multiple annotations of the same data, their markup may overlap. Several approaches to this problem already exist, such as TEI's milestones and fragments, LMNL, TexMECS, and XConcur (see the references page for further details). This page deals with the XStandoff approach.

XStandoff at a glance

XStandoff uses XML notation; that is, all XStandoff instances are well-formed in the sense of the XML specification.
The formal model of XStandoff ranges from a multi-rooted tree up to a GODDAG (general ordered-descendant directed acyclic graph, see Sperberg-McQueen and Huitfeldt 1999). It supports discontinuous elements, multiple parenthood, and the distinction between dominance and containment.
All XStandoff instances are valid XML instances. Each annotation layer contained in an XStandoff instance can be validated against an XSD document grammar. Only XSD 1.0 and 1.1 are supported, since neither DTD nor RELAX NG supports all the features used in XStandoff. Cross-layer validation is possible as well.
Metadata contained in an XStandoff instance can be validated in the same way.
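The graph features named above (multiple parenthood, dominance versus containment) can be made concrete with a small in-memory graph. The following Python sketch is a simplified model for illustration only, not XStandoff's serialization; the `Node` class and the `leaves`, `dominates` and `contains` functions are hypothetical names. Dominance is modelled as an explicit parent-child edge, whereas containment is derived by comparing the sets of leaves two nodes cover, so a word shared by two hierarchies simply has two parents.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so nodes can be put into sets
class Node:
    label: str
    children: list[Node] = field(default_factory=list)  # dominance edges (ordered)
    parents: list[Node] = field(default_factory=list)   # a node may have several parents

    def add(self, *kids: Node) -> Node:
        """Add a dominance edge from self to each kid and return self."""
        for kid in kids:
            self.children.append(kid)
            kid.parents.append(self)
        return self


def leaves(node: Node) -> set[Node]:
    """Leaf nodes (pieces of primary data) reachable from node via dominance edges."""
    if not node.children:
        return {node}
    return set().union(*(leaves(child) for child in node.children))


def dominates(a: Node, b: Node) -> bool:
    """True if b is reachable from a by following child edges."""
    return any(child is b or dominates(child, b) for child in a.children)


def contains(a: Node, b: Node) -> bool:
    """True if every leaf covered by b is also covered by a."""
    return leaves(b) <= leaves(a)


# Two hierarchies built over the same four word leaves.
the, sun, shines, brighter = (Node(w) for w in ["The", "sun", "shines", "brighter"])
np_ = Node("NP").add(the, sun)
vp = Node("VP").add(shines, brighter)
syntax_root = Node("S").add(np_, vp)
sentence = Node("s").add(the, sun, shines, brighter)

print([p.label for p in sun.parents])                     # ['NP', 's']
print(contains(sentence, np_), dominates(sentence, np_))  # True False
```

In this sketch the sentence element `s` contains the NP, since it covers all of the NP's leaves, but does not dominate it, because the two belong to different hierarchies.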

A very brief overview of XStandoff 1

The XStandoff format (formerly known as Sekimo Generic Format, SGF) is an XML-based (meta) markup language developed during the Sekimo project at Bielefeld University. It allows for the storage and analysis of multi-dimensional (and possibly overlapping) annotations.
The X in XStandoff stands for both eXtended and eXtensible, since the format extends the classic standoff approach and is modularly designed. XStandoff is based on the formal model of a multi-rooted tree (i.e. a set of trees spanning the same primary data), at least as long as no discontinuous segments are used; otherwise the formal model tends towards an rGODDAG. XStandoff supports both tree-based and graph-based annotation models.
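Under the multi-rooted tree model, each annotation layer is a separate tree over the shared primary data, so elements from different layers may overlap without either containing the other, which a single inline XML tree cannot express. The following Python sketch, a simplified illustration rather than XStandoff code, detects such overlaps using character offsets into the primary data "The sun shines brighter."; the `Span` type, the function names and the `focus` layer are hypothetical.

```python
from typing import NamedTuple


class Span(NamedTuple):
    label: str
    start: int  # position before the first character
    end: int    # position after the last character


def properly_overlap(a: Span, b: Span) -> bool:
    """True if a and b share characters but neither span contains the other."""
    intersect = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return intersect and not (a_in_b or b_in_a)


def conflicts(layer1: list[Span], layer2: list[Span]) -> list[tuple[str, str]]:
    """Pairs of elements from two layers that could not be nested in one XML tree."""
    return [(a.label, b.label) for a in layer1 for b in layer2 if properly_overlap(a, b)]


# Primary data: "The sun shines brighter."
syntax = [Span("NP", 0, 7), Span("VP", 8, 23)]  # "The sun", "shines brighter"
focus = [Span("focus", 4, 14)]                  # "sun shines" (hypothetical layer)
print(conflicts(syntax, focus))  # [('NP', 'focus'), ('VP', 'focus')]
```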
The following image shows an overview of an XStandoff instance. Note that the root element of an XStandoff instance is either the corpus or the corpusData element.

XStandoff Overview

XStandoff, like other standoff formats, uses character positions in the primary data to specify where annotation elements occur, as the following listing illustrates:

  T  h  e     s  u  n     s  h  i  n  e  s     b  r  i  g  h  t  e  r  .
0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

The character 'T' spans positions 0 to 1, the character 'h' positions 1 to 2, and so on. These positions are used to anchor annotation elements when an XStandoff instance is constructed.
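Positions in the listing are the gaps between characters, so the character at index i spans positions i to i + 1, and a word occupying indices i up to j - 1 spans positions i to j. The Python sketch below computes these positions for the example sentence; `char_spans` and `word_segments` are hypothetical helpers, and the simple word/punctuation tokenization is an assumption made for illustration.

```python
import re


def char_spans(text: str) -> list[tuple[str, int, int]]:
    """Each character with its start and end position (positions are gaps between characters)."""
    return [(ch, i, i + 1) for i, ch in enumerate(text)]


def word_segments(text: str) -> list[tuple[str, int, int]]:
    """Words and punctuation marks with start/end positions (simple regex tokenization)."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


primary = "The sun shines brighter."
print(char_spans(primary)[:2])  # [('T', 0, 1), ('h', 1, 2)]
print(word_segments(primary))
# [('The', 0, 3), ('sun', 4, 7), ('shines', 8, 14), ('brighter', 15, 23), ('.', 23, 24)]
```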
In contrast to other serialization formats for multi-dimensional and possibly overlapping markup, XStandoff tries to preserve as much of the original annotation format as possible. For example instances, see the Examples page; for a detailed overview, see the Description page.
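As a rough illustration of the standoff idea, the following Python sketch turns labelled spans of one annotation layer into segment elements, plus annotation elements that keep their original names and point to those segments by ID. Only corpusData and segment are taken from this page; the namespace URI, the other element names (`primaryData`, `textualContent`, `segmentation`, `annotation`, `layer`) and the `start`, `end` and `segment` attributes are assumptions made for illustration, so check them against the XStandoff schema before relying on any of them.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder namespace; substitute the namespace declared by the XStandoff schema.
XSF = "urn:example:xstandoff"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # serialized as xml:id
ET.register_namespace("xsf", XSF)


def q(name: str) -> str:
    """Qualify a name with the (assumed) XStandoff namespace."""
    return f"{{{XSF}}}{name}"


def build_instance(text: str, spans: list[tuple[str, int, int]]) -> ET.Element:
    """Build a simplified standoff document: primary data, segments and one annotation layer."""
    root = ET.Element(q("corpusData"))
    primary = ET.SubElement(root, q("primaryData"), {"start": "0", "end": str(len(text))})
    ET.SubElement(primary, q("textualContent")).text = text

    segmentation = ET.SubElement(root, q("segmentation"))
    layer = ET.SubElement(ET.SubElement(root, q("annotation")), q("layer"))
    for n, (label, start, end) in enumerate(spans, start=1):
        seg_id = f"seg{n}"
        # One segment per annotated span, addressed by character positions.
        ET.SubElement(segmentation, q("segment"), {XML_ID: seg_id, "start": str(start), "end": str(end)})
        # The annotation element keeps its original name and points to its segment by ID.
        ET.SubElement(layer, label, {q("segment"): seg_id})
    return root


doc = build_instance("The sun shines brighter.", [("NP", 0, 7), ("VP", 8, 23)])
print(ET.tostring(doc, encoding="unicode"))
```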

A very brief overview of XStandoff 2

XStandoff 2.0 and 2.1 introduce several changes compared to version 1.0. Apart from supporting spatial markables over multimodal documents, some elements and attributes have been renamed, removed or added (e.g., the corpus element has been removed; instead, corpusData elements can be nested recursively). Note that version 2.0 was never released as a final version; use version 2.1 instead.

XStandoff Overview

To establish spatial markables, the segment element has been extended with the shape and coords attributes.
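This page does not define the values of shape and coords. Assuming, purely as an illustration, that they follow the HTML image-map convention (`rect` with x1,y1,x2,y2; `circle` with centre x, centre y and radius; `poly` with a list of x,y pairs), the following Python sketch tests whether a point lies inside such a spatial segment. `contains_point` is a hypothetical helper, and polygons use the standard even-odd ray-casting test.

```python
def contains_point(shape: str, coords: str, x: float, y: float) -> bool:
    """Hit-test a point against a spatial segment, assuming HTML image-map style coords."""
    c = [float(v) for v in coords.split(",")]
    if shape == "rect":  # assumed: x1,y1,x2,y2
        x1, y1, x2, y2 = c
        return min(x1, x2) <= x <= max(x1, x2) and min(y1, y2) <= y <= max(y1, y2)
    if shape == "circle":  # assumed: centre x, centre y, radius
        cx, cy, r = c
        return (x - cx) ** 2 + (y - cy) ** 2 <= r * r
    if shape == "poly":  # assumed: x1,y1,x2,y2,... (even-odd ray casting)
        pts = list(zip(c[0::2], c[1::2]))
        inside = False
        j = len(pts) - 1
        for i, (xi, yi) in enumerate(pts):
            xj, yj = pts[j]
            # Toggle on each edge crossed by a horizontal ray running right from (x, y).
            if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
                inside = not inside
            j = i
        return inside
    raise ValueError(f"unsupported shape: {shape!r}")


print(contains_point("rect", "10,10,50,40", 30, 20))  # True
print(contains_point("poly", "0,0,10,0,0,10", 8, 8))  # False
```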