XStandoff Examples

Re-ordering and virtual segments: Huey, Dewey, and Louie and the haiku

This example appears, slightly changed, in the TEI Guidelines (both P4 and P5). Huey, Dewey, and Louie try to recall a haiku but remember its lines out of order. TEI supports virtual elements that reconstruct the poem in its normal order (for example, so that proximity searches on the words of the poem can find it, even though it is only virtually present in the document). This example uses the join element for that purpose.

The text

How does it go?
da-da-da
gets a new frog
...
When the old pond
...
...
It's a new pond.

The TEI annotation

This annotation is slightly altered from the P5 version that can be found at http://www.tei-c.org/release/doc/tei-p5-doc/en/html/examples-speaker.html. Note the join element: its targets attribute lists the lines in their correct order, and its result attribute names the virtual line group (lg) element that they form.

<?xml version="1.0" encoding="UTF-8"?>
<text xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.tei-c.org/ns/1.0 ../xsd/tei_corpus.xsd">
  <body>
    <sp who="Huey">
      <p>How does it go?
        <q>
          <l xml:id="frog-X1">da-da-da</l>
          <l xml:id="frog-L2">gets a new frog</l>
          <l>...</l>
        </q>
      </p>
    </sp>
    <sp who="Louie">
      <p>
        <q>
          <l xml:id="frog-L1">When the old pond</l>
          <l>...</l>
        </q>
      </p>
    </sp>
    <sp who="Dewey">
      <p>
        <q>
          <l>...</l>
          <l xml:id="frog-L3">It's a new pond.</l>
        </q>
      </p>
      <join targets="#frog-L1 #frog-L2 #frog-L3" result="lg" scope="root" type="haiku"/>
    </sp>
  </body>
</text>
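
The re-ordering expressed by the join element can be sketched in a few lines of Python. This is only an illustration, not part of TEI or XStandoff tooling: the helper name resolve_join and the abridged TEI_SAMPLE string (a shortened copy of the listing above, without the who attributes) are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Abridged copy of the TEI annotation (assumption: only the parts
# needed to resolve the join are kept).
TEI_SAMPLE = """<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <sp><p>How does it go?
      <q>
        <l xml:id="frog-X1">da-da-da</l>
        <l xml:id="frog-L2">gets a new frog</l>
        <l>...</l>
      </q></p></sp>
    <sp><p><q>
        <l xml:id="frog-L1">When the old pond</l>
        <l>...</l>
      </q></p></sp>
    <sp><p><q>
        <l>...</l>
        <l xml:id="frog-L3">It's a new pond.</l>
      </q></p>
      <join targets="#frog-L1 #frog-L2 #frog-L3" result="lg" scope="root" type="haiku"/>
    </sp>
  </body>
</text>"""


def resolve_join(root):
    """Return the text of the elements named in join/@targets, in target order."""
    # Index every element that carries an xml:id.
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    join = root.find(f".//{{{TEI_NS}}}join")
    # Each target is a local pointer of the form "#id".
    ids = [target.lstrip("#") for target in join.get("targets").split()]
    return [by_id[i].text for i in ids]


haiku_lines = resolve_join(ET.fromstring(TEI_SAMPLE))
print("\n".join(haiku_lines))
```

The order of the result follows the order of the pointers in targets, not the document order of the l elements.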

The XStandoff instances

In this case we have at least two possible XStandoff realizations, called variant 1 and variant 2.

Variant 1

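Converting the TEI inline annotation to XStandoff as it stands changes nothing, and nothing is gained either, since the original annotation contained neither overlaps nor other problematic annotation features. On the contrary, the conversion merely inflates the file size. The join element is carried over as an empty element anchored to the zero-length segment seg13 (start and end are both 87).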

<?xml version="1.0" encoding="UTF-8"?>
<xsf:corpusData xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:xsf="http://www.xstandoff.net/2009/xstandoff/1.1"
  xsi:schemaLocation="http://www.xstandoff.net/2009/xstandoff/1.1
  http://www.xstandoff.net/2009/xstandoff/1.1/xsf.xsd"
  xsfVersion="1.1"
  xml:id="hughie_louie_dewey">
  <xsf:primaryData start="0" end="87">
    <xsf:primaryDataRef uri="../pd/huey.txt"/>
  </xsf:primaryData>
  <xsf:segmentation>
    <xsf:segment xml:id="seg1" start="0" end="87"/>
    <xsf:segment xml:id="seg2" start="0" end="45"/>
    <xsf:segment xml:id="seg3" start="16" end="45"/>
    <xsf:segment xml:id="seg4" start="16" end="24"/>
    <xsf:segment xml:id="seg5" start="25" end="40"/>
    <xsf:segment xml:id="seg6" start="41" end="44"/>
    <xsf:segment xml:id="seg7" start="45" end="67"/>
    <xsf:segment xml:id="seg8" start="45" end="62"/>
    <xsf:segment xml:id="seg9" start="63" end="66"/>
    <xsf:segment xml:id="seg10" start="67" end="87"/>
    <xsf:segment xml:id="seg11" start="67" end="70"/>
    <xsf:segment xml:id="seg12" start="71" end="87"/>
    <xsf:segment xml:id="seg13" start="87" end="87"/>
  </xsf:segmentation>
  <xsf:annotation>
    <xsf:level xml:id="Huey">
      <xsf:layer xmlns="http://www.tei-c.org/ns/1.0" priority="0"
        xsi:schemaLocation="http://www.tei-c.org/ns/1.0 ../xsd/tei_corpus.xsd">
        <text xsf:segment="seg1">
          <body xsf:segment="seg1">
            <sp who="Huey" xsf:segment="seg2">
              <p xsf:segment="seg2">
                <q xsf:segment="seg3">
                  <l xml:id="frog-X1" xsf:segment="seg4"/>
                  <l xml:id="frog-L2" xsf:segment="seg5"/>
                  <l xsf:segment="seg6"/>
                </q>
              </p>
            </sp>
            <sp who="Louie" xsf:segment="seg7">
              <p xsf:segment="seg7">
                <q xsf:segment="seg7">
                  <l xml:id="frog-L1" xsf:segment="seg8"/>
                  <l xsf:segment="seg9"/>
                </q>
              </p>
            </sp>
            <sp who="Dewey" xsf:segment="seg10">
              <p xsf:segment="seg10">
                <q xsf:segment="seg10">
                  <l xsf:segment="seg11"/>
                  <l xml:id="frog-L3" xsf:segment="seg12"/>
                </q>
              </p>
              <join targets="#frog-L1 #frog-L2 #frog-L3" result="lg" scope="root" type="haiku"
                xsf:segment="seg13"/>
            </sp>
          </body>
        </text>
      </xsf:layer>
    </xsf:level>
  </xsf:annotation>
</xsf:corpusData>
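
How the start and end values address the primary data can be checked with a short Python sketch. It rests on an assumption: the file ../pd/huey.txt is taken to contain the lines shown under "The text", separated by single newline characters and without a trailing newline. That reading is consistent with the end value of 87 in the listing, but the file itself is not reproduced here. The names PRIMARY, SEGMENTS, and span are hypothetical.

```python
# Assumed content of the primary data file (see the lead-in).
PRIMARY = "\n".join([
    "How does it go?",
    "da-da-da",
    "gets a new frog",
    "...",
    "When the old pond",
    "...",
    "...",
    "It's a new pond.",
])

# A few segments copied from the listing: id -> (start, end).
SEGMENTS = {
    "seg1": (0, 87),
    "seg4": (16, 24),
    "seg5": (25, 40),
    "seg6": (41, 44),
    "seg8": (45, 62),
    "seg12": (71, 87),
    "seg13": (87, 87),
}


def span(seg_id):
    """Return the stretch of primary data covered by a segment.

    The offsets in the listing behave like positions between
    characters, so a half-open Python slice matches them.
    """
    start, end = SEGMENTS[seg_id]
    return PRIMARY[start:end]


for seg_id in ("seg4", "seg5", "seg8", "seg12"):
    print(seg_id, repr(span(seg_id)))
```

Under this assumption seg13 covers no characters at all, which is why it can only serve as an anchor for the empty join element.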

Variant 2

XStandoff supports discontinuous segments, which can be used to re-order parts of the original text (cf. the Alice in Wonderland example). Because meta information can be placed underneath the respective segment element, we can also provide additional information there, such as a more human-readable description and the original TEI join element.

<?xml version="1.0" encoding="UTF-8"?>
<xsf:corpusData xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:xsf="http://www.xstandoff.net/2009/xstandoff/1.1"
  xsi:schemaLocation="http://www.xstandoff.net/2009/xstandoff/1.1
  http://www.xstandoff.net/2009/xstandoff/1.1/xsf.xsd"
  xsfVersion="1.1" xml:id="hughie_louie_dewey">
  <xsf:primaryData start="0" end="87">
    <xsf:primaryDataRef uri="../pd/huey.txt"/>
  </xsf:primaryData>
  <xsf:segmentation>
    <xsf:segment xml:id="seg1" start="0" end="87"/>
    <xsf:segment xml:id="seg2" start="0" end="45"/>
    <xsf:segment xml:id="seg3" start="16" end="45"/>
    <xsf:segment xml:id="seg4" start="16" end="24"/>
    <xsf:segment xml:id="seg5" start="25" end="40"/>
    <xsf:segment xml:id="seg6" start="41" end="44"/>
    <xsf:segment xml:id="seg7" start="45" end="67"/>
    <xsf:segment xml:id="seg8" start="45" end="62"/>
    <xsf:segment xml:id="seg9" start="63" end="66"/>
    <xsf:segment xml:id="seg10" start="67" end="87"/>
    <xsf:segment xml:id="seg11" start="67" end="70"/>
    <xsf:segment xml:id="seg12" start="71" end="87"/>
    <xsf:segment xml:id="seg13" segments="seg8 seg5 seg12" mode="disjoint">
      <xsf:meta>
        <join xmlns="http://www.tei-c.org/ns/1.0" targets="#frog-L1 #frog-L2 #frog-L3"
          result="lg" scope="root" type="haiku"/>
        <olac:olac xmlns:olac="http://www.language-archives.org/OLAC/1.0/"
          xmlns="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/"
          xsi:schemaLocation="http://www.language-archives.org/OLAC/1.0/ 
          ../xsd/meta/olac.xsd">
            <creator>Maik Stührenberg</creator>
            <date>2010-09-22</date>
            <description>This manually created join results in the correct form of the haiku.</description>
        </olac:olac>
      </xsf:meta>
    </xsf:segment>
  </xsf:segmentation>
  <xsf:annotation>
    <xsf:level xml:id="hughie_louie_dewey-level1">
      <xsf:layer xmlns="http://www.tei-c.org/ns/1.0" priority="0"
        xsi:schemaLocation="http://www.tei-c.org/ns/1.0 ../xsd/tei_corpus.xsd">
        <text xsf:segment="seg1">
          <body xsf:segment="seg1">
            <sp who="Huey" xsf:segment="seg2">
              <p xsf:segment="seg2">
                <q xsf:segment="seg3">
                  <l xml:id="frog-X1" xsf:segment="seg4"/>
                  <l xml:id="frog-L2" xsf:segment="seg5"/>
                  <l xsf:segment="seg6"/>
                </q>
              </p>
            </sp>
            <sp who="Louie" xsf:segment="seg7">
              <p xsf:segment="seg7">
                <q xsf:segment="seg7">
                  <l xml:id="frog-L1" xsf:segment="seg8"/>
                  <l xsf:segment="seg9"/>
                </q>
              </p>
            </sp>
            <sp who="Dewey" xsf:segment="seg10">
              <p xsf:segment="seg10">
                <q xsf:segment="seg10">
                  <l xsf:segment="seg11"/>
                  <l xml:id="frog-L3" xsf:segment="seg12"/>
                </q>
              </p>
            </sp>
          </body>
        </text>
      </xsf:layer>
    </xsf:level>
  </xsf:annotation>
</xsf:corpusData>

Note that this XStandoff instance was modified by hand: TEI's join element and the corresponding segment[@xml:id='seg13'] were deleted manually, and a new segment[@xml:id='seg13'] was created together with the corresponding metadata.
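
The discontinuous segment seg13 of variant 2 can be resolved against the primary data roughly as follows. This is a sketch under assumptions, not an XStandoff reference implementation: the primary data is taken to be the lines of "The text" joined by single newlines, the SEGMENTATION string is an abridged copy of the listing (meta information omitted), and the helper segment_text is hypothetical.

```python
import xml.etree.ElementTree as ET

XSF_NS = "http://www.xstandoff.net/2009/xstandoff/1.1"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Assumed content of the primary data file.
PRIMARY = "\n".join([
    "How does it go?", "da-da-da", "gets a new frog", "...",
    "When the old pond", "...", "...", "It's a new pond.",
])

# Abridged segmentation from variant 2.
SEGMENTATION = """<xsf:segmentation
    xmlns:xsf="http://www.xstandoff.net/2009/xstandoff/1.1">
  <xsf:segment xml:id="seg5" start="25" end="40"/>
  <xsf:segment xml:id="seg8" start="45" end="62"/>
  <xsf:segment xml:id="seg12" start="71" end="87"/>
  <xsf:segment xml:id="seg13" segments="seg8 seg5 seg12" mode="disjoint"/>
</xsf:segmentation>"""


def segment_text(segments, seg_id):
    """Return the text pieces of a segment as a list, in reference order."""
    seg = segments[seg_id]
    refs = seg.get("segments")
    if refs:
        # Discontinuous segment: follow each referenced segment in turn.
        pieces = []
        for ref in refs.split():
            pieces.extend(segment_text(segments, ref))
        return pieces
    # Plain segment: slice the primary data by its character offsets.
    return [PRIMARY[int(seg.get("start")):int(seg.get("end"))]]


root = ET.fromstring(SEGMENTATION)
segments = {s.get(XML_ID): s for s in root.findall(f"{{{XSF_NS}}}segment")}
haiku = segment_text(segments, "seg13")
print("\n".join(haiku))
```

The order of the ids in the segments attribute determines the order of the reconstructed lines, so no separate join element is needed in the annotation layer.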