XStandoff Examples

Linguistic Ambiguity

A typical example of linguistic ambiguity is the attachment of a prepositional phrase (PP-attachment). Consider the following German sentence:

The text

Der Mann sah die Frau mit dem Fernglas.
The man saw the woman with the spyglass.

The phrase "mit dem Fernglas" (with the spyglass) is ambiguous because it allows two readings: either the man uses a spyglass as an instrument to see the woman, or the man sees a woman who is carrying a spyglass. This ambiguity typically arises when a verb phrase (VP), a noun phrase (NP), and a prepositional phrase (PP) occur in sequence. We therefore have two possible annotations on the same linguistic level (phrase structure). In the first, the prepositional phrase is attached to the verb phrase; our example uses a second VP as the common parent node. In the second, the phrase is attached to the noun phrase describing the woman; here we use a second NP as the common parent node.
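
To make the two attachments concrete, the following minimal sketch encodes both readings as nested tuples and checks that they cover the same words while placing the PP under different parents. The tuple encoding and the helper names `leaves` and `pp_parent` are illustrative assumptions and are not part of XStandoff.

```python
# Shared subtrees, written as (label, child, child, ...) tuples.
SUBJ = ("NP", ("Det", "Der"), ("N", "Mann"))
OBJ = ("NP", ("Det", "die"), ("N", "Frau"))
PP = ("PP", ("P", "mit"), ("NP", ("Det", "dem"), ("N", "Fernglas")))

# Reading one: the PP attaches to the verb phrase (instrument reading).
reading_one = ("S", SUBJ, ("VP", ("VP", ("V", "sah"), OBJ), PP))
# Reading two: the PP attaches to the noun phrase (the woman has the spyglass).
reading_two = ("S", SUBJ, ("VP", ("V", "sah"), ("NP", OBJ, PP)))


def leaves(tree):
    """Return the words of a tree in left-to-right order."""
    _label, *children = tree
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


def pp_parent(tree, parent=None):
    """Return the label of the node that directly dominates the PP."""
    label, *children = tree
    if label == "PP":
        return parent
    for child in children:
        if not isinstance(child, str):
            found = pp_parent(child, label)
            if found is not None:
                return found
    return None


print(leaves(reading_one) == leaves(reading_two))
print(pp_parent(reading_one), pp_parent(reading_two))
```

Both trees yield the same word sequence, so the difference between the readings lies only in the structure above the words.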

The annotation levels

First annotation layer: phrase structure reading one (the man uses the spyglass to see the woman)

<?xml version="1.0" encoding="UTF-8"?>
<S xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.xstandoff.net/linguistic_example/pos ../xsd/grammar.xsd"
  xmlns="http://www.xstandoff.net/linguistic_example/pos">
  <NP>
    <Det>Der</Det>
    <N>Mann</N>
  </NP>
  <VP>
    <VP>
      <V>sah</V>
      <NP>
        <Det>die</Det>
        <N>Frau</N>
      </NP>
    </VP>
    <PP>
      <P>mit</P>
      <NP>
        <Det>dem</Det>
        <N>Fernglas</N>
      </NP>
    </PP>
  </VP>.
</S>

Figure: Tree representation of the first possible reading

Second annotation layer: phrase structure reading two (the woman carries a spyglass)

<?xml version="1.0" encoding="UTF-8"?>
<S xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.xstandoff.net/linguistic_example/pos ../xsd/grammar.xsd"
  xmlns="http://www.xstandoff.net/linguistic_example/pos">
  <NP>
    <Det>Der</Det>
    <N>Mann</N>
  </NP>
  <VP>
    <V>sah</V>
    <NP>
      <NP>
        <Det>die</Det>
        <N>Frau</N>
      </NP>
      <PP>
        <P>mit</P>
        <NP>
          <Det>dem</Det>
          <N>Fernglas</N>
        </NP>
      </PP>
    </NP>
  </VP>.
</S>

Figure: Tree representation of the second possible reading

Note: We have chosen binary production rules (e.g. VP → VP, PP). Other rules, which may be more adequate from a linguistic point of view, are possible as well.
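
As an illustration of how binary rules give rise to exactly two analyses, here is a small CKY-style chart that counts the derivations of the example sentence. The rule set and lexicon are assumptions read off the two trees above; they are not taken from the grammar.xsd schema referenced in the listings.

```python
from collections import defaultdict

# Binary rules inferred from the two example trees (assumed, not from grammar.xsd).
# Key: (left child, right child); value: possible parent labels.
BINARY_RULES = {
    ("NP", "VP"): ["S"],
    ("Det", "N"): ["NP"],
    ("NP", "PP"): ["NP"],   # PP attached to the noun phrase
    ("V", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],   # PP attached to the verb phrase
    ("P", "NP"): ["PP"],
}

LEXICON = {
    "Der": "Det", "Mann": "N", "sah": "V", "die": "Det",
    "Frau": "N", "mit": "P", "dem": "Det", "Fernglas": "N",
}


def count_parses(words, goal="S"):
    """Count the derivations of `goal` over the whole word sequence."""
    n = len(words)
    chart = defaultdict(int)  # (start, end, label) -> number of derivations
    for i, word in enumerate(words):
        chart[i, i + 1, LEXICON[word]] += 1
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for split in range(start + 1, end):
                for (left, right), parents in BINARY_RULES.items():
                    ways = chart[start, split, left] * chart[split, end, right]
                    if ways:
                        for parent in parents:
                            chart[start, end, parent] += ways
    return chart[0, n, goal]


sentence = "Der Mann sah die Frau mit dem Fernglas".split()
print(count_parses(sentence))
```

The two derivations correspond to the two readings: one uses VP → VP, PP and the other uses NP → NP, PP.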

The XStandoff instance

The resulting XStandoff instance contains both possible readings as separate layers underneath the same annotation level.

<?xml version="1.0" encoding="UTF-8"?>
<xsf:corpusData xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:xsf="http://www.xstandoff.net/2009/xstandoff/1.1" xsfVersion="1.1"
  xml:id="linguistic_example_pos-1-linguistic_example_pos-2"
  xsi:schemaLocation="http://www.xstandoff.net/2009/xstandoff/1.1 
  http://www.xstandoff.net/2009/xstandoff/1.1/xsf.xsd">
  <xsf:primaryData start="0" end="39">
    <xsf:primaryDataRef uri="../pd/ambiguity.txt"/>
  </xsf:primaryData>
  <xsf:segmentation>
    <xsf:segment xml:id="seg1" start="0" end="39"/>
    <xsf:segment xml:id="seg2" start="0" end="9"/>
    <xsf:segment xml:id="seg3" start="0" end="3"/>
    <xsf:segment xml:id="seg4" start="4" end="8"/>
    <xsf:segment xml:id="seg5" start="9" end="38"/>
    <xsf:segment xml:id="seg6" start="9" end="22"/>
    <xsf:segment xml:id="seg7" start="9" end="12"/>
    <xsf:segment xml:id="seg8" start="13" end="38"/>
    <xsf:segment xml:id="seg9" start="13" end="22"/>
    <xsf:segment xml:id="seg10" start="13" end="16"/>
    <xsf:segment xml:id="seg11" start="17" end="21"/>
    <xsf:segment xml:id="seg12" start="22" end="38"/>
    <xsf:segment xml:id="seg13" start="22" end="25"/>
    <xsf:segment xml:id="seg14" start="26" end="38"/>
    <xsf:segment xml:id="seg15" start="26" end="29"/>
    <xsf:segment xml:id="seg16" start="30" end="38"/>
  </xsf:segmentation>
  <xsf:annotation>
    <xsf:level xml:id="linguistic_example_pos-1-level1">
      <xsf:layer xmlns="http://www.xstandoff.net/linguistic_example/pos" priority="0"
        xsi:schemaLocation="http://www.xstandoff.net/linguistic_example/pos ../xsd/grammar.xsd">
        <S xsf:segment="seg1">
          <NP xsf:segment="seg2">
            <Det xsf:segment="seg3"/>
            <N xsf:segment="seg4"/>
          </NP>
          <VP xsf:segment="seg5">
            <VP xsf:segment="seg6">
              <V xsf:segment="seg7"/>
              <NP xsf:segment="seg9">
                <Det xsf:segment="seg10"/>
                <N xsf:segment="seg11"/>
              </NP>
            </VP>
            <PP xsf:segment="seg12">
              <P xsf:segment="seg13"/>
              <NP xsf:segment="seg14">
                <Det xsf:segment="seg15"/>
                <N xsf:segment="seg16"/>
              </NP>
            </PP>
          </VP>
        </S>
      </xsf:layer>
      <xsf:layer xmlns="http://www.xstandoff.net/linguistic_example/pos" priority="0"
        xsi:schemaLocation="http://www.xstandoff.net/linguistic_example/pos ../xsd/grammar.xsd">
        <S xsf:segment="seg1">
          <NP xsf:segment="seg2">
            <Det xsf:segment="seg3"/>
            <N xsf:segment="seg4"/>
          </NP>
          <VP xsf:segment="seg5">
            <V xsf:segment="seg7"/>
            <NP xsf:segment="seg8">
              <NP xsf:segment="seg9">
                <Det xsf:segment="seg10"/>
                <N xsf:segment="seg11"/>
              </NP>
              <PP xsf:segment="seg12">
                <P xsf:segment="seg13"/>
                <NP xsf:segment="seg14">
                  <Det xsf:segment="seg15"/>
                  <N xsf:segment="seg16"/>
                </NP>
              </PP>
            </NP>
          </VP>
        </S>
      </xsf:layer>
    </xsf:level>
  </xsf:annotation>
</xsf:corpusData>
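
The following sketch shows one way the segment references in such an instance might be resolved back to the primary data, using only Python's standard library. It works on an abridged copy of the instance (the PP subtree only) and on the sentence as an inline string rather than the external file. It also assumes that `start` and `end` are character offsets with an exclusive end, which matches the example (the 39-character sentence spans 0 to 39) but is an inference from the data. The function name `resolve` is hypothetical.

```python
import xml.etree.ElementTree as ET

XSF_NS = "http://www.xstandoff.net/2009/xstandoff/1.1"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
SEGMENT_REF = "{%s}segment" % XSF_NS  # the xsf:segment attribute

# Primary data, inlined here instead of being read from ../pd/ambiguity.txt.
PRIMARY = "Der Mann sah die Frau mit dem Fernglas."

# Abridged copy of the instance: only the segments and nodes of the PP.
INSTANCE = """\
<xsf:corpusData xmlns:xsf="http://www.xstandoff.net/2009/xstandoff/1.1">
  <xsf:segmentation>
    <xsf:segment xml:id="seg12" start="22" end="38"/>
    <xsf:segment xml:id="seg13" start="22" end="25"/>
    <xsf:segment xml:id="seg14" start="26" end="38"/>
    <xsf:segment xml:id="seg15" start="26" end="29"/>
    <xsf:segment xml:id="seg16" start="30" end="38"/>
  </xsf:segmentation>
  <xsf:annotation>
    <xsf:level>
      <xsf:layer xmlns="http://www.xstandoff.net/linguistic_example/pos">
        <PP xsf:segment="seg12">
          <P xsf:segment="seg13"/>
          <NP xsf:segment="seg14">
            <Det xsf:segment="seg15"/>
            <N xsf:segment="seg16"/>
          </NP>
        </PP>
      </xsf:layer>
    </xsf:level>
  </xsf:annotation>
</xsf:corpusData>
"""


def resolve(instance_xml, primary):
    """Return (element name, covered text) pairs in document order."""
    root = ET.fromstring(instance_xml)
    # Map each segment id to its (start, end) character offsets.
    spans = {
        seg.get(XML_ID): (int(seg.get("start")), int(seg.get("end")))
        for seg in root.iter("{%s}segment" % XSF_NS)
    }
    resolved = []
    for element in root.iter():
        ref = element.get(SEGMENT_REF)
        if ref is not None:
            start, end = spans[ref]
            local_name = element.tag.rsplit("}", 1)[-1]
            resolved.append((local_name, primary[start:end]))
    return resolved


for name, text in resolve(INSTANCE, PRIMARY):
    print(name, repr(text))
```

Because the annotation elements carry no text of their own, every layer that points to the same segment shares one definition of the covered span.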

The XStandoff instance including an all layer

When XStandoff's experimental all layer is used, the resulting XStandoff instance can no longer be validated, since the shared partial trees are excluded from the separate POS layers, which then contain only tree fragments.
Reconstructing the original trees is also complicated and has to be done by computing the segment spans.

<?xml version="1.0" encoding="UTF-8"?>
<xsf:corpusData xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:xsf="http://www.xstandoff.net/2009/xstandoff/1.1" xsfVersion="1.1"
  xml:id="linguistic_example_pos-1-linguistic_example_pos-2"
  xsi:schemaLocation="http://www.xstandoff.net/2009/xstandoff/1.1
  http://www.xstandoff.net/2009/xstandoff/1.1/xsf.xsd">
  <xsf:primaryData start="0" end="39">
    <xsf:primaryDataRef uri="../pd/ambiguity.txt"/>
  </xsf:primaryData>
  <xsf:segmentation>
    <xsf:segment xml:id="seg1" start="0" end="39"/>
    <xsf:segment xml:id="seg2" start="0" end="9"/>
    <xsf:segment xml:id="seg3" start="0" end="3"/>
    <xsf:segment xml:id="seg4" start="4" end="8"/>
    <xsf:segment xml:id="seg5" start="9" end="38"/>
    <xsf:segment xml:id="seg6" start="9" end="22"/>
    <xsf:segment xml:id="seg7" start="9" end="12"/>
    <xsf:segment xml:id="seg8" start="13" end="38"/>
    <xsf:segment xml:id="seg9" start="13" end="22"/>
    <xsf:segment xml:id="seg10" start="13" end="16"/>
    <xsf:segment xml:id="seg11" start="17" end="21"/>
    <xsf:segment xml:id="seg12" start="22" end="38"/>
    <xsf:segment xml:id="seg13" start="22" end="25"/>
    <xsf:segment xml:id="seg14" start="26" end="38"/>
    <xsf:segment xml:id="seg15" start="26" end="29"/>
    <xsf:segment xml:id="seg16" start="30" end="38"/>
  </xsf:segmentation>
  <xsf:annotation>
    <xsf:level xml:id="linguistic_example_pos-1-level">
      <xsf:layer xmlns:all="http://www.xstandoff.net/2009/all" priority="0">
        <all:S xsf:segment="seg1">
          <all:NP xsf:segment="seg2">
            <all:Det xsf:segment="seg3"/>
            <all:N xsf:segment="seg4"/>
          </all:NP>
          <all:VP xsf:segment="seg5"/>
          <all:PP xsf:segment="seg12">
            <all:P xsf:segment="seg13"/>
            <all:NP xsf:segment="seg14">
              <all:Det xsf:segment="seg15"/>
              <all:N xsf:segment="seg16"/>
            </all:NP>
          </all:PP>
        </all:S>
      </xsf:layer>
      <xsf:layer xmlns="http://www.xstandoff.net/linguistic_example/pos" priority="0">
        <VP xsf:segment="seg6">
          <V xsf:segment="seg7"/>
          <NP xsf:segment="seg9">
            <Det xsf:segment="seg10"/>
            <N xsf:segment="seg11"/>
          </NP>
        </VP>
      </xsf:layer>
      <xsf:layer xmlns="http://www.xstandoff.net/linguistic_example/pos" priority="0">
        <V xsf:segment="seg7"/>
        <NP xsf:segment="seg8">
          <NP xsf:segment="seg9">
            <Det xsf:segment="seg10"/>
            <N xsf:segment="seg11"/>
          </NP>
        </NP>
      </xsf:layer>
    </xsf:level>
  </xsf:annotation>
</xsf:corpusData>
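
The span-based reconstruction mentioned above could look roughly like the sketch below. It merges the nodes of the all layer with the fragment of one reading and nests them purely by span containment. The `(label, start, end)` triples are copied by hand from the listing. The function names are hypothetical, and the approach assumes that no two nodes of one reading share an identical span, which holds for this example but would otherwise require a tie-breaker.

```python
# Nodes of the all layer (shared by both readings), as (label, start, end).
SHARED = [
    ("S", 0, 39), ("NP", 0, 9), ("Det", 0, 3), ("N", 4, 8),
    ("VP", 9, 38), ("PP", 22, 38), ("P", 22, 25),
    ("NP", 26, 38), ("Det", 26, 29), ("N", 30, 38),
]
# Fragments that remain in the two POS layers.
FRAGMENT_ONE = [("VP", 9, 22), ("V", 9, 12), ("NP", 13, 22),
                ("Det", 13, 16), ("N", 17, 21)]
FRAGMENT_TWO = [("V", 9, 12), ("NP", 13, 38), ("NP", 13, 22),
                ("Det", 13, 16), ("N", 17, 21)]


def nest_by_span(nodes):
    """Build a tree from (label, start, end) triples via span containment."""
    # Sort so that an enclosing span comes before the spans it contains.
    ordered = sorted(nodes, key=lambda node: (node[1], -node[2]))
    root = None
    stack = []  # currently open nodes, each as [label, start, end, children]
    for label, start, end in ordered:
        node = [label, start, end, []]
        # Close every open node that does not contain the new span.
        while stack and not (stack[-1][1] <= start and end <= stack[-1][2]):
            stack.pop()
        if stack:
            stack[-1][3].append(node)
        elif root is None:
            root = node
        else:
            raise ValueError("more than one top-level span")
        stack.append(node)
    return root


def bracket(node):
    """Render a nested node as a labelled bracketing."""
    label, _start, _end, children = node
    if not children:
        return label
    return "(" + label + " " + " ".join(bracket(c) for c in children) + ")"


print(bracket(nest_by_span(SHARED + FRAGMENT_ONE)))
print(bracket(nest_by_span(SHARED + FRAGMENT_TWO)))
```

Note that in the second reading the PP ends up under the outer NP only because its span lies inside that NP's span; in the all layer itself the PP is a direct child of S.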