Interlinear Glossed Text

The main interface to access interlinear glossed text with pyigt is the IGT class, representing a single glossed phrase.

class pyigt.IGT(phrase, gloss, id=None, properties=_Nothing.NOTHING, language=None, translation=None, abbrs=_Nothing.NOTHING, strict=False)[source]

The main trait of IGT is the alignment of words and glosses. We are therefore mostly interested in the two aligned “lines”, the analyzed text and the glosses, rather than in supporting any number of tiers or alignment based on timestamps or similar. An IGT instance is thus a list of aligned words, and each aligned word is a list of aligned morphemes. This structure can be exploited to access parts of the alignment; see IGT.__getitem__().
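The nested structure described above can be pictured with a minimal pure-Python sketch. It is an illustration under stated assumptions, not pyigt's implementation: the helper name align is hypothetical, and "-" is assumed to be the only morpheme separator in the input.

```python
from typing import List, Tuple


def align(phrase: str, gloss: str) -> List[List[Tuple[str, str]]]:
    """Hypothetical helper (not part of pyigt): words -> (morpheme, gloss) pairs."""
    aligned = []
    # Words are separated by whitespace in both lines.
    for word, word_gloss in zip(phrase.split(), gloss.split()):
        # Assumption: "-" is the only morpheme separator in this toy input.
        aligned.append(list(zip(word.split("-"), word_gloss.split("-"))))
    return aligned


words = align("zəp-le: ȵi-ke:", "a-DEF b-IN")
print(words[0])     # all (morpheme, gloss) pairs of the first word
print(words[1][0])  # first (morpheme, gloss) pair of the second word
```

The outer list corresponds to the aligned words, each inner list to the aligned morphemes of one word.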

Variables:

phrase – list of str representing the gloss-aligned words of the IGT.
gloss – list of str representing the word-aligned glosses of the IGT.
id – Optional identifier, can be used for referencing the IGT if it is part of a Corpus.
properties – typing.Dict[str, object] storing additional properties of an IGT, e.g. additional column values read from a row in a CLDF ExampleTable.
language – Optional language identifier, specifying the object language of the IGT.
translation – Optional translation of the phrase.
abbrs – Optional dict providing descriptions of gloss labels used in the IGT.
strict – bool flag signaling whether to parse the IGT in strict mode, i.e. requiring matching morpheme separators in phrase and gloss, or not.

Note

LGR Conformance

While the main purpose of an IGT is providing access to its words, morphemes and glosses, it also supports error/conformance checking. Thus, it is possible to initialize an IGT with “broken” data.

>>> from pyigt import IGT
>>> igt = IGT(phrase='two words', gloss='ONE.GLOSS')
>>> igt.conformance
<LGRConformance.UNALIGNED: 0>

Before processing IGT instances, check whether the conformance level (see LGRConformance) of the IGT is sufficient for the downstream requirements. Otherwise, accessing properties like IGT.glossed_words may lead to unexpected results:

>>> igt.glossed_words  # we extract as many glossed words as possible ...
[<GlossedWord word=two gloss=ONE.GLOSS>]
>>> len(igt)
1
>>> len(igt.phrase)
2
>>> igt = IGT(phrase='multi-morph', gloss='GLOSS')
>>> igt.conformance
<LGRConformance.WORD_ALIGNED: 1>
>>> igt[0].glossed_morphemes  # we extract as many glossed morphemes as possible ...
[<GlossedMorpheme morpheme=multi gloss=GLOSS>]
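The "as many as possible" behaviour shown above resembles what Python's built-in zip does with sequences of unequal length. The following sketch only illustrates that analogy; that pyigt pairs items in this way internally is an assumption.

```python
phrase = "two words".split()
gloss = "ONE.GLOSS".split()

# zip stops at the shorter sequence, so the second word gets no gloss.
pairs = list(zip(phrase, gloss))
dropped = phrase[len(pairs):]

print(len(phrase), len(pairs))  # number of words vs. number of aligned pairs
print(dropped)                  # words left without a gloss
```

Comparing the two lengths, as in len(igt) versus len(igt.phrase) above, is a cheap way to notice silently dropped material.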

property prosodic_words: List[GlossedWord]

Interpret an IGT’s phrase prosodically, i.e.

split prosodically free elements marked with “ -” separator and
conflate clitics.

Use IGT.as_prosodic() to get an IGT instance initialised from the prosodic words of an IGT instance.

property morphosyntactic_words: List[GlossedWord]

Interpret an IGT’s phrase morphosyntactically, i.e.

conflate prosodically free elements marked with “ -” separator and
split clitics into separate words.

Use IGT.as_morphosyntactic() to get an IGT instance initialised from the morphosyntactic words of an IGT instance.

as_prosodic()[source]

>>> from pyigt import IGT
>>> igt = IGT(phrase='a=bcd -e', gloss='A=BCD-E')
>>> len(igt) != len(igt.as_prosodic())
True
>>> igt[0].word
'a=bcd -e'
>>> igt.as_prosodic()[0].word
'a=bcd'

Return type: pyigt.igt.IGT

as_morphosyntactic()[source]

>>> from pyigt import IGT
>>> igt = IGT(phrase='a=bcd -e', gloss='A=BCD-E')
>>> len(igt) != len(igt.as_morphosyntactic())
True
>>> igt[0].word
'a=bcd -e'
>>> igt.as_morphosyntactic()[-1].word
'bcd -e'
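The difference between the two interpretations can be sketched on plain strings. This is a simplified illustration, not pyigt's algorithm: the function names are hypothetical, the gloss line is ignored, and the separators are simply dropped at the split points.

```python
from typing import List


def prosodic_split(word: str) -> List[str]:
    """Hypothetical: split at the " -" separator, keep clitics ("=") attached."""
    return word.split(" -")


def morphosyntactic_split(word: str) -> List[str]:
    """Hypothetical: split at clitic boundaries ("="), keep " -" elements attached."""
    return word.split("=")


word = "a=bcd -e"
print(prosodic_split(word)[0])          # first prosodic word
print(morphosyntactic_split(word)[-1])  # last morphosyntactic word
```

For this input the first prosodic word and the last morphosyntactic word match the values shown in the doctests above.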

__getitem__(i)[source]

Provide access to GlossedWord or GlossedMorpheme(s) by zero-based index.

Parameters: i (typing.Union[int, typing.Tuple[int, typing.Union[int, slice]]]) – An int index referencing a GlossedWord, or an (int, int) tuple referencing a GlossedMorpheme.
Return type: typing.Union[typing.List, pyigt.lgrmorphemes.GlossedWord, pyigt.lgrmorphemes.GlossedMorpheme]

>>> from pyigt import IGT
>>> igt = IGT(phrase="zəp-le: ȵi-ke: pe-ji qeʴlotʂu-ʁɑ,", gloss="a-DEF b-IN c-CSM d-LO")
>>> igt[0].word
'zəp-le:'
>>> [gw.word for gw in igt[2:]]
['pe-ji', 'qeʴlotʂu-ʁɑ,']
>>> str(igt[0, 0].morpheme)
'zəp'
>>> [str(gm.morpheme) for gm in igt[1, 0:]]  # All morphemes of the second word
['ȵi', 'ke:']
>>> [str(gm.morpheme) for gm in igt[0:, 0]]  # First morpheme in each word
['zəp', 'ȵi', 'pe', 'qeʴlotʂu']
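How one __getitem__ can serve int, slice and tuple indices may be easier to see in a toy container. The class below is a hypothetical stand-in operating on plain strings, not pyigt's IGT, and it skips the gloss line entirely.

```python
from typing import List


class Aligned:
    """Toy container (hypothetical, not pyigt's IGT): a list of morpheme lists."""

    def __init__(self, words: List[List[str]]):
        self.words = words

    def __getitem__(self, i):
        if not isinstance(i, tuple):
            # int -> one word, slice -> list of words
            return self.words[i]
        word_idx, morph_idx = i
        if isinstance(word_idx, slice):
            # Apply the morpheme index to every selected word.
            return [w[morph_idx] for w in self.words[word_idx]]
        return self.words[word_idx][morph_idx]


a = Aligned([["zəp", "le:"], ["ȵi", "ke:"], ["pe", "ji"]])
print(a[0])      # one word
print(a[0, 0])   # one morpheme
print(a[1, 0:])  # all morphemes of the second word
print(a[0:, 0])  # first morpheme of each word
```

Python passes a[0:, 0] to __getitem__ as the tuple (slice(0, None, None), 0), which is why a single method can dispatch on the index type.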

property conformance: LGRConformance

Alignment level of the IGT.

check(strict=False, verbose=False)[source]

Parameters:

strict (bool) – If True, also check Rule 2: Morpheme-by-morpheme correspondence.
verbose (bool) –

property primary_text: str

The primary text of the IGT, i.e. the phrase stripped of morpheme separators.
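As a rough illustration of stripping morpheme separators, the sketch below removes "-" and "=" from a phrase. Which separators pyigt actually strips is not stated here, so the character set and the function name strip_separators are assumptions.

```python
import re


def strip_separators(phrase: str) -> str:
    """Hypothetical helper: drop "-" and "=" (assumed separator set)."""
    return re.sub(r"[-=]", "", phrase)


text = strip_separators("zəp-le: ȵi-ke: pe-ji")
print(text)
```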

class pyigt.LGRConformance(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Conformance levels with respect to alignment of phrase and gloss of an IGT.

We distinguish the following levels:

morpheme-aligned (IGT conforms to LGR Rule 2)
word-aligned (IGT conforms to LGR Rule 1, but not Rule 2)
unaligned (IGT does not conform to LGR Rule 1)
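The three levels can be approximated by comparing counts, as in the sketch below. It is a simplified stand-in rather than pyigt's check: the enum Level and the function level are hypothetical, only "-" is treated as a morpheme separator, and the value 2 for the morpheme-aligned level is an assumption (the doctests above only show 0 and 1).

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical stand-in for pyigt.LGRConformance."""
    UNALIGNED = 0
    WORD_ALIGNED = 1
    MORPHEME_ALIGNED = 2  # assumed value


def level(phrase: str, gloss: str) -> Level:
    words, glosses = phrase.split(), gloss.split()
    # Rule 1: word-by-word alignment requires equal word counts.
    if len(words) != len(glosses):
        return Level.UNALIGNED
    # Rule 2: every word needs as many morphemes as its gloss has parts.
    for word, word_gloss in zip(words, glosses):
        if len(word.split("-")) != len(word_gloss.split("-")):
            return Level.WORD_ALIGNED
    return Level.MORPHEME_ALIGNED


print(level("two words", "ONE.GLOSS").name)
print(level("multi-morph", "GLOSS").name)
print(level("zəp-le: ȵi-ke:", "a-DEF b-IN").name)
```

Because the levels are ordered, a downstream check such as level(...) >= Level.WORD_ALIGNED reads naturally with an IntEnum.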

class pyigt.Example(dataset, row)[source]

A custom object class to use with pycldf.orm.

This class overrides the pycldf.orm.Example.igt property to return an IGT instance rather than a text string.

>>> from pyigt import Example
>>> from pycldf import Dataset
>>> ds = Dataset.from_metadata('tests/fixtures/lgr/cldf/Generic-metadata.json')
>>> ex = ds.objects('ExampleTable', cls=Example)
>>> ex['2'].igt.gloss[1]
'they-OBL-GEN'
>>> ex['2'].igt.gloss_abbrs["OBL"]
'oblique'

Parameters: row (dict) –