Collections of IGT

class pyigt.Corpus(igts, fname=None, clean_lexical_concept=None)[source]

A Corpus is an immutable, ordered list of IGT instances.

It provides access to concordance-like aggregated statistics of its texts.

Variables: monolingual – Flag signaling whether the corpus is monolingual or contains IGT from different object languages.
Parameters: igts (collections.abc.Iterable[pyigt.igt.IGT]) – the IGT instances that make up the corpus.

property grammar: dict[str, list[pyigt.igt.MorphemeReference]]

Maps grammatical concepts to lists of occurrences.

>>> from pyigt import Corpus, IGT
>>> igt = IGT(phrase="ni-c-chihui-lia in no-piltzin ce calli",
...           gloss="1SG.SUBJ-3SG.OBJ-mach-APPL DET 1SG.POSS-Sohn ein Haus")
>>> c = Corpus([igt])
>>> [[c[ref] for ref in c.grammar[k]] for k in c.grammar if k.startswith('1SG')]
[[<GlossedMorpheme morpheme=ni gloss=1SG.SUBJ>],
 [<GlossedMorpheme morpheme=no gloss=1SG.POSS>]]

property lexicon: dict[str, list[pyigt.igt.MorphemeReference]]

Maps lexical concepts to lists of occurrences.

>>> from pyigt import Corpus, IGT
>>> igt = IGT(phrase="ni-c-chihui-lia in no-piltzin ce calli",
...           gloss="1SG.SUBJ-3SG.OBJ-mach-APPL DET 1SG.POSS-Sohn ein Haus")
>>> c = Corpus([igt])
>>> [c[ref] for ref in c.lexicon['Sohn']]
[<GlossedMorpheme morpheme=piltzin gloss=Sohn>]

property form: dict[str, list[pyigt.igt.MorphemeReference]]

Maps morpheme forms to lists of occurrences.

>>> from pyigt import Corpus, IGT
>>> igt = IGT(phrase="ni-c-chihui-lia in no-piltzin ce calli",
...           gloss="1SG.SUBJ-3SG.OBJ-mach-APPL DET 1SG.POSS-Sohn ein Haus")
>>> c = Corpus([igt])
>>> [k for k in c.form]
['ni', 'c', 'chihui', 'lia', 'in', 'no', 'piltzin', 'ce', 'calli']
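
The three concordance properties above share one idea: every morpheme occurrence is indexed by its position, and that position is filed under the gloss or the form. The following is a minimal, standard-library sketch of that idea, not pyigt's implementation. The function name `build_concordances`, the use of plain `(igt, word, morpheme)` index tuples in place of `MorphemeReference`, and the rule that an all-uppercase gloss counts as grammatical are assumptions made for illustration.

```python
from collections import defaultdict


def build_concordances(igts):
    """Index morphemes of (phrase, gloss) string pairs by gloss and by form.

    Hypothetical helper: pyigt's own classification of glosses may differ.
    """
    grammar, lexicon, form = defaultdict(list), defaultdict(list), defaultdict(list)
    for igt_index, (phrase, gloss) in enumerate(igts):
        words, word_glosses = phrase.split(), gloss.split()
        if len(words) != len(word_glosses):
            raise ValueError(f"IGT {igt_index}: words and glosses are not aligned")
        for word_index, (word, word_gloss) in enumerate(zip(words, word_glosses)):
            morphemes, glosses = word.split("-"), word_gloss.split("-")
            if len(morphemes) != len(glosses):
                raise ValueError(f"IGT {igt_index}, word {word_index}: morphemes not aligned")
            for morpheme_index, (m, g) in enumerate(zip(morphemes, glosses)):
                ref = (igt_index, word_index, morpheme_index)
                form[m].append(ref)
                # Assumption: grammatical glosses are written in capitals.
                is_grammatical = g == g.upper() and any(ch.isalpha() for ch in g)
                (grammar if is_grammatical else lexicon)[g].append(ref)
    return dict(grammar), dict(lexicon), dict(form)


grammar, lexicon, form = build_concordances([
    ("ni-c-chihui-lia in no-piltzin ce calli",
     "1SG.SUBJ-3SG.OBJ-mach-APPL DET 1SG.POSS-Sohn ein Haus"),
])
print(list(form))
print(lexicon["Sohn"])
print([k for k in grammar if k.startswith("1SG")])
```

Keeping only positions in the mappings, rather than copies of the morphemes, mirrors the way the doctests above look each reference up again with `c[ref]`.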

classmethod from_cldf(cldf)[source]

Instantiate a corpus of IGT examples from a CLDF dataset.

Parameters:

cldf (pycldf.dataset.Dataset) – a pycldf.Dataset instance.
spec – a CorpusSpec instance, specifying how to interpret markup in the corpus.

Return type:

pyigt.igt.Corpus

classmethod from_path(path)[source]

Instantiate a corpus from a file path.

Parameters: path (typing.Union[str, pathlib.Path]) – Either a path to a CLDF dataset’s metadata file or a path to a CLDF Examples component stored as a CSV file. Note that in the latter case, the file must use the default column names, as defined in the CLDF ontology.
Return type: pyigt.igt.Corpus
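
As a rough illustration of the CSV case, the sketch below writes a one-row examples file using only the standard library. The column names (`ID`, `Language_ID`, `Primary_Text`, `Analyzed_Word`, `Gloss`) and the tab separator inside the `Analyzed_Word` and `Gloss` cells are assumptions based on the default CLDF ExampleTable and should be checked against the CLDF ontology. The language identifier and file name are hypothetical.

```python
import csv
import pathlib
import tempfile

# Assumed default column names of a CLDF ExampleTable (verify against the ontology).
COLUMNS = ["ID", "Language_ID", "Primary_Text", "Analyzed_Word", "Gloss"]

words = ["ni-c-chihui-lia", "in", "no-piltzin", "ce", "calli"]
glosses = ["1SG.SUBJ-3SG.OBJ-mach-APPL", "DET", "1SG.POSS-Sohn", "ein", "Haus"]

row = {
    "ID": "1",
    "Language_ID": "lang1",  # hypothetical identifier
    "Primary_Text": " ".join(words),
    # Assumption: list-valued cells are tab-separated in the default component.
    "Analyzed_Word": "\t".join(words),
    "Gloss": "\t".join(glosses),
}

examples_path = pathlib.Path(tempfile.mkdtemp()) / "examples.csv"
with examples_path.open("w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerow(row)

# Read the file back to confirm the layout survives a round trip.
with examples_path.open(newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))
print(rows[0]["Gloss"].split("\t"))

# With pyigt installed, such a file would then be passed to the classmethod:
#     corpus = Corpus.from_path(examples_path)
```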

write_concordance(ctype, filename=None)[source]

Parameters:

ctype (typing.Literal['grammar', 'lexicon', 'form']) – one of lexicon, grammar, or form.
filename (typing.Union[str, pathlib.Path, None]) –

write_concepts(ctype, filename=None)[source]

Parameters:

ctype (typing.Literal['grammar', 'lexicon', 'form']) – one of lexicon, grammar, or form.
filename (typing.Union[str, pathlib.Path, None]) –

check_glosses(level=2)[source]

Check alignment of glosses on word and morpheme level.
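
To make the two alignment levels concrete, here is a minimal, standard-library sketch of the kind of check described above. It is not pyigt's implementation. The function name `alignment_problems` is hypothetical, and the reading that `level=1` covers words only while `level=2` also covers morphemes is an assumption suggested by the signature. The second gloss line is deliberately broken for illustration.

```python
def alignment_problems(phrase, gloss, level=2):
    """Return a list of human-readable alignment problems (empty if aligned)."""
    problems = []
    words, word_glosses = phrase.split(), gloss.split()
    # Word level: each word needs exactly one gloss.
    if len(words) != len(word_glosses):
        problems.append(
            f"word level: {len(words)} words but {len(word_glosses)} glosses")
        return problems  # morphemes cannot be paired up reliably
    # Morpheme level (assumed to be enabled from level 2 on).
    if level >= 2:
        for position, (word, word_gloss) in enumerate(zip(words, word_glosses)):
            n_morphemes = len(word.split("-"))
            n_glosses = len(word_gloss.split("-"))
            if n_morphemes != n_glosses:
                problems.append(
                    f"morpheme level, word {position}: "
                    f"{n_morphemes} morphemes but {n_glosses} glosses")
    return problems


aligned = alignment_problems(
    "ni-c-chihui-lia in no-piltzin ce calli",
    "1SG.SUBJ-3SG.OBJ-mach-APPL DET 1SG.POSS-Sohn ein Haus")
# Deliberately misaligned: 'no-piltzin' has two morphemes, 'Sohn' only one gloss.
misaligned = alignment_problems("no-piltzin ce calli", "Sohn ein Haus")
print(aligned)
print(misaligned)
```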

get_wordlist(doculect='base', profile=None, lingpy_settings=LingPySettings(ref='crossid', lexstat=True, threshold=0.4))[source]

Return a classical wordlist from the data.

Parameters:

doculect (str) –
profile (typing.Union[pathlib.Path, str, segments.profile.Profile, None]) –
lingpy_settings (pyigt.igt.LingPySettings) –

get_profile(clts=None, filename=None)[source]

Compute an orthography profile for the corpus using LingPy.

Parameters: filename – Write the computed profile to a file in addition to returning it.
Return type: segments.profile.Profile
Returns: a segments.Profile instance.