Collections of IGT

class pyigt.Corpus(igts, fname=None, clean_lexical_concept=None)

A Corpus is an immutable, ordered list of IGT instances.

It provides access to concordance-like aggregated statistics of its texts.

Variables:

monolingual – Flag signaling whether the corpus is monolingual or contains IGT from different object languages.

Parameters:

igts (typing.Iterable[pyigt.igt.IGT]) – the IGT instances that make up the corpus, in order.
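To make the indexing model concrete, here is a minimal stdlib sketch of an immutable, ordered corpus. ToyCorpus, ToyIGT and Morpheme are hypothetical stand-ins, not pyigt classes, and the (IGT index, word index, morpheme index) layout of the reference triples is an assumption suggested by the Tuple[int, int, int] occurrence type used by the properties below.

```python
from collections.abc import Sequence
from typing import Iterable, NamedTuple, Tuple, Union


class Morpheme(NamedTuple):
    """Hypothetical stand-in for a glossed morpheme."""
    morpheme: str
    gloss: str


class ToyIGT(NamedTuple):
    """Hypothetical stand-in for an IGT: a language ID plus words of morphemes."""
    language: str
    words: Tuple[Tuple[Morpheme, ...], ...]


class ToyCorpus(Sequence):
    """Immutable, ordered collection of ToyIGT, addressable by reference triples."""

    def __init__(self, igts: Iterable[ToyIGT]):
        # Copy into a tuple so later changes to the input cannot leak in.
        self._igts = tuple(igts)

    def __len__(self) -> int:
        return len(self._igts)

    def __getitem__(self, ref: Union[int, slice, Tuple[int, int, int]]):
        if isinstance(ref, tuple):
            # Assumed reference layout: (IGT index, word index, morpheme index).
            i, j, k = ref
            return self._igts[i].words[j][k]
        return self._igts[ref]

    @property
    def monolingual(self) -> bool:
        # True if all IGTs share a single object language.
        return len({igt.language for igt in self._igts}) <= 1
```

Backing the collection with a tuple and exposing only the read-only Sequence protocol is one simple way to get immutability: assigning to an item raises TypeError.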

property grammar: Dict[str, List[Tuple[int, int, int]]]

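Maps grammatical concepts (gloss labels such as 1SG.SUBJ) to lists of occurrences. Each occurrence is a triple of integer indices which can be used to index the corpus, returning the corresponding GlossedMorpheme, as in the example below.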

>>> from pyigt import Corpus, IGT
>>> igt = IGT(phrase="ni-c-chihui-lia in no-piltzin ce calli",
...           gloss="1SG.SUBJ-3SG.OBJ-mach-APPL DET 1SG.POSS-Sohn ein Haus")
>>> c = Corpus([igt])
>>> [[c[ref] for ref in c.grammar[k]] for k in c.grammar if k.startswith('1SG')]
[[<GlossedMorpheme morpheme=ni gloss=1SG.SUBJ>],
 [<GlossedMorpheme morpheme=no gloss=1SG.POSS>]]
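The shape of these concordance maps can be illustrated with a hypothetical build_indices helper that splits words on whitespace and morphemes on hyphens, then files each gloss label under grammar or lexicon. The rule that all-caps labels are grammatical is an assumption for this sketch only; it is not necessarily how pyigt classifies glosses.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Ref = Tuple[int, int, int]


def build_indices(
    examples: Sequence[Tuple[str, str]],
) -> Tuple[Dict[str, List[Ref]], Dict[str, List[Ref]]]:
    """Return (grammar, lexicon) maps from gloss labels to occurrence refs."""
    grammar: Dict[str, List[Ref]] = defaultdict(list)
    lexicon: Dict[str, List[Ref]] = defaultdict(list)
    for i, (phrase, gloss) in enumerate(examples):
        # Words are separated by whitespace; zip silently drops unmatched words.
        for j, (word, gword) in enumerate(zip(phrase.split(), gloss.split())):
            # Morphemes within a glossed word are separated by hyphens.
            for k, label in enumerate(gword.split("-")):
                # Heuristic (assumption): all-caps labels such as DET are grammatical.
                target = grammar if label.isupper() else lexicon
                target[label].append((i, j, k))
    return dict(grammar), dict(lexicon)
```

Applied to the example above, the label 1SG.SUBJ maps to [(0, 0, 0)] and Sohn maps to [(0, 2, 1)], i.e. the second morpheme of the third word.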

property lexicon: Dict[str, List[Tuple[int, int, int]]]

Maps lexical concepts to lists of occurrences.

>>> from pyigt import Corpus, IGT
>>> igt = IGT(phrase="ni-c-chihui-lia in no-piltzin ce calli",
...           gloss="1SG.SUBJ-3SG.OBJ-mach-APPL DET 1SG.POSS-Sohn ein Haus")
>>> c = Corpus([igt])
>>> [c[ref] for ref in c.lexicon['Sohn']]
[<GlossedMorpheme morpheme=piltzin gloss=Sohn>]

property form: Dict[str, List[Tuple[int, int, int]]]

Maps morpheme forms to lists of occurrences.

>>> from pyigt import Corpus, IGT
>>> igt = IGT(phrase="ni-c-chihui-lia in no-piltzin ce calli",
...           gloss="1SG.SUBJ-3SG.OBJ-mach-APPL DET 1SG.POSS-Sohn ein Haus")
>>> c = Corpus([igt])
>>> [k for k in c.form]
['ni', 'c', 'chihui', 'lia', 'in', 'no', 'piltzin', 'ce', 'calli']

classmethod from_cldf(cldf)

Instantiate a corpus of IGT examples from a CLDF dataset.

Parameters:
  • cldf (pycldf.dataset.Dataset) – a pycldf.Dataset instance.

  • spec – a CorpusSpec instance, specifying how to interpret markup in the corpus.

Return type:

pyigt.igt.Corpus

classmethod from_path(path)

Instantiate a corpus from a file path.

Parameters:

path (typing.Union[str, pathlib.Path]) – Either a path to a CLDF dataset’s metadata file or to a CLDF Examples component as CSV file. Note that in the latter case, the file must use the default column names, as defined in the CLDF ontology.

Return type:

pyigt.igt.Corpus
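For the CSV case, the sketch below reads an Examples component with Python's csv module, assuming the default CLDF column names ID, Analyzed_Word and Gloss, and tab-separated list values in the latter two (the CLDF ExampleTable default). read_examples is a hypothetical helper; it performs none of the IGT parsing that from_path does.

```python
import csv
from typing import Iterator, List, TextIO, Tuple


def read_examples(fp: TextIO) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (ID, analyzed words, glosses) from a CLDF ExampleTable CSV."""
    for row in csv.DictReader(fp):
        # Default CLDF columns; list-valued cells are assumed to be tab-separated.
        words = row["Analyzed_Word"].split("\t")
        glosses = row["Gloss"].split("\t")
        # Interlinear glossing requires one gloss per analyzed word.
        if len(words) != len(glosses):
            raise ValueError(
                f"example {row['ID']}: {len(words)} words vs {len(glosses)} glosses"
            )
        yield row["ID"], words, glosses
```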

write_concordance(ctype, filename=None)

Parameters:

ctype (str) – one of lexicon, grammar or form.
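A concordance for one of these types might list each key with its number of occurrences. The hypothetical write_concordance_tsv below renders a Dict[str, List[Tuple[int, int, int]]] as tab-separated text; its columns are an assumption and not the output format of write_concordance.

```python
import csv
import sys
from typing import Dict, List, Optional, TextIO, Tuple

Ref = Tuple[int, int, int]


def write_concordance_tsv(index: Dict[str, List[Ref]], out: Optional[TextIO] = None) -> None:
    """Write one row per key (key, occurrence count, refs), most frequent first."""
    if out is None:
        out = sys.stdout
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["key", "count", "refs"])
    # Sort by descending count, then alphabetically for a stable order.
    for key, refs in sorted(index.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        writer.writerow([key, len(refs), " ".join("{}:{}:{}".format(*r) for r in refs)])
```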

write_concepts(ctype, filename=None)

Parameters:

ctype (str) – lexicon or grammar.

get_wordlist(doculect='base', profile=False, ref='crossid', lexstat=True, threshold=0.4)

Return a classical wordlist from the data.
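LingPy wordlists can be created from a dictionary whose key 0 holds the column header and whose positive integer keys hold the rows. The hypothetical to_wordlist_dict helper below builds such a dictionary from (form, concept) pairs, with doculect defaulting to 'base' as in get_wordlist; the column names are assumptions, and the profile, ref, lexstat and threshold options are not modelled.

```python
from typing import Dict, Iterable, List, Set, Tuple


def to_wordlist_dict(
    pairs: Iterable[Tuple[str, str]], doculect: str = "base"
) -> Dict[int, List[str]]:
    """Build a LingPy-style wordlist dict: key 0 is the header, keys 1..n are rows."""
    table: Dict[int, List[str]] = {0: ["doculect", "concept", "form"]}
    seen: Set[Tuple[str, str]] = set()
    for form, concept in pairs:
        # Keep each (form, concept) pair once, in first-seen order.
        if (form, concept) not in seen:
            seen.add((form, concept))
            # len(table) is 1 after the header, so row IDs start at 1.
            table[len(table)] = [doculect, concept, form]
    return table
```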

get_profile(clts=None, filename=None)

Compute an orthography profile using LingPy.

Parameters:

filename – Write the computed profile to a file in addition to returning it.

Return type:

segments.profile.Profile

Returns:

segments.Profile instance.
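For intuition about what an orthography profile contains, the sketch below counts graphemes across a list of forms and renders them as a table with a Grapheme column. It is a deliberately naive stand-in that treats each NFC-normalized character as one grapheme; it is not LingPy's profile algorithm, and the Frequency and IPA columns are assumptions.

```python
import unicodedata
from collections import Counter
from typing import Iterable, List, Tuple


def naive_profile(forms: Iterable[str]) -> List[Tuple[str, int]]:
    """Count graphemes across forms, treating each NFC character as one grapheme."""
    counts: Counter = Counter()
    for form in forms:
        # Normalize so precomposed and decomposed spellings are counted together.
        counts.update(unicodedata.normalize("NFC", form))
    # Most frequent first, ties broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def profile_tsv(rows: List[Tuple[str, int]]) -> str:
    """Render rows as tab-separated text with an identity IPA mapping."""
    lines = ["Grapheme\tFrequency\tIPA"]
    lines += [f"{g}\t{n}\t{g}" for g, n in rows]
    return "\n".join(lines)
```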