Collections of IGT
- class pyigt.Corpus(igts, fname=None, clean_lexical_concept=None)
A Corpus is an immutable, ordered list of IGT instances.
It provides access to concordance-like aggregated statistics of its texts.
- Variables:
monolingual – Flag signaling whether the corpus is monolingual or contains IGT from different object languages.
- Parameters:
igts (collections.abc.Iterable[pyigt.igt.IGT])
- property grammar: dict[str, list[pyigt.igt.MorphemeReference]]
Maps grammatical concepts to lists of occurrences.
>>> from pyigt import Corpus, IGT
>>> igt = IGT(phrase="ni-c-chihui-lia in no-piltzin ce calli",
...           gloss="1SG.SUBJ-3SG.OBJ-mach-APPL DET 1SG.POSS-Sohn ein Haus")
>>> c = Corpus([igt])
>>> [[c[ref] for ref in c.grammar[k]] for k in c.grammar if k.startswith('1SG')]
[[<GlossedMorpheme morpheme=ni gloss=1SG.SUBJ>], [<GlossedMorpheme morpheme=no gloss=1SG.POSS>]]
- property lexicon: dict[str, list[pyigt.igt.MorphemeReference]]
Maps lexical concepts to lists of occurrences.
>>> from pyigt import Corpus, IGT
>>> igt = IGT(phrase="ni-c-chihui-lia in no-piltzin ce calli",
...           gloss="1SG.SUBJ-3SG.OBJ-mach-APPL DET 1SG.POSS-Sohn ein Haus")
>>> c = Corpus([igt])
>>> [c[ref] for ref in c.lexicon['Sohn']]
[<GlossedMorpheme morpheme=piltzin gloss=Sohn>]
- property form: dict[str, list[pyigt.igt.MorphemeReference]]
Maps morpheme forms to lists of occurrences.
>>> from pyigt import Corpus, IGT
>>> igt = IGT(phrase="ni-c-chihui-lia in no-piltzin ce calli",
...           gloss="1SG.SUBJ-3SG.OBJ-mach-APPL DET 1SG.POSS-Sohn ein Haus")
>>> c = Corpus([igt])
>>> [k for k in c.form]
['ni', 'c', 'chihui', 'lia', 'in', 'no', 'piltzin', 'ce', 'calli']
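To make the three mappings concrete, here is a minimal, self-contained sketch that does not use pyigt. It assumes that words are separated by spaces, morphemes by hyphens, and that a gloss written entirely in upper case is grammatical (the Leipzig glossing convention); pyigt's own classification rules may differ. The names `split_morphemes` and `build_concordances` are hypothetical, and the `(word_index, morpheme_index)` tuples are only a stand-in for `MorphemeReference`.

```python
from collections import defaultdict


def split_morphemes(phrase, gloss):
    """Yield ((word_index, morpheme_index), morpheme, gloss) triples.

    Assumption: words are space-separated, morphemes hyphen-separated,
    and phrase and gloss are aligned one-to-one.
    """
    words, word_glosses = phrase.split(), gloss.split()
    if len(words) != len(word_glosses):
        raise ValueError("phrase and gloss have different numbers of words")
    for w_idx, (word, word_gloss) in enumerate(zip(words, word_glosses)):
        morphemes, glosses = word.split("-"), word_gloss.split("-")
        if len(morphemes) != len(glosses):
            raise ValueError(f"misaligned word: {word!r} / {word_gloss!r}")
        for m_idx, (morpheme, g) in enumerate(zip(morphemes, glosses)):
            yield (w_idx, m_idx), morpheme, g


def build_concordances(phrase, gloss):
    """Return (grammar, lexicon, form) dicts mapping keys to reference lists."""
    grammar, lexicon, form = defaultdict(list), defaultdict(list), defaultdict(list)
    for ref, morpheme, g in split_morphemes(phrase, gloss):
        form[morpheme].append(ref)
        # Heuristic (an assumption): all-caps glosses are grammatical concepts.
        target = grammar if g.isupper() else lexicon
        target[g].append(ref)
    return dict(grammar), dict(lexicon), dict(form)


grammar, lexicon, form = build_concordances(
    "ni-c-chihui-lia in no-piltzin ce calli",
    "1SG.SUBJ-3SG.OBJ-mach-APPL DET 1SG.POSS-Sohn ein Haus",
)
```

With this input, `form` has one key per morpheme in phrase order, `lexicon` holds `mach`, `Sohn`, `ein` and `Haus`, and `grammar` holds the upper-case glosses such as `1SG.SUBJ` and `DET`.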
- classmethod from_cldf(cldf)
Instantiate a corpus of IGT examples from a CLDF dataset.
- Parameters:
cldf (pycldf.dataset.Dataset) – a pycldf.Dataset instance.
spec – a CorpusSpec instance, specifying how to interpret markup in the corpus.
- Return type:
- classmethod from_path(path)
Instantiate a corpus from a file path.
- Parameters:
path (typing.Union[str, pathlib.Path]) – Either the path to a CLDF dataset’s metadata file or the path to a CSV file containing a CLDF Examples component. In the latter case, the file must use the default column names defined in the CLDF ontology.
- Return type:
- write_concordance(ctype, filename=None)
- Parameters:
ctype (typing.Literal['grammar', 'lexicon', 'form']) – lexicon, grammar or form.
filename (typing.Union[str, pathlib.Path, None])
- write_concepts(ctype, filename=None)
- Parameters:
ctype (typing.Literal['grammar', 'lexicon', 'form']) – lexicon or grammar (the type annotation also admits form).
filename (typing.Union[str, pathlib.Path, None])
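The output format of `write_concordance` and `write_concepts` is not described above. As a rough illustration of the `ctype` / `filename` pattern only, the following sketch selects one of three concordance dictionaries and writes a frequency table. The tab-separated layout, the `KEY` and `FREQUENCY` column names, the fallback to standard output when `filename` is `None`, and the function names are all assumptions, not pyigt's behaviour; the data is a toy example.

```python
import csv
import io
import sys
from pathlib import Path


def format_concordance(concordance):
    """Render a {key: [references]} mapping as a TSV frequency table."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["KEY", "FREQUENCY"])  # hypothetical column names
    # Most frequent keys first; ties are broken alphabetically.
    ordered = sorted(concordance.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for key, refs in ordered:
        writer.writerow([key, len(refs)])
    return buf.getvalue()


def write_concordance_sketch(concordances, ctype, filename=None):
    """Write the table for ctype to filename, or to stdout if filename is None."""
    if ctype not in ("grammar", "lexicon", "form"):
        raise ValueError(f"unknown ctype: {ctype!r}")
    text = format_concordance(concordances[ctype])
    if filename is None:
        sys.stdout.write(text)
    else:
        Path(filename).write_text(text, encoding="utf8")
    return text


# Toy data: keys map to (word_index, morpheme_index) references.
concordances = {
    "grammar": {"1SG.SUBJ": [(0, 0)], "DET": [(1, 0), (3, 0)]},
    "lexicon": {"Sohn": [(2, 1)]},
    "form": {"ni": [(0, 0)]},
}
table = format_concordance(concordances["grammar"])
```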
- get_wordlist(doculect='base', profile=None, lingpy_settings=LingPySettings(ref='crossid', lexstat=True, threshold=0.4))
Return a classical wordlist from the data.
- Parameters:
doculect (str)
profile (typing.Union[pathlib.Path, str, segments.profile.Profile, None])
lingpy_settings (pyigt.igt.LingPySettings)
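The exact columns of the returned wordlist are not listed above. Given the `lingpy_settings` parameter, a "classical wordlist" presumably follows the dictionary layout that LingPy accepts, where key `0` holds the column names and each further integer key holds one row; that reading is an assumption. The sketch below builds such a table in plain Python, with a hypothetical `wordlist_rows` helper, hypothetical column names, and a naive character-by-character segmentation standing in for an orthography profile.

```python
def wordlist_rows(entries, doculect="base"):
    """Build a LingPy-style table from (concept, form) pairs.

    Key 0 is the header row; rows are numbered from 1.
    """
    table = {0: ["doculect", "concept", "form", "tokens"]}
    for idx, (concept, form) in enumerate(entries, start=1):
        # Naive segmentation: one token per character. A real orthography
        # profile would keep multi-character graphemes such as "tz" together.
        table[idx] = [doculect, concept, form, list(form)]
    return table


wl = wordlist_rows([("Sohn", "piltzin"), ("Haus", "calli")])
```

Every row carries the same `doculect` value, which matches the idea of a single default doculect (`'base'`) for a monolingual corpus.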