Linking Lexical Resources for Biblical Greek
J. K. Tauber
jktauber.com

Lemma
• canonical form
• dictionary form
• citation form
• headword

Lexeme
• lexical entry
• lexical item

Lemma vs Lexeme
“lemma refers to the particular form that is chosen by convention to represent the lexeme”

lemma used at both ends
loosely coupled, linking the “not yet known”
different from scholia / lemmatic commentary usage
annotated word in text
lemma (a type of key)
entry in lexical resource

Natural Keys vs Surrogate Keys
is the unique identifier derived from properties of the object or not?

Strong’s Numbers as Surrogate Keys
• disambiguate homographs: μήν ~ μήν
• isolate from headword choices: 1st person singular ~ infinitive
• usable without language knowledge

Problems with Strong’s
• gloss errors
• corpus scope limitations
• lumping / splitting disagreements

Just Expanding Corpus Scope
• just add them at end or renumber? Goodrick and Kohlenberger 1990
• doesn’t solve lumping/splitting problem
  Strong and G/K disagree
  BDAG and Danker’s Concise Lexicon disagree
  LSJ and Middle Liddell disagree
• the sin of number reuse / conflation

Number Reuse Sins
Titus 2.7 textual variants
ἀδιαφθορία 90 91
ἀφθονία ἀφθορία 916 917
Any text with ἀφθονία or ἀφθορία cannot reference Strong’s numbers unless “90” is reused to mean “any of ἀδιαφθορία, ἀφθονία, or ἀφθορία”, which conflates it with “90” meaning just ἀδιαφθορία!

Possible Solution
Perhaps the simplest solution is that we distinguish in an entry in lexemes.yaml between:
• this is clearly the Strongs that should be used for this word (identical word or minor spelling / capitalisation / diacritic difference)
• this is a Strongs number that people have shoe-horned this word into because there isn't actually a good Strongs number for it
and possibly include in a separate field the "canonical Strongs" for the number given.
For example, perhaps in the entry for ἀγγέλλω, we don't say that it has a strongs of 518 but rather it has an alt-strongs of 518 and the canonical lemma for 518 is actually ἀπαγγέλλω.
https://github.com/jtauber/greek-lemma-mappings/wiki/Conflating-Strongs

In Praise of Universal Identifiers
934
strong:934
urn:…:strong:934

Lumping / Splitting
I. II. III. IV.

βασίλειος ‘royal’ ~ βασίλειον ‘palace’
same word or two different words?
whole class of “cross-over adjectives”
Strong distinguishes (934 vs 933)
G/K does not (994 for both)

μέχρι ~ μέχρις
ἄχρι ~ ἄχρις
same word or two different words?
Strong conflates μέχρι(ς)
G/K splits μέχρι and μέχρις
G/K conflates ἄχρι(ς)

Σολομών ~ Σολομῶν
same word or two different words?
differ only in accentuation in nominative
but genitives differ because different inflectional classes, different stems

ἀνάπειρος ~ ἀνάπηρος
same word or two different words?
you might not care if you’re doing lexical semantics, but if you’re a textual critic or phonologist you might
also, why should ἀναπείρους be lemmatised ἀνάπηρος?

δῶ ‘tie’ ~ δῶ ‘need’ ~ δεῖ impers.
one, two, or three different words?

ξυράομαι ~ ξυράω ~ ξυρέω ~ ξύρω
one, two, three, or four different words?

ἀναλόω ~ ἀναλίσκω
ἀποκτείνω ~ ἀποκτέννω
ἑλκύω ~ ἕλκω
ἵστημι ~ ἱστάνω
same word or two different words?
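
The distinction proposed on the Possible Solution slide, together with the prefixed identifiers from In Praise of Universal Identifiers, might be sketched roughly as follows. This is only an illustration: the field names "strongs" and "alt-strongs" follow the wording of the proposal, but the dictionary layout, the helper names (`qualify`, `canonical_lemma`, `shoehorned`) and the two sample entries are assumptions, not the actual contents of lexemes.yaml.

```python
# Hypothetical in-memory stand-in for entries in lexemes.yaml.
# "strongs" marks a clear match; "alt-strongs" marks a number the word
# has been shoe-horned into. The layout is assumed for illustration.
LEXEMES = {
    "ἀπαγγέλλω": {"strongs": 518},
    "ἀγγέλλω": {"alt-strongs": 518},
}


def qualify(scheme, number):
    """Prefix a bare number with its numbering scheme, e.g. strong:934."""
    return f"{scheme}:{number}"


def canonical_lemma(number):
    """Return the single lemma that clearly owns this Strong's number, if any."""
    matches = [lemma for lemma, entry in LEXEMES.items()
               if entry.get("strongs") == number]
    return matches[0] if len(matches) == 1 else None


def shoehorned(number):
    """Return the lemmas that only borrow this Strong's number."""
    return sorted(lemma for lemma, entry in LEXEMES.items()
                  if entry.get("alt-strongs") == number)


print(qualify("strong", 518), canonical_lemma(518), shoehorned(518))
```

Keeping the two fields apart means a lookup by 518 can still find ἀγγέλλω without claiming that 518 ever meant anything other than ἀπαγγέλλω.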
None of these competing lump / split analyses are WRONG per se.
And if we’re integrating existing lexical resources, we have to live with these differences.

Across multiple lexical resources and multiple annotated texts, we find HUNDREDS of these discrepancies.

Morwood Principal Parts
ἀλλάσσω ~ ἀλλάττω
ἁρμόττω ~ ἁρμόζω
κλαίω ~ κλάω
αὐξάνω ~ αὔξω
μείγνῡμι ~ μίγνῡμι
οἶμαι ~ οἴομαι
this is just within Morwood’s list, let alone the issues of integrating his list with others (Pratt, DCC, etc)

Desiderata
• allow alternative lump / split analyses to be represented simultaneously
• be able to integrate with legacy lexical resources AND the annotated texts that already link to those legacy resources
• be able to integrate future resources and texts without breaking links to previous resources

A New Numbering System for Greek New Testament Lexemes
Tauber and Sandborg-Petersen (2006)

Numbering
assuming prefixing for universal identifiers
[diagram: numbers 326 and 278; lexemes A and B]

Joining
591 = {326, 278}
[diagram: 591, 326, 278; lexemes A and B]
Now we have a new number we can use whenever a resource conflates or fails to distinguish A and B, or if A and B can’t be or have not been disambiguated in a text

Splitting
591 = {326, 278}
278 = {592, 593}
[diagram: 591, 326, 278, 592, 593; lexemes A, B1 and B2]
Note that the splitting of B into separate B1 and B2 doesn’t break any references because all external references used 278

Multiple Parents
591 = {326, 278}
278 = {592, 593}
594 = {326, 592}
[diagram: 594, 591, 326, 278, 592, 593; lexemes A, B1 and B2]

Lumping / Splitting
I. II. III. IV.

δῶ ~ δῶ ~ δεῖ
[diagram built up over four slides: first δέω and δεῖ impers.; then the same two numbered 1 and 2; then δέω ‘tie’, δέω ‘need’ and δεῖ impers. with numbers 1 to 4; finally a fifth number, 5, is added]

Below the Lexeme
• Strong splits εἷς and μία
• Strong splits individual cases of some pronouns
• λέγω ~ εἶπον (distinguished by Strong but not G/K)
• ὁράω ~ εἶδον (distinguished by G/K but not Strong)
• φαγω ~ ἐσθίω
• even κλείς acc. κλειδα ~ κλειν (stem κλειδ ~ κλεj)
• gives us hooks to hang morphological information: inflectional classes, stems, etc
• also relates to an adjacent project building a principal parts database (Pratt, Morwood, DCC)
• word sense disambiguation, named entity resolution

Progress and Plans
• BDAG
• Danker’s Concise Lexicon
• Dodson’s lexicon
• old list from Bill Mounce
• Strong’s numbers
• G/K numbers
• MorphGNT SBLGNT lemmatisation vs Nestle 1904 lemmatisation
• Abbott-Smith
• updated Mounce dictionary
• LSJ
• Lemmatisations from Dik and Celano
• Brill GE

Collaborations
• biblicalhumanities.org
• Perseus Digital Library
• Logeion
• you?
• W3C Ontology-Lexica Community Group
• Linked Linguistic Data community

Links
• https://github.com/jtauber/greek-lemma-mappings
• https://github.com/morphgnt/morphological-lexicon especially /projects/lemmatization_differences
• https://jktauber.com/2016/06/17/modelling-stems-and-principal-part-lists/
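
The Joining, Splitting and Multiple Parents slides can be read as operations on a table that maps each number to the set of numbers it covers. The sketch below is one possible reading, not the system described in Tauber and Sandborg-Petersen (2006): the class `LexemeNumbers`, its method names, and the rule that new numbers are simply allocated in sequence are all assumptions made for illustration. Only the numbers 326, 278 and 591 to 594 come from the slides.

```python
class LexemeNumbers:
    """Hypothetical registry: number -> set of member numbers (empty = atomic)."""

    def __init__(self, next_number):
        self.members = {}
        self.next_number = next_number  # assumed: new numbers are sequential

    def register(self, number):
        """Record an existing atomic number."""
        self.members[number] = frozenset()

    def _allocate(self):
        number = self.next_number
        self.next_number += 1
        self.members[number] = frozenset()
        return number

    def join(self, *numbers):
        """Create a new number meaning 'any of these'; existing numbers are untouched."""
        for n in numbers:
            if n not in self.members:
                raise KeyError(n)
        parent = self._allocate()
        self.members[parent] = frozenset(numbers)
        return parent

    def split(self, number, parts=2):
        """Refine a number into new children; the old number stays valid."""
        children = [self._allocate() for _ in range(parts)]
        self.members[number] = frozenset(children)
        return children

    def leaves(self, number):
        """Expand a number to the finest-grained numbers it covers."""
        kids = self.members[number]
        if not kids:
            return {number}
        result = set()
        for kid in kids:
            result |= self.leaves(kid)
        return result

    def overlaps(self, a, b):
        """True if two numbers could refer to the same lexeme."""
        return bool(self.leaves(a) & self.leaves(b))


nums = LexemeNumbers(next_number=591)
nums.register(326)             # A
nums.register(278)             # B
ab = nums.join(326, 278)       # 591 = {326, 278}
b1, b2 = nums.split(278)       # 278 = {592, 593}
a_b1 = nums.join(326, b1)      # 594 = {326, 592}
print(ab, b1, b2, a_b1, sorted(nums.leaves(ab)))
```

Because `split` only adds children beneath 278, a text that already cites 278 or 591 still resolves, which is the property the Splitting slide points out.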