In a 2008 paper, Wilfred Major constructs what he calls the 50% and 80% vocab lists for Classical Greek, that is, the lemmata that account for 50% and 80% respectively of tokens in the Classical Greek corpus. In this post I provide the code for the equivalent lists for the Greek New Testament and talk about some of the results.

Major’s paper is “It’s Not the Size, It’s the Frequency: The Value of Using a Core Vocabulary in Beginning and Intermediate Greek”. As well as listing the 65 words in the “50% List”, he lists the roughly 1,100 words in the “80% List”, complete with glosses in both cases.

Major also discusses other issues near and dear to this blog such as the relevance of form frequency as well as lemma frequency. I’ll respond to him on some of these topics in later blog posts.

Now, for many years I’ve talked about the limitations of a purely frequency-based approach to vocab ordering. That doesn’t mean producing such lists is useless, just that there are things we can do to improve on that approach. So I still thought it would be interesting to produce GNT 50% and 80% lists.

The code is available here.
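The linked code is the real thing. What follows is only a minimal sketch of the underlying idea, not the actual script: count how often each lemma occurs, then take lemmata in descending frequency until the running total reaches the target share of tokens. The function name `coverage_list` and the toy token list are my own assumptions for illustration; a real run would feed in one lemma per token from a lemmatised GNT text.

```python
from collections import Counter


def coverage_list(items, threshold):
    """Return the most frequent items, in descending frequency order,
    needed to account for at least `threshold` (between 0 and 1) of all tokens.

    `items` is one entry per token (e.g. the lemma of each word in the corpus).
    """
    counts = Counter(items)
    total = sum(counts.values())
    selected = []
    running = 0
    for item, count in counts.most_common():
        # Stop as soon as the items already selected reach the target share.
        if running >= threshold * total:
            break
        selected.append(item)
        running += count
    return selected


# Toy data (hypothetical, NOT real GNT counts): one lemma per token.
lemmas = ["ὁ", "ὁ", "ὁ", "ὁ", "καί", "καί", "καί", "λέγω", "λέγω", "θεός"]

print(coverage_list(lemmas, 0.5))  # ['ὁ', 'καί']
print(coverage_list(lemmas, 0.8))  # ['ὁ', 'καί', 'λέγω']
```

Note that where several lemmata tie on frequency, the cut-off among them is arbitrary in this sketch, so the exact size of a list can shift by an item or two depending on how ties are broken.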

The 50% list consists of just 27 lemmata. The only verbs are γίνομαι, εἰμί, ἔχω, and λέγω. The only nouns are θεός, κύριος, and Ἰησοῦς.

The 80% list consists of 317 lemmata.

As expected, both lists are considerably smaller than Major’s Classical Greek lists, which are based on a much larger corpus.

It’s easy to tweak the code to look at forms rather than lemmata. The 50% forms list for the GNT consists of 97 forms from 52 lemmata.
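To make the forms-versus-lemmata tweak concrete, here is a rough sketch under some assumptions of mine: each token is a `(form, lemma)` pair, frequency is counted on the form, and the lemmata behind the selected forms are collected afterwards. The name `form_coverage` and the toy tokens are hypothetical, and the actual code may well handle details such as normalisation of accents or capitalisation differently.

```python
from collections import Counter, defaultdict


def form_coverage(tokens, threshold):
    """Given (form, lemma) pairs, one per token, return a tuple of
    (forms, lemmas): the most frequent forms needed to cover at least
    `threshold` (between 0 and 1) of tokens, and the set of lemmata
    those forms belong to.
    """
    tokens = list(tokens)
    form_counts = Counter(form for form, _ in tokens)

    # A surface form can in principle belong to more than one lemma,
    # so keep a set of lemmata per form.
    lemmas_by_form = defaultdict(set)
    for form, lemma in tokens:
        lemmas_by_form[form].add(lemma)

    total = len(tokens)
    forms = []
    running = 0
    for form, count in form_counts.most_common():
        if running >= threshold * total:
            break
        forms.append(form)
        running += count

    lemmas = set()
    for form in forms:
        lemmas |= lemmas_by_form[form]
    return forms, lemmas


# Toy data (hypothetical, NOT real GNT counts).
tokens = (
    [("καί", "καί")] * 4
    + [("ὁ", "ὁ")] * 3
    + [("τοῦ", "ὁ")] * 2
    + [("εἶπεν", "λέγω")]
)

forms, lemmas = form_coverage(tokens, 0.8)
print(len(forms), "forms from", len(lemmas), "lemmata")  # 3 forms from 2 lemmata
```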

Interestingly, those 97 forms include 16 forms of the article, 15 forms of the (1st/2nd person) personal pronouns, and 6 forms of αὐτός. This suggests that, even without arguments on morphological grounds, it’s worth learning the full paradigms for the article, the personal pronouns and αὐτός really early on.

Unsurprisingly, λέγω gets a decent showing with 4 forms: εἶπεν, λέγει, λέγω and λέγων. I’ve long thought it’s worth learning those right away without needing to introduce full paradigms.

There’s a lot more that could be explored even with this frequency-based approach. And lots more to say based on the other things Major talks about in his paper.

Finally, it should be stressed that very few full verses of the GNT would be readable with just the 80% list and probably none with the 50% list. I may do another post later on to confirm that.
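One way such a check could be done, sketched here under the simplifying assumption that a verse counts as "readable" when every lemma in it is on the list: map each verse reference to the lemmata of its tokens and keep the verses that are fully covered. The function name `readable_verses`, the placeholder references and the toy data are all hypothetical, and this is not a result, just the shape of the computation.

```python
def readable_verses(verses, known):
    """Return the references of verses whose lemmata all appear in `known`.

    `verses` maps a verse reference to the list of lemmata in that verse;
    `known` is the vocab list (e.g. an 80% list) as any iterable of lemmata.
    """
    known = set(known)
    return [
        ref
        for ref, lemmas in verses.items()
        if all(lemma in known for lemma in lemmas)
    ]


# Toy data with placeholder references (hypothetical, NOT real verses).
verses = {
    "verse-1": ["ὁ", "θεός", "εἰμί"],
    "verse-2": ["ὁ", "ἄγγελος", "λέγω"],
}
known = ["ὁ", "θεός", "εἰμί", "λέγω"]

print(readable_verses(verses, known))  # ['verse-1']
```

A softer variant would report the proportion of tokens in each verse that are covered rather than demanding 100%, which is probably more informative for a list this short.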

UPDATE: Now see Actual Core Vocab Lists for Greek New Testament