In a previous post, we looked at which chapters had the highest mean log frequency of lexemes. The code provided there was applicable to other items, though, so let’s now take a look at mean log frequency of forms.

Only one line of the code needs to change.
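The earlier code isn’t repeated here, but the idea can be sketched roughly as follows. This is a minimal illustration, not the original script: the token layout (`chapter`, `form`, `lexeme`), the function name `mean_log_frequency` and the toy data are all assumptions made for the example, and the scores it produces are plain natural logs of relative frequency, which may be scaled differently from the numbers in this post.

```python
import math
from collections import Counter, defaultdict


def mean_log_frequency(tokens, key):
    """Return {chapter: mean log relative frequency of the item named by `key`}.

    `tokens` is a list of dicts with hypothetical keys "chapter", "form"
    and "lexeme"; `key` selects which item is counted.
    """
    counts = Counter(t[key] for t in tokens)  # corpus-wide item counts
    total = len(tokens)
    sums = defaultdict(float)
    sizes = Counter()
    for t in tokens:
        sums[t["chapter"]] += math.log(counts[t[key]] / total)
        sizes[t["chapter"]] += 1
    return {chapter: sums[chapter] / sizes[chapter] for chapter in sums}


# Toy data (invented): chapter c1 repeats one form, c2 uses three
# different forms of a single lexeme.
TOY = [
    {"chapter": "c1", "form": "a", "lexeme": "A"},
    {"chapter": "c1", "form": "a", "lexeme": "A"},
    {"chapter": "c1", "form": "a", "lexeme": "A"},
    {"chapter": "c2", "form": "b1", "lexeme": "B"},
    {"chapter": "c2", "form": "b2", "lexeme": "B"},
    {"chapter": "c2", "form": "b3", "lexeme": "B"},
]

by_lexeme = mean_log_frequency(TOY, "lexeme")
by_form = mean_log_frequency(TOY, "form")  # the only line that differs
```

In the toy data the two chapters tie on lexemes but separate on forms, because c2 spreads its lexeme across three rarer forms.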

The top 10 are:

6277 2304 449
6373 2305 429
6500 2302 585
6558 0403 657
6562 2303 467
6596 1001 401
6600 0408 905
6617 2301 207
6640 0702 287
6646 2720 406

In other words:

  • 1 John 4 (also 1st for lexemes)
  • 1 John 5 (also 2nd for lexemes)
  • 1 John 2 (8th for lexemes)
  • John 3 (9th for lexemes)
  • 1 John 3 (7th for lexemes)
  • Ephesians 1 (11th for lexemes)
  • John 8 (6th for lexemes)
  • 1 John 1 (4th for lexemes)
  • 1 Corinthians 2 (32nd for lexemes)
  • Revelation 20 (14th for lexemes)
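The mapping from the four-digit codes to chapter names can be sketched like this, assuming each code is a two-digit book number followed by a two-digit chapter number (which is what the list above suggests). The `BOOK_NAMES` table and the `chapter_name` helper are hypothetical and cover only the five books that appear in this top ten.

```python
# Hypothetical lookup covering only the books in this post's top ten;
# a full table would need all 27 books.
BOOK_NAMES = {
    "04": "John",
    "07": "1 Corinthians",
    "10": "Ephesians",
    "23": "1 John",
    "27": "Revelation",
}


def chapter_name(code):
    """Turn a code such as "2304" into a name such as "1 John 4"."""
    book, chapter = code[:2], int(code[2:])
    return f"{BOOK_NAMES[book]} {chapter}"


top_ten_codes = ["2304", "2305", "2302", "0403", "2303",
                 "1001", "0408", "2301", "0702", "2720"]
names = [chapter_name(code) for code in top_ten_codes]
```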

Generally, form frequency will track pretty closely with lexeme frequency, because a form being common makes its lexeme common. This makes 1 Corinthians 2 interesting: it is 9th for forms but only 32nd for lexemes.

Frequent words and forms obviously don’t necessarily mean shallow syntax, though. 1 John 4, 5 and 2 are respectively the 36th, 67th and 38th by mean dependency depth. There are no chapters that are in the top ten of both mean log form frequency AND mean dependency depth.

So we now have mean log frequencies for lexemes and forms as well as mean dependency depth. In future posts, I’ll add parse codes and the actual dependency path to the mix and then we can look at combining all five metrics. I’ll also look at paragraphs rather than chapters as targets.