Book with letters falling out.
The frequency of words written in the English language can be described using a formula called Zipf’s law.
Image: Thinkstock

Written by Alice Ryder

Being asked to read a few pages for homework may sound like a chore, but imagine reading more than five million books. That’s what Slovenian physicist Matjaz Perc did to investigate how the use of words in the English language has changed over time.

Luckily, Matjaz had the help of computer programs to analyse 5.2 million books – around four per cent of all books written in English between 1520 and 2008. He found that 400 years ago, the most popular words and phrases changed quite rapidly. But, over the past two centuries, our most popular words have remained fairly stable.

Matjaz found that a mathematical idea called Zipf’s law could be used to describe how often words appeared in books. The law says that a word’s frequency is inversely proportional to its rank in popularity. In other words, the most popular word appears about twice as often as the second most popular word, three times as often as the third most popular word, and so on.
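If you like coding, here is a small Python sketch of that pattern. It is only an illustration, not the program Matjaz used: the function name `zipf_expected_counts` and the starting count of 1200 are made up for this example. Given how many times the top word appears, it works out how often Zipf’s law would expect the words ranked below it to appear.

```python
def zipf_expected_counts(top_count, n_ranks):
    """Counts predicted by an ideal Zipf's law for ranks 1..n_ranks.

    top_count is how often the most popular word appears (a made-up
    number here); the word at rank r is expected top_count / r times.
    """
    return [top_count / rank for rank in range(1, n_ranks + 1)]


# Hypothetical example: the top word appears 1200 times.
for rank, count in enumerate(zipf_expected_counts(1200, 4), start=1):
    print(f"rank {rank}: about {count:.0f} times")
```

Real books never follow the rule this neatly, so treat the numbers as a rough guide rather than an exact prediction.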

This means that although English has a large vocabulary, the vast majority of what we write is made up of a relatively small number of words.
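To get a feel for why a few words do so much of the work, the sketch below adds up the shares of the top-ranked words under a perfectly Zipf-shaped vocabulary. The vocabulary size of 100,000 words and the helper names `harmonic` and `top_share` are assumptions chosen for illustration; they do not come from Matjaz’s study.

```python
def harmonic(n):
    """Sum of 1/1 + 1/2 + ... + 1/n."""
    return sum(1 / rank for rank in range(1, n + 1))


def top_share(k, vocab_size):
    """Fraction of all written words covered by the k most popular
    words, assuming every word follows Zipf's law exactly."""
    return harmonic(k) / harmonic(vocab_size)


# Assumed vocabulary of 100,000 different words (illustrative only).
share = top_share(100, 100_000)
print(f"Top 100 words cover about {share:.0%} of the text")
```

With these made-up settings, just 100 words out of 100,000 account for a bit over 40 per cent of everything written, which is the idea behind the paragraph above.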

You can use part of the same data set of words that Matjaz used to track the use of words or phrases – from your own name to ‘Harry Potter’ – on Google’s Ngram viewer at books.google.com/ngrams.

What is the top-ranking word since the 1500s? Of all the words across all the texts that Matjaz analysed, one word made up eight per cent of them. ‘The’.

If you’re after more science news for kids, subscribe to Double Helix magazine!
