Written by Alice Ryder
Being asked to read a few pages for homework may sound like a chore, but imagine reading more than five million books. That’s what Slovenian physicist Matjaz Perc did to investigate how the use of words in the English language has changed over time.
Luckily, Matjaz had the help of computer programs to analyse 5.2 million books, around four per cent of all books written in English between 1520 and 2008. He found that 400 years ago, the most popular words and phrases changed quite rapidly. But over the past two centuries, our most popular words have remained fairly stable.
Matjaz found that a mathematical idea called Zipf’s law could be used to describe how often words appeared in books. This law says that a word’s frequency is inversely proportional to its popularity ranking. In other words, the most popular word appears roughly twice as often as the second most popular word, three times as often as the third most popular word, and so on.
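If you want to try the arithmetic yourself, here is a minimal Python sketch of the idealised rule. It assumes the word ranked r appears about f(1)/r times, where f(1) is the count of the top word. The function names `zipf_expected` and `rank_words`, the starting count of 600 and the sample sentence are made up for illustration. Real books only roughly follow this pattern.

```python
from collections import Counter


def zipf_expected(top_count: int, max_rank: int) -> list[float]:
    """Idealised Zipf's law: the word at rank r appears top_count / r times."""
    return [top_count / rank for rank in range(1, max_rank + 1)]


def rank_words(text: str) -> list[tuple[str, int]]:
    """Count words (lower-cased, split on spaces only) and sort them from most to least common.

    This is a simplifying assumption: punctuation is not stripped.
    """
    words = text.lower().split()
    return Counter(words).most_common()


if __name__ == "__main__":
    # If the top word appeared 600 times, Zipf's law predicts 300 and 200 for ranks 2 and 3.
    print(zipf_expected(600, 3))
    # Rank the words in a tiny made-up sentence.
    print(rank_words("the cat and the dog saw the cat"))
```

Running it prints `[600.0, 300.0, 200.0]` for the idealised counts. Applying `rank_words` to a long book rather than one sentence is a way to see whether the real counts come close to the prediction.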
This means that although English has a large vocabulary, the vast majority of what we write is made up of a relatively small number of words.
Google’s Ngram Viewer, at books.google.com/ngrams, lets you search part of the same data set that Matjaz used, so you can track how often any word or phrase has been used over time, from your own name to ‘Harry Potter’.
What has been the top-ranking word since the 1500s? Of all the words across all the texts that Matjaz analysed, one word made up eight per cent of them: ‘the’.
If you’re after more science news for kids, subscribe to Double Helix magazine!