Signals in stylometry: What numbers tell us about literary works

Jan Rybicki
Jagiellonian University

Believe it or not, a computer can tell authors apart by counting the frequencies of some of the most frequent words they use. But the very same authorship attribution methods, based on multivariate nearest-neighbor analysis of vocabulary statistics, can also group authors by chronology, genre, or gender. While the exact mechanism behind this phenomenon remains unknown, it is worthwhile to observe how it persists in a variety of literary text collections in several languages.
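
The counting-and-comparing procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the study: the abstract does not name a distance measure, so the sketch assumes a Delta-style distance (the mean absolute difference of z-scored word frequencies, in the spirit of Burrows's Delta). The function names, the default of 100 words, and the use of the population standard deviation are all hypothetical choices made for the example.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def relative_frequencies(tokens):
    """Map each word to its share of all tokens in one text."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: n / total for word, n in counts.items()}


def most_frequent_words(token_lists, n_words):
    """Return the n_words most frequent words in the pooled corpus."""
    pooled = Counter()
    for tokens in token_lists:
        pooled.update(tokens)
    return [word for word, _ in pooled.most_common(n_words)]


def z_score_profiles(texts, n_words=100):
    """Build one z-scored frequency vector per text over a shared word list.

    `texts` maps a label to a raw string; n_words=100 is an assumed default.
    """
    names = list(texts)
    token_lists = [tokenize(texts[name]) for name in names]
    vocab = most_frequent_words(token_lists, n_words)
    freqs = [relative_frequencies(tokens) for tokens in token_lists]
    profiles = {name: [] for name in names}
    for word in vocab:
        column = [f.get(word, 0.0) for f in freqs]
        mu, sigma = mean(column), pstdev(column)
        for name, value in zip(names, column):
            # A word used equally often in every text carries no signal.
            profiles[name].append((value - mu) / sigma if sigma else 0.0)
    return profiles


def delta(a, b):
    """Mean absolute difference between two z-score vectors (assumed measure)."""
    return mean(abs(x - y) for x, y in zip(a, b))


def nearest_neighbor(name, profiles):
    """Label of the text whose profile lies closest to the given one."""
    others = (other for other in profiles if other != name)
    return min(others, key=lambda other: delta(profiles[name], profiles[other]))
```

Given a dictionary of labelled texts, `z_score_profiles` produces the vocabulary statistics and `nearest_neighbor` reports which text most resembles a given one. Whether that resemblance reflects authorship, chronology, genre, or gender depends on the collection being compared, which is the open question the abstract raises.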