Word Sense Change Testset
21 July 2017
Type(s) of data
This testset consists of 23 terms which have experienced word sense change during the past centuries. The main changes for each term were found using Wikipedia, dictionary.com and the Oxford English Dictionary. We consider major changes in usage as well as changes to sense. In cases where multiple (fine-grained) senses were available, we opted to accept the widest sense. For example, for the term rock we consider a single music sense without any distinction between different types of rock music, because our dataset is unlikely to have fine-grained sense differentiations. If a clear time point could not be pinpointed, we chose the earliest possible one. For comparison purposes we also chose a set of 11 terms that have experienced minimal change during the investigated period, i.e., stable terms.
Contains a list of all terms and the different change types for each term with a short description of the sense and change.
2. Files of the kind "TERM.txt"
The header states the term, the clustering coefficient, the similarity threshold and the similarity measure that were used.
A path starts with "Path:".
A unit starts with "UNIT:". The numbers that follow give, first, the number of years that the unit spans and, then, a list of all years that the internal clusters stem from.
E.g., UNIT: 83 1785, 1787, 1790, 1793, 1798, 1801, 1823, 1867, spans 83 years and consists of clusters from the years 1785, 1787, 1790 etc.
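As an illustration of the unit format described above, a unit line can be split into its span and its list of years with a few lines of Python. This is only a sketch: the function name parse_unit_line is hypothetical and not part of the testset, and the assumption that the span is counted inclusively (last year minus first year plus one) is inferred from the examples in this description rather than stated by the authors.

```python
import re

def parse_unit_line(line):
    """Split a 'UNIT:' line into (span_in_years, list_of_years)."""
    text = line.strip()
    if not text.startswith("UNIT:"):
        raise ValueError("not a UNIT line: " + line)
    # Collect every integer after the "UNIT:" marker; separators such as
    # commas and a trailing comma are ignored.
    numbers = [int(n) for n in re.findall(r"\d+", text[len("UNIT:"):])]
    if len(numbers) < 2:
        raise ValueError("expected a span followed by at least one year")
    # The first number is the span, the remaining numbers are the years.
    return numbers[0], numbers[1:]

# Example line taken from this description.
span, years = parse_unit_line(
    "UNIT: 83 1785, 1787, 1790, 1793, 1798, 1801, 1823, 1867,"
)

# Assumption: the span counts both the first and the last year.
inclusive_span = max(years) - min(years) + 1
```

For the example line, inclusive_span equals the stated span of 83, which is what suggests the inclusive counting.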
Indentation shows the tree structure: more indentation means a lower-level branch in the tree.
As an example, in AEROPLANE.txt unit UNIT: 23 1908, 1909, 1910, 1911, 1914, 1918, 1930, 1908 is the root node and the unit is related to UNIT: 27 1916, 1919, 1924, 1932, 1942, 1916.
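The path and indentation conventions can be read back into a tree with a small stack-based parser. The sketch below is hedged: parse_paths is a hypothetical helper, the sample lines are a hand-made fragment modelled on the AEROPLANE.txt example (the exact indentation in the real files may differ), and it assumes that indentation is expressed as leading whitespace.

```python
def parse_paths(lines):
    """Return a list of paths; each path is a list of root unit nodes.

    A node is a dict with the raw unit text and a list of child nodes.
    """
    paths = []
    stack = []  # (indent, node) pairs along the current branch
    for raw in lines:
        text = raw.strip()
        if text.startswith("Path:"):
            # A new path starts a new, empty tree.
            paths.append([])
            stack = []
        elif text.startswith("UNIT:"):
            if not paths:
                paths.append([])  # tolerate units before any "Path:" line
            indent = len(raw) - len(raw.lstrip())
            node = {"unit": text, "children": []}
            # Pop until the top of the stack is less indented, i.e. the parent.
            while stack and stack[-1][0] >= indent:
                stack.pop()
            if stack:
                stack[-1][1]["children"].append(node)
            else:
                paths[-1].append(node)
            stack.append((indent, node))
    return paths

# Hypothetical fragment; the two-space indentation is an assumption.
sample = [
    "Path:",
    "UNIT: 23 1908, 1909, 1910, 1911, 1914, 1918, 1930, 1908",
    "  UNIT: 27 1916, 1919, 1924, 1932, 1942, 1916",
]
paths = parse_paths(sample)
root = paths[0][0]
```

Here root is the unit starting in 1908 and its single child is the unit starting in 1916, matching the relation described for AEROPLANE.txt.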
The longest units and paths are found for stable terms, e.g., newspaper. These are statistically significantly longer than the average units and paths for terms that later evolve.
Newspaper has a unit that spans 145 years, and its first path spans from 1852 to 2007.
For the term flight we find that the first unit captures a name: Flight & Robson, who were organ builders.
The second unit (in its own path) represents the flight over a hurdle: UNIT: 28 1868, 1869, 1870, 1877, 1885, 1889, 1890, 1892, 1893, 1894, 1895
There is a unit (in its own path) that represents the flight of a cricket ball: UNIT: 29 1938, 1957, 1966
Finally, the last path represents flight as in a means of transportation, in particular for holidays, starting with UNIT: 19 1962, 1970, 1973, 1980
The first path for tape is related to sewing tape.
Then there is a second path starting with UNIT: 38 1970, 1974, 2007 that takes up the musical tape.
The last path ends in the same units that the second path ends in, and is also related to the musical tape.
The music tape and the sewing tape should be related because of their shape, but we cannot find any relation as there are few or no overlapping terms.
|Name||Type of identifier||Funder identifier||Award number||Award title||Award URI|
European Research Council
European Community H2020 Program
Swedish Research Council
Towards a knowledge-based culturomics