Title(s)
Word Sense Change Testset
Language: en
 
Author(s)
Tahmasebi, Nina; ORCID: 0000-0003-1688-1845; GND: 1046251600; Affiliation: University of Gothenburg
Risse, Thomas; ORCID: 0000-0001-6248-1709; Affiliation: University Library
 
Project(s)
Alexandria 
SoBigData 
Towards a knowledge-based culturomics 
 
Date Issued
21 July 2017
 
Publisher(s)
Goethe-Universität Frankfurt
 
Handle
https://gude.uni-frankfurt.de/handle/gude/250
 
DOI
10.5281/zenodo.495572
 

Type(s) of data
Dataset
 
Language(s)
en
 
Subject Keyword(s)
  • language

  • evolution

  • word sense

 
Abstract(s)
This testset consists of 23 terms which have experienced word sense change during the past centuries. The main changes for each term were found using Wikipedia, dictionary.com and the Oxford English Dictionary. We consider major changes in usage as well as changes to sense. In cases where multiple (fine-grained) senses were available, we opted to accept the widest sense. E.g., for the term rock we consider a music sense without any distinction between different types of rock music, because our dataset is unlikely to have fine-grained sense differentiations. If a clear time point cannot be pinpointed, we choose the earliest possible. For comparison purposes we also chose a set of 11 terms that have experienced minimal change during the investigated period, i.e., stable terms.
Language: en
 
Description(s)
Supplementary material

1. testset.txt

Contains a list of all terms and the different change types for each term with a short description of the sense and change.


2. Files of the kind "TERM.txt"
The header states the term and which clustering coefficient, similarity threshold and similarity measure were used.

A path starts with "Path:".

A unit starts with "UNIT:". The numbers that follow give, first, the number of years that the unit spans and, second, a list of all years that the internal clusters stem from.

E.g., UNIT: 83 1785, 1787, 1790, 1793, 1798, 1801, 1823, 1867, spans 83 years and consists of clusters from the years 1785, 1787, 1790, etc.
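
The unit format described above can be read with a few lines of Python. This is only a minimal sketch: the function name parse_unit_line is hypothetical and not part of the dataset, and it assumes that every number after "UNIT:" is either the span (the first number) or a cluster year (all remaining numbers), separated by spaces or commas.

```python
import re

def parse_unit_line(line):
    """Return (span_in_years, [years]) for a line that starts with 'UNIT:'.

    Assumption: the first number is the span, all later numbers are years.
    """
    text = line.strip()
    if not text.startswith("UNIT:"):
        raise ValueError("not a UNIT line")
    # Collect every run of digits after the 'UNIT:' prefix; this tolerates
    # trailing commas such as the one in the example line.
    numbers = [int(n) for n in re.findall(r"\d+", text[len("UNIT:"):])]
    if not numbers:
        raise ValueError("UNIT line contains no numbers")
    return numbers[0], numbers[1:]

span, years = parse_unit_line(
    "UNIT: 83 1785, 1787, 1790, 1793, 1798, 1801, 1823, 1867,"
)
print(span, years[0], years[-1])
```

For the example line, the span of 83 equals the last year minus the first year plus one (1867 - 1785 + 1), which suggests that the span counts both end years.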

Indentation shows the tree structure: more indentation means a lower-level branch in the tree.

As an example, in AEROPLANE.txt the unit UNIT: 23 1908, 1909, 1910, 1911, 1914, 1918, 1930, 1908 is the root node, and it is related to the unit UNIT: 27 1916, 1919, 1924, 1932, 1942, 1916.
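
The indentation rule can be turned into an explicit tree with a small stack-based reader. The sketch below rests on assumptions that the description does not confirm: that indentation is leading whitespace, that a line starting with "Path:" begins a new tree, and that a more deeply indented unit is a child of the nearest less-indented unit above it. The function name build_tree and the two-line sample layout are hypothetical; only the unit contents are taken from the AEROPLANE.txt example.

```python
def build_tree(lines):
    """Nest UNIT lines by indentation; returns a list of root nodes.

    Assumption: deeper indentation means a child of the closest
    less-indented UNIT line above it.
    """
    roots = []
    stack = []  # (indent, node) pairs for the currently open branch
    for raw in lines:
        text = raw.strip()
        if text.startswith("Path:"):
            stack.clear()  # a new path starts a fresh tree
            continue
        if not text.startswith("UNIT:"):
            continue  # header lines and blank lines are skipped
        indent = len(raw) - len(raw.lstrip())
        node = {"unit": text, "children": []}
        # Close branches that are at the same depth or deeper.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if stack:
            stack[-1][1]["children"].append(node)
        else:
            roots.append(node)
        stack.append((indent, node))
    return roots

# Hypothetical layout built from the two units quoted for AEROPLANE.txt.
sample = [
    "Path:",
    "UNIT: 23 1908, 1909, 1910, 1911, 1914, 1918, 1930, 1908",
    "  UNIT: 27 1916, 1919, 1924, 1932, 1942, 1916",
]
tree = build_tree(sample)
print(len(tree), len(tree[0]["children"]))
```

If the real files mix tabs and spaces, the indentation width would need to be normalised before comparing depths.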

Interesting findings

The longest units and paths are found for stable terms, e.g., newspaper. These are statistically significantly longer than the average units and paths for terms that later evolve.

Newspaper has a unit that spans 145 years, and its first path spans from 1852 to 2007.


FLIGHT.txt

For the term flight we find that the first unit captures a name: Flight & Robson, who were organ builders.

The second unit (in its own path) represents the flight over a hurdle: UNIT: 28 1868, 1869, 1870, 1877, 1885, 1889, 1890, 1892, 1893, 1894, 1895

There is a unit (in its own path) that represents the flight of a cricket ball: UNIT: 29 1938, 1957, 1966

Finally, the last path represents flight as in a means of transportation, in particular for holidays, starting with UNIT: 19 1962, 1970, 1973, 1980


TAPE.txt

The first path for tape is related to sewing tape.

Then there is a second path, starting with UNIT: 38 1970, 1974, 2007, that takes up the musical tape.

The last path ends in the same units as the second path and is also related to the musical tape.

The music tape and the sewing tape should be related because of their shape, but we cannot find any relation as there are few or no overlapping terms.
Language: en
 

Funder(s)
European Research Council; Award number: ERC 339233; Award title: Alexandria
European Community H2020 Program; Award number: RIA 654024; Award title: SoBigData
Swedish Research Council; Award number: dnr 2012-5738; Award title: Towards a knowledge-based culturomics
 

License
Creative Commons Attribution 4.0 International (CC BY 4.0)
 

