Software tools for classifying names.
The Onomap family of software tools is widely used in research for the classification of names. It was originally developed in 2006-7 during two Economic and Social Research Council (ESRC)-funded Knowledge Transfer Partnerships between University College London and parts of the UK National Health Service (NHS). Subsequent development work was carried out under the Engineering and Physical Sciences Research Council (EPSRC) ‘Uncertainty of Identity’ project. Early versions of the software used a UK national database of names. Over the last fifteen years this work has been extended significantly to include a total of 300 million names from 27 other countries, and it now uses names data for 1.2+ billion individuals living in almost every country of the world (more than one in seven of the planet’s population).
Collaboration with the UK Office for National Statistics has helped in validating and improving classifications. Our software has been used in more than 50 projects in public health, equality audits, epidemiology, business, human genetics and social media analysis, as documented on this Excel spreadsheet. Current ESRC ‘Digital Footprints’ funding of the Consumer Data Research Centre (CDRC) enables provision of annual UK small area ethnicity estimates to approved users, as well as nationwide mapping of the residential geographies of different ethnic groups. CDRC has also enabled mapping of the geography of genealogy.
Software and Services
Classifications and Software Tools
We have developed four software tools, each tailored to a different type of application. These can be used to classify lists of names that fall under public interest exemptions or research derogations from the General Data Protection Regulation. We can classify such lists for users, or we can provide software for users to conduct such work themselves. The following products have been developed:
1. Ethnicity Estimator
An Ethnicity Estimator that uses a UK dictionary of names to produce aggregated summary estimates of ethnicities. Its 12 categories are widely used in UK official statistics. This service is available free of charge to approved users pursuing public interest research. Application is through the Economic and Social Research Council-funded Consumer Data Research Centre. Approved users can upload lists of names for classification or download software to classify such data themselves.
2. Onomap
The original Onomap software uses a UK dictionary of names to produce individual-level estimates of over 150 cultural, ethnic and linguistic groups using a computer algorithm. A consultancy service provides classified lists or a software licence for single-user computers.
3. Onomap2
A worldwide Onomap2 classification that uses a 1.2+ billion record global names database from almost every country worldwide to produce individual-level estimates of most probable countries of origin. This is recommended for applications using data sourced from outside the UK. A consultancy service provides downloadable classified lists or a software licence for single-user computers.
4. Onomap3
A UK-centric Onomap3 classification that supplements the 1.2+ billion record global names database with historical census statistics. These refine individual-level estimates of most probable countries of origin using known inter-generational settlement histories. This is recommended for applications based on UK data. A consultancy service provides downloadable classified lists or a software licence for single-user computers.
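The products above differ in whether they return individual-level estimates or aggregated summaries. The following Python sketch illustrates that distinction only. It is not the Onomap or Ethnicity Estimator algorithm: the dictionary entries, group labels and the function names `classify_name` and `aggregate_proportions` are hypothetical placeholders, and a simple exact-match lookup is assumed.

```python
from collections import Counter

# Hypothetical toy dictionary: placeholder names mapped to placeholder labels.
# These entries are illustrative only and are not taken from any Onomap data.
NAME_DICTIONARY = {
    "surname_a": "Group 1",
    "surname_b": "Group 2",
    "surname_c": "Group 1",
}

UNCLASSIFIED = "Unclassified"


def classify_name(surname: str) -> str:
    """Individual-level estimate: look up one surname, ignoring case and padding."""
    return NAME_DICTIONARY.get(surname.strip().lower(), UNCLASSIFIED)


def aggregate_proportions(surnames: list[str]) -> dict[str, float]:
    """Aggregated estimate: share of each group across a whole list of names."""
    if not surnames:
        return {}
    counts = Counter(classify_name(s) for s in surnames)
    total = len(surnames)
    return {group: n / total for group, n in counts.items()}


records = ["Surname_A", "surname_b ", "surname_c", "surname_z"]
individual = [classify_name(s) for s in records]  # one label per record
summary = aggregate_proportions(records)          # proportions per group
```

In this sketch, `individual` holds one label per input record, while `summary` reports only group shares and so reveals nothing about any single record. Names missing from the dictionary are kept as an explicit "Unclassified" share rather than dropped, which is a design choice of the sketch and not a documented behaviour of the tools.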
Which Solution for You?
Resources and Contact
1. Ethnicity Estimator
Please follow the instructions on the CDRC Ethnicity Estimator webpage.
2. Onomap, Onomap2, Onomap3 software
Please use the webform below to send an email:
(1) specifying the Onomap version you would like to use; and
(2) briefly outlining your requirements in terms of numbers of records to be coded, period of licence required (the default is one year) and number of seats required. Please also indicate the purpose of your application.
We typically provide a username and password for download of the software tailored to your requirements.
3. Onomap, Onomap2 and Onomap3 file coding
We can code files supplied in most formats. Please complete the webform below as set out in (2) above. Direct file uploading may be available for Onomap2; in other cases, a bespoke service will be provided.
Worldnames2 – worldwide geographies of forenames and surnames, based on 1.2+ billion names held by the Consumer Data Research Centre
UK ethnicity mapping – the changing ethnic geography of the UK, aggregated from Consumer Data Research Centre Modelled Ethnicity Proportions
GBNames – the geography of British genealogy from 1851 to the present day and Consumer Data Research Centre profiling of family names