Linking Individuals Across Historical Sources: a Fully Automated Approach / Ran Abramitzky, Roy Mill, Santiago Pérez.

By: Abramitzky, Ran

Contributor(s): Mill, Roy; Pérez, Santiago
Material type: Text

Series: Working Paper Series (National Bureau of Economic Research) ; no. w24324.

Publication details: Cambridge, Mass. National Bureau of Economic Research 2018.

Description: 1 online resource: illustrations (black and white)

Available additional physical forms: Hardcopy version available to institutional subscribers

Abstract: Linking individuals across historical datasets relies on information such as name and age that is both non-unique and prone to enumeration and transcription errors. These errors make it impossible to find the correct match with certainty. In the first part of the paper, we suggest a fully automated probabilistic method for linking historical datasets that enables researchers to create samples at the frontier of minimizing type I errors (false positives) and type II errors (false negatives). The first step guides researchers in choosing which variables to use for linking. The second step uses the Expectation-Maximization (EM) algorithm, a standard tool in statistics, to compute the probability that any two records correspond to the same individual. The third step suggests how to use these estimated probabilities to choose which records to use in the analysis. In the second part of the paper, we apply the method to link historical population censuses in the US and Norway, and use these samples to estimate measures of intergenerational occupational mobility. The estimates using our method are remarkably similar to the ones using IPUMS', which relies on hand linking to create a training sample. We provide R code and a Stata command that implement this method.
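Illustrative note: the second and third steps described in the abstract can be sketched roughly as follows. This is not the authors' R or Stata implementation. It swaps in a textbook two-class mixture over binary field agreements (in the style of Fellegi-Sunter record linkage) and assumes fields agree independently within each class. The function name `em_match_probabilities`, the starting values, the toy agreement counts, and the 0.9 cutoff are all hypothetical choices made for this example.

```python
def clamp(x, eps=1e-6):
    """Keep a probability strictly inside (0, 1) to avoid division by zero."""
    return min(max(x, eps), 1.0 - eps)


def em_match_probabilities(pairs, n_iter=50, p=0.1, m_init=0.9, u_init=0.1):
    """EM for a two-class mixture over binary agreement vectors.

    pairs: list of tuples of 0/1 flags, one tuple per candidate record pair
           (1 = the two records agree on that field).
    Returns (posterior match probability per pair, m, u, p), where
    m[j] = P(field j agrees | match), u[j] = P(field j agrees | non-match),
    and p is the estimated share of candidate pairs that are matches.
    """
    n, k = len(pairs), len(pairs[0])
    m = [m_init] * k
    u = [u_init] * k
    post = [p] * n
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a true match.
        post = []
        for g in pairs:
            like_m, like_u = p, 1.0 - p
            for j, agree in enumerate(g):
                like_m *= m[j] if agree else 1.0 - m[j]
                like_u *= u[j] if agree else 1.0 - u[j]
            post.append(like_m / (like_m + like_u))
        # M-step: re-estimate parameters from posterior-weighted counts.
        w = sum(post)
        p = clamp(w / n)
        for j in range(k):
            m[j] = clamp(sum(r * g[j] for r, g in zip(post, pairs)) / w)
            u[j] = clamp(sum((1.0 - r) * g[j] for r, g in zip(post, pairs)) / (n - w))
    return post, m, u, p


# Synthetic toy data (made up for illustration): agreement on three
# hypothetical fields, e.g. first name, last name, birth year.
pairs = (
    [(1, 1, 1)] * 74
    + [(1, 1, 0)] * 16 + [(1, 0, 1)] * 16 + [(0, 1, 1)] * 16
    + [(1, 0, 0)] * 74 + [(0, 1, 0)] * 74 + [(0, 0, 1)] * 74
    + [(0, 0, 0)] * 656
)

post, m, u, p_hat = em_match_probabilities(pairs)

# Third step, in its simplest form: keep pairs whose estimated match
# probability clears a cutoff. The 0.9 cutoff is an arbitrary assumption.
linked = [g for g, r in zip(pairs, post) if r >= 0.9]
```

Raising the cutoff trades false positives for false negatives, which is the type I versus type II frontier the abstract refers to. A real application would compare names with string distances and ages with numeric differences rather than exact 0/1 agreement.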

Holdings
Item type	Home library	Collection	Call number	Status
Working Paper	Biblioteca Digital	Colección NBER	nber w24324	Not For Loan

February 2018.
