Why Transform Y? A Critical Assessment of Dependent-Variable Transformations in Regression Models for Skewed and Sometimes-Zero Outcomes / John Mullahy, Edward C. Norton.

Material type: Text
Series: Working Paper Series (National Bureau of Economic Research) ; no. w30735.
Publication details: Cambridge, Mass. National Bureau of Economic Research 2022.
Description: 1 online resource: illustrations (black and white)
Other classification:
  • C18
  • C20
  • I10
Available additional physical forms:
  • Hardcopy version available to institutional subscribers
Abstract: Dependent variables that are non-negative, follow right-skewed distributions, and have large probability mass at zero arise often in empirical economics. Two classes of models that transform the dependent variable y -- the natural logarithm of y plus a constant and the inverse hyperbolic sine -- have been widely used in empirical work. We show that these two classes of models share several features that raise concerns about their application. The concerns are particularly prominent when dependent variables are frequently observed at zero, which in many instances is the main motivation for using them in the first place. The crux of the concern is that these models have an extra parameter that is generally not determined by theory but whose values have enormous consequences for point estimates. As these parameters go to extreme values, estimated marginal effects on outcomes' natural scales approach those of either an untransformed linear regression or a normed linear probability model. Across a wide variety of simulated data, two-part models yield correct marginal effects, as do OLS on the untransformed y and Poisson regression. If researchers care about estimating marginal effects, we recommend using these simpler models that do not rely on transformations.
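Illustrative note (not part of the catalogued record or the paper): the abstract's point about the extra parameter can be sketched with a small simulation. The data-generating process, sample size, grid of constants, and the helper name `slope_of_log_y_plus_c` below are all assumptions made up for illustration. The sketch only shows that the coefficient on x in a regression of log(y + c) moves substantially with the choice of c when y is often zero; it does not reproduce the authors' simulations or their marginal-effect calculations.

```python
import numpy as np


def slope_of_log_y_plus_c(x, y, c):
    """OLS slope from regressing log(y + c) on x with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, np.log(y + c), rcond=None)
    return beta[1]


# Hypothetical data: many exact zeros, right-skewed positive values.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
positive = rng.random(n) < 1 / (1 + np.exp(-x))  # P(y > 0) rises with x
y = np.where(positive, rng.lognormal(mean=0.5 * x, sigma=1.0), 0.0)

# Same data, same regressor; only the added constant c changes.
slopes = {c: slope_of_log_y_plus_c(x, y, c) for c in (0.001, 0.01, 0.1, 1.0, 10.0)}
for c, b in slopes.items():
    print(f"c = {c:>6}: slope on x = {b:.3f}")
```

In this setup the estimated slope shrinks as c grows, because a small c places the zeros far from the positive observations on the log scale and so gives heavy weight to the zero versus non-zero distinction.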
Holdings
Item type: Working Paper
Home library: Biblioteca Digital
Collection: Colección NBER
Call number: nber w30735
Status: Not For Loan
Total holds: 0

December 2022.

System requirements: Adobe [Acrobat] Reader required for PDF files.

Mode of access: World Wide Web.

Print version record
