
Breaking change: <haven_labelled> objects no longer support arithmetic — destroys reproducibility and punishes labelled workflows

Open • nickharrigan opened this issue 2 months ago • 2 comments

Summary

I am reporting a major regression introduced through vctrs and propagated into haven. Code that worked for years with <haven_labelled> vectors now throws hard errors such as:

Error in `vec_arith()`:
! <haven_labelled> - <haven_labelled> is not permitted

This breaks reproducibility for researchers who invested heavily in labelled data workflows. What previously worked seamlessly (statistical tests, arithmetic, plotting) now forces us to strip labels and discard metadata.

What changed

  • In earlier versions of haven + vctrs, labelled vectors behaved as numeric in arithmetic contexts (e.g. t-tests, regression, means). Labels provided metadata, but math worked.
  • As of recent vctrs releases, arithmetic on <haven_labelled> is explicitly forbidden. Instead of coercion, code now halts with errors.
  • This breaks pipelines that previously ran without issue, especially in research where labelling is core to data integrity.

Why this matters

I (and many others) have put huge effort into carefully labelling survey data (hundreds of variables, hundreds of hours).

Labelling is not decoration. It is intellectual work: preserving codebooks, ensuring correct interpretation, and protecting against mistakes.

Under the new rules, all that investment becomes a liability. To continue analysis, I must either:

  • Strip all labels (zap_labels()), losing metadata, or
  • Rewrite large amounts of code to wrap every variable in as.numeric().
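In practice, the two workarounds listed above look roughly like the code below. This is a minimal sketch on made-up data. It assumes the haven package and tibble (a haven dependency) are installed, and `survey`, `q1` and `age` are illustrative names only.

```r
library(haven)

# Made-up survey data for illustration: one labelled item, one plain column
survey <- tibble::tibble(
  q1  = labelled(c(1, 2, 2, 1), c(Agree = 1, Disagree = 2)),
  age = c(34, 51, 28, 45)
)

# Workaround 1: strip value labels from every column at once.
# `q1` becomes a plain double vector and its label set is gone.
survey_plain <- zap_labels(survey)
mean(survey_plain$q1)

# Workaround 2: keep the labelled data and convert each variable
# to numeric at every point of use.
mean(as.numeric(survey$q1))
```

The first option discards the metadata. The second keeps it but requires editing every line of analysis code that touches a labelled variable, which is the trade-off the issue complains about.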

This is an unacceptable cost and undermines reproducibility. Code that produced results five years ago no longer runs today.

Impact on reproducibility

  • Published pipelines, teaching materials, and collaborative projects now fail.
  • Results cannot be regenerated without altering code and discarding metadata.
  • This undermines trust in R as a stable scientific environment.

Request / Proposal

  • Restore arithmetic support for <haven_labelled> objects when underlying values are numeric.
  • At minimum: allow safe coercion to numeric by default, with a warning if labels are present.
  • This was the historical behaviour and respected both metadata and usability.

If you will not restore this behaviour, provide:

  • A global option (e.g. options(haven.arithmetic = "coerce"))
  • Or a helper (e.g. as_numeric_with_labels()) that strips labels at analysis time but keeps a retrievable dictionary.
  • Clear release notes and migration guides stating that arithmetic on labelled data is now disabled, with explicit recommendations for migration.
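The proposed helper could behave roughly as sketched below. `as_numeric_with_labels()` is the hypothetical name from the proposal, not an existing haven function. This version saves each labelled column's `labels` attribute into a list and then calls `zap_labels()`. The survey data are made up, and tibble (a haven dependency) is assumed to be installed.

```r
library(haven)

# Hypothetical helper (NOT part of haven): strip value labels for analysis
# while returning them in a separate, retrievable dictionary.
as_numeric_with_labels <- function(df) {
  # Identify labelled columns before their class is removed
  is_labelled_col <- vapply(
    df, function(col) inherits(col, "haven_labelled"), logical(1)
  )
  # Save each labelled column's value labels (a named vector of codes)
  dictionary <- lapply(
    df[is_labelled_col], function(col) attr(col, "labels", exact = TRUE)
  )
  list(data = zap_labels(df), dictionary = dictionary)
}

# Usage with made-up survey data
survey <- tibble::tibble(
  q1  = labelled(c(1, 2, 2, 1), c(Agree = 1, Disagree = 2)),
  q2  = labelled(c(5, 3, 4, 1), c(Low = 1, High = 5)),
  age = c(34, 51, 28, 45)
)

res <- as_numeric_with_labels(survey)
mean(res$data$q1)     # arithmetic runs on plain numeric columns
res$dictionary$q1     # the value labels remain retrievable
```

Because the dictionary is a plain named list, it can be saved next to the analysis output (for example with `saveRDS()`) without modifying the original labelled data.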

Closing

This is not a minor breaking change. It directly punishes users who took the time to carefully label data, and forces us to choose between junking labels and junking code. That is hostile to research workflows.

I urge the maintainers to consider how this decision impacts reproducibility, and to provide a path forward that does not discard the enormous investment researchers have made in labelled datasets.

nickharrigan, Oct 08 '25 19:10

I have to say that I am no longer getting the error with exactly the same code, and I don't understand why it appeared and then disappeared.

nickharrigan, Oct 08 '25 19:10

Hi @nickharrigan, I appreciate that this is important to you, but this is not a bug we were aware of until now, so there has been no opportunity to fix it.

What we really need is a minimal reprex (reproducible example). The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it. If you've never heard of a reprex before, start by reading about the reprex package, including the advice further down the page. Please make sure your reprex is created with the reprex package, as it gives nicely formatted output and avoids a number of common pitfalls.

gorcha, Oct 08 '25 23:10
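A minimal reprex for the error reported above might start from code like the sketch below, rendered with `reprex::reprex()` as requested. The vectors are made up. Whether `x - y` succeeds or errors depends on the installed haven and vctrs versions, so the sketch records both. Those versions may also be relevant to the error appearing and then disappearing with the same code.

```r
library(haven)

# Made-up labelled vectors; the outcome of `x - y` depends on the
# installed haven and vctrs versions.
x <- labelled(c(1, 2, 3), c(Low = 1, High = 3))
y <- labelled(c(3, 2, 1), c(Low = 1, High = 3))

# Capture either the result or the error message, so the outcome is
# visible whichever way it goes.
result <- tryCatch(x - y, error = function(e) conditionMessage(e))
print(result)

# Record the package versions involved
packageVersion("haven")
packageVersion("vctrs")
```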