Error in read_sas using catalog file

Open ValValetl opened this issue 2 years ago • 10 comments

Hi, I am getting the error message "Error: Failed to parse formats.sas7bcat: Invalid file, or file has unsupported features." when importing SAS data with a catalog file. This is the same error as in the closed issue #34. Importing the data without the catalog file works.

I am using the latest haven version (2.5.0) and have also tested the development version on GitHub.

fpath <- "path/to/sas/data/file"
catalog  <- "formats.sas7bcat"
sas_data <- haven::read_sas(fpath, catalog_file = catalog)

Error: Failed to parse formats.sas7bcat: Invalid file, or file has unsupported features.

ValValetl avatar May 06 '22 06:05 ValValetl

Hi @ValValetl, thanks for the bug report.

Can you please share the catalog file and also some example data if possible? Without the catalog file it's not possible to track down the error.

gorcha avatar May 06 '22 06:05 gorcha

Hi @gorcha. Unfortunately, this is not possible as it is non-public data. I thought the issue report might still be of interest, as issue #34 was closed a while ago without any resolution.

ValValetl avatar May 06 '22 08:05 ValValetl

Even if not the data, are you able to share the catalog file?

gorcha avatar May 06 '22 08:05 gorcha

I need to check with the owner. I will get back to you later. Thanks for your quick responses!

ValValetl avatar May 06 '22 08:05 ValValetl

Sorry for the long delay. Here is the catalog file that produces the error message: sas_catalog_file.zip

ValValetl avatar Jul 07 '22 09:07 ValValetl

No worries at all, thanks!

gorcha avatar Jul 14 '22 09:07 gorcha

Was this ever diagnosed? I'm running into the same issue.

joshuaborn avatar Aug 17 '22 18:08 joshuaborn

Hi @joshuaborn, I haven't had a chance to look at this yet unfortunately but hopefully will over the next few weeks.

There's no guarantee that this is the same issue affecting you. Would you be able to provide an example file that I can test by any chance?

gorcha avatar Sep 01 '22 02:09 gorcha

Hi, @gorcha . The particular file I first encountered the issue with was a restricted use file, but I've seen it with at least one other data set since then. I should have some time this weekend to try it out with public use data files, and if I can replicate it, I'll share.

joshuaborn avatar Sep 01 '22 16:09 joshuaborn

Thanks @joshuaborn, much appreciated!

gorcha avatar Sep 02 '22 00:09 gorcha

NSFG_example.zip

I neglected to follow up on this back in September, but I was using Haven today and found a good example of this issue with public use data. Attached are four files from the National Survey of Family Growth 2017-2019 public use data. The d2017_2019femresp.sas7bdat and d2017_2019femresp.sas7bcat pair loads with read_sas just fine, but trying to use read_sas with the d2017_2019fempreg.sas7bdat and d2017_2019fempreg.sas7bcat pair leads to an error message of the form

Error: Failed to parse .../d2017_2019fempreg.sas7bcat: Invalid file, or file has unsupported features.

Using read_sas on just d2017_2019fempreg.sas7bdat without the catalog file works.
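
Since reading the data file alone succeeds, one possible workaround is to try the catalog first and fall back to a plain read if parsing fails. The following is only a sketch: `read_sas_with_fallback` and its `reader` argument are hypothetical names introduced here for illustration, not part of haven. Only `haven::read_sas()` and its `catalog_file` argument come from the package.

```r
# Hypothetical helper (not part of haven): try reading with the catalog file,
# and fall back to reading the data file alone if that fails.
# `reader` defaults to haven::read_sas; it is a parameter only so the
# fallback logic can be exercised without real SAS files.
read_sas_with_fallback <- function(data_file, catalog_file = NULL,
                                   reader = haven::read_sas) {
  tryCatch(
    reader(data_file, catalog_file = catalog_file),
    error = function(e) {
      # Without a catalog there is nothing to fall back to, so re-raise.
      if (is.null(catalog_file)) stop(e)
      warning(
        sprintf("Reading with catalog failed (%s); value labels will be missing.",
                conditionMessage(e)),
        call. = FALSE
      )
      reader(data_file)
    }
  )
}
```

With the files from this comment, the call would look like `read_sas_with_fallback("d2017_2019fempreg.sas7bdat", "d2017_2019fempreg.sas7bcat")`. The result then has no value labels, so this only hides the parse error rather than fixing it.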

I'm using R version 4.2.2 on Windows 11 with Haven version 2.5.1.

The interesting thing about this example is that the pregnancy data table (d2017_2019fempreg) is ultimately derived from the female respondents table (d2017_2019femresp). I tried examining the two catalog files in SAS using PROC CATALOG, but didn't see any obvious difference between them.

As an aside, since these parse errors seem to happen with catalog files more than with regular SAS data files, maybe it would be worth adding to Haven the ability to side-load value labels from a sas7bdat file or even a CSV file? It seems pretty straightforward to load another table and call labelled as needed, and SAS can export its value labels to a regular data table easily with PROC CONTENTS, etc. I would be willing to work on this, since it would save me time in the long run.
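
To make the side-loading idea concrete, here is a rough sketch of what it could look like in user code today. The lookup table layout (`variable`, `value`, `label` columns), the example values, and the `apply_value_labels` helper are all assumptions made up for illustration. Only `haven::labelled()` is an existing haven function.

```r
library(haven)

# Hypothetical lookup table of value labels, e.g. read from a CSV or a
# sas7bdat file exported from SAS. The column names are an assumption.
value_labels <- data.frame(
  variable = c("outcome", "outcome", "outcome"),
  value    = c(1, 2, 3),
  label    = c("Live birth", "Induced abortion", "Stillbirth"),
  stringsAsFactors = FALSE
)

# Stand-in for a table read with read_sas() without a catalog file.
dat <- data.frame(caseid = c(101, 102, 103), outcome = c(1, 3, 2))

# Hypothetical helper: attach value labels to every column of `data`
# that has entries in `lookup`.
apply_value_labels <- function(data, lookup) {
  for (var in intersect(unique(lookup$variable), names(data))) {
    rows <- lookup[lookup$variable == var, ]
    # labelled() expects a named vector: names are labels, values are codes.
    labels <- stats::setNames(rows$value, rows$label)
    data[[var]] <- haven::labelled(data[[var]], labels = labels)
  }
  data
}

labelled_dat <- apply_value_labels(dat, value_labels)
```

Afterwards `haven::as_factor(labelled_dat$outcome)` converts the codes to their labels, as it would for a column labelled from a catalog file. Note that the label values must have the same type as the column they are attached to.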

joshuaborn avatar Feb 20 '23 01:02 joshuaborn

Hi @joshuaborn, thanks for the extra example file - there have been a few recent updates in the dev version of ReadStat for catalog file reading that might resolve these issues, I'll check it out.

I suspect this is a little different to the initial problem in this issue (which was specifically a problem with Unix 64 bit file formats), but there are some other bugs that have been fixed that might affect this one.

gorcha avatar Feb 20 '23 02:02 gorcha

Hi @joshuaborn, can confirm that the recent ReadStat changes have fixed the issue with this file. They've just released an update over there so these should be in haven shortly!

gorcha avatar Feb 20 '23 23:02 gorcha

Hi, @gorcha. Thanks for confirming that! And my apologies for resurrecting the wrong issue thread.

joshuaborn avatar Feb 21 '23 00:02 joshuaborn

No worries at all!

gorcha avatar Feb 21 '23 01:02 gorcha