pyreadstat.read_sav won't work on databricks
When I try this:
df, meta = pyreadstat.read_sav("/Volumes/sandbox/schema/folder/myfile.sav")
I get "Unknown Error".
This happens for versions 1.3.1, 1.3.0, 1.2.9, and 1.2.8 but works fine in 1.2.7.
To Reproduce: Databricks (serverless) environment version 4.
File example: Tried with many SAV files; any file will reproduce.
Expected behavior: The file loads.
Setup Information:
How did you install pyreadstat? Tested with !pip install pyreadstat==1.3.1 in a notebook, or using the Databricks environment GUI (version 4).
Platform: linux
Python Version: Python 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0] on linux
Very strange. If I understand correctly, those files open fine on a local machine but fail on Databricks?
If so, unfortunately I cannot reproduce this, as I don't have access to Databricks.
The error suggests the problem is coming from ReadStat. Indeed, version 1.2.8 included an update of the ReadStat sources, so the problem may lie there, but I cannot tell without being able to reproduce it. You may also want to report this to ReadStat in case somebody there can reproduce it.
BTW, have you tried copying one of those files from /Volumes to your home directory? Maybe it has something to do with the way the Volume is mounted.
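A minimal sketch of that check, in case it helps: copy the file to a local temporary directory first and read the copy. The helper names `copy_to_local` and `read_sav_from_local_copy` are made up for illustration, and this assumes the temporary directory on the Databricks node sits on ordinary local disk rather than on the mounted Volume, which I have not verified.

```python
import shutil
import tempfile
from pathlib import Path


def copy_to_local(src_path: str) -> Path:
    """Copy src_path into a fresh temp directory and return the new path.

    Hypothetical helper; assumes the temp directory is on local disk.
    """
    src = Path(src_path)
    local_dir = Path(tempfile.mkdtemp(prefix="sav_copy_"))
    dst = local_dir / src.name
    shutil.copyfile(src, dst)  # plain byte-for-byte copy, no metadata
    return dst


def read_sav_from_local_copy(src_path: str):
    """Read a SAV file through a local copy instead of the original mount."""
    # Imported lazily so copy_to_local can be used without pyreadstat installed.
    import pyreadstat

    local_path = copy_to_local(src_path)
    return pyreadstat.read_sav(str(local_path))
```

Calling `df, meta = read_sav_from_local_copy("/Volumes/sandbox/schema/folder/myfile.sav")` would then show whether the same "Unknown Error" appears when the file is read from local disk. If it does, the mount is probably not the cause.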
I don't think it is related to /Volumes or file locations, as I tried it both in my workspace and in Unity Catalog. We're freezing at version 1.2.7 (and have been for a while now) until this is resolved.