pyreadr.custom_errors.LibrdataError: Unable to read from file for large RDS files
Is there an upper limit on the size of RDS files that pyreadr can load? Reading an RDS file containing a small matrix works fine, but reading large matrices (>10 GB on disk) fails with the following error: pyreadr.custom_errors.LibrdataError: Unable to read from file
There should not be such a limit. Also, a file that is too large for memory would normally produce a memory error rather than an "unable to read from file" error, so I suspect something else is going on with that file. Is the file something you created yourself with R, or did somebody else generate it? If somebody else, then as mentioned before the problem is likely something other than size. If you created it yourself, please share simplified code to reproduce the issue.
This is a matrix I generated from my own data. It is a 31595 x 39643 matrix saved with the saveRDS(my.mtx, file = "expr.rds") command. When I subset it to fewer rows, e.g., saveRDS(my.mtx[1:5000,], file = "expr.rds"), pyreadr reads it without any issue.
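For what it's worth, a quick back-of-the-envelope calculation is consistent with a 32-bit size or offset overflow somewhere in the read path (this is a guess, not a confirmed diagnosis; it assumes the matrix is double-precision numeric, 8 bytes per element, and ignores serialization overhead):

```python
# Hypothetical sizing check: the failing full matrix vs. the working subset.
# Assumes a double-precision numeric R matrix (8 bytes per element).
BYTES_PER_DOUBLE = 8

full_bytes = 31595 * 39643 * BYTES_PER_DOUBLE    # the matrix that fails
subset_bytes = 5000 * 39643 * BYTES_PER_DOUBLE   # the 5000-row subset that works

print(f"full matrix:     {full_bytes:,} bytes; exceeds 2^32: {full_bytes > 2**32}")
print(f"5000-row subset: {subset_bytes:,} bytes; under 2^31: {subset_bytes < 2**31}")
```

The full matrix is roughly 10 GB of raw doubles, past the 4 GiB (2^32) boundary, while the working subset stays under 2 GiB (2^31), which would fit the pattern of a 32-bit integer overflow in the underlying C library.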
My pyreadr version is 0.4.9
I can confirm this bug. I have submitted a fix to librdata (https://github.com/WizardMac/librdata/pull/49); please consider updating the bundled librdata once it is merged.
Sure, I will update here once the PR is merged into librdata.