
pyreadr.custom_errors.LibrdataError: Unable to read from file for large RDS files

Open · pty0111 opened this issue 2 years ago · 4 comments

Is there an upper limit on the size of RDS files that pyreadr can load? Reading an RDS file that contains a small matrix works well, but reading large matrices (>10 GB) fails with the following error: pyreadr.custom_errors.LibrdataError: Unable to read from file

pty0111 · Nov 11 '23 19:11

I don't think there should be such a limit. Also, if the file were simply too large for memory, you would probably get a memory error rather than an "Unable to read from file" error, so I suspect something else is going on with that file. Did you create the file yourself with R, or did somebody else generate it? If somebody else generated it, then, as mentioned above, I suspect the problem is something other than the size. If you created it, please share simplified code that reproduces the issue.

ofajardo · Nov 12 '23 08:11

This is a matrix that I generated from my data. It is a 31595 by 39643 matrix, saved with the command saveRDS(my.mtx, file = "expr.rds"). When I subset it to fewer rows, e.g. saveRDS(my.mtx[1:5000,], file = "expr.rds"), pyreadr reads it without any issue. My pyreadr version is 0.4.9.

pty0111 · Nov 12 '23 20:11
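The dimensions reported above allow a rough size check of the full matrix against the subset that loads fine. The sketch below is an illustration under stated assumptions, not a diagnosis: it assumes the matrix holds R doubles (8 bytes per element) and ignores RDS headers, attributes, and the gzip compression that saveRDS applies by default. The comparison against 2^31 - 1 (the largest signed 32-bit value) is only a hypothesis about where a size or read limit could bite. The thread does not say what the librdata fix actually changes, and the helper name matrix_payload_bytes is hypothetical.

```python
# Assumption: the R matrix stores doubles, 8 bytes per element.
BYTES_PER_DOUBLE = 8
# Hypothetical threshold: largest value a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1


def matrix_payload_bytes(n_rows: int, n_cols: int,
                         bytes_per_elem: int = BYTES_PER_DOUBLE) -> int:
    """Raw numeric payload size, ignoring RDS headers, attributes and compression."""
    return n_rows * n_cols * bytes_per_elem


full = matrix_payload_bytes(31595, 39643)    # the matrix that fails to load
subset = matrix_payload_bytes(5000, 39643)   # the subset that loads fine

for label, size in [("full 31595 x 39643", full), ("subset 5000 x 39643", subset)]:
    print(f"{label}: {size:,} bytes ({size / 2**30:.2f} GiB), "
          f"exceeds INT32_MAX: {size > INT32_MAX}")
```

Under these assumptions, the uncompressed payload of the full matrix is roughly 10 GB, while the 5000-row subset stays below 2^31 bytes. This pattern is consistent with a 32-bit limit somewhere in the read path, but it does not prove one.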

I can confirm this bug exists. I submitted a fix to librdata (https://github.com/WizardMac/librdata/pull/49); please consider updating once it is merged.

elaude · Apr 22 '24 09:04

Sure, I will post an update here once the PR is merged into librdata.

ofajardo · Jul 02 '24 09:07