pyreadr

When writing to RDS, complex datatype columns (e.g. list-columns) are written as character strings

Open aaronrudkin opened this issue 1 year ago • 2 comments

I have a pandas dataframe where one column holds Python lists (i.e. what would be vectors in R). When I write this dataframe to RDS and load the RDS back into R, the column is loaded as a character column.

Example of what the pandas dataframe looks like: Screenshot 2024-10-19 at 2 53 56 PM

Example of how it loads in R: Screenshot 2024-10-19 at 2 55 06 PM

This is obviously an edge case, so I'm not totally surprised it doesn't work, but in this case the column is being used to represent text embeddings, so the alternative of trying to split it into hundreds of columns is not ideal. I can do a reprex if you'd like one but I think this demonstration is pretty trivial.

aaronrudkin avatar Oct 19 '24 18:10 aaronrudkin

Hi, thanks for the report. Unfortunately, the underlying C library does not support writing columns that contain objects other than the simple types (integer, double, character, etc.; see the complete list in the README). Therefore this cannot be fixed in pyreadr.

In an attempt to do something with the data, when pyreadr sees an object of an unsupported type, it writes the object's character representation. That is why you see a character vector in R.
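To make the behaviour above concrete, here is a minimal sketch of what a "character representation" of a list cell looks like. It assumes the representation matches what Python's built-in str() returns, which this thread does not confirm; the `embeddings` data is hypothetical.

```python
# Hypothetical embedding column: each cell holds a Python list.
embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

# Assumption: the character representation is what str() produces.
# Each list becomes a single string, so R sees one character value per row.
as_text = [str(cell) for cell in embeddings]

print(as_text[0])  # [0.1, 0.2, 0.3]
```

Under that assumption, the strings that arrive in R use Python list syntax, which R cannot read as a numeric vector without further conversion.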

You can submit an issue to librdata if you wish; if they support it some day, I can implement it in pyreadr.

ofajardo avatar Oct 19 '24 22:10 ofajardo

As a suggestion: in R, rewrite each character string in the syntax of an R vector (replace [ with c(, ] with ), etc.) and then evaluate it with eval.
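A variation on the same idea is to produce the R vector syntax on the Python side before writing, so that R only has to evaluate the string. The sketch below is an illustration, not part of pyreadr: `to_r_vector` is a hypothetical helper, and it assumes every cell is a flat list of numbers.

```python
import math


def to_r_vector(values):
    """Format a flat list of numbers as R vector syntax, e.g. 'c(0.1, 0.2)'."""
    def fmt(x):
        x = float(x)
        # Python prints nan/inf in lowercase, which R would not parse as numbers.
        if math.isnan(x):
            return "NaN"
        if math.isinf(x):
            return "Inf" if x > 0 else "-Inf"
        return repr(x)

    return "c(" + ", ".join(fmt(v) for v in values) + ")"


# Hypothetical embedding column; one cell contains a missing value.
embeddings = [[0.1, 0.2, 0.3], [0.4, float("nan"), 0.6]]
r_strings = [to_r_vector(cell) for cell in embeddings]

print(r_strings[0])  # c(0.1, 0.2, 0.3)
```

The resulting strings can be stored as an ordinary character column and written with pyreadr.write_rds. In R, something like lapply(df$embedding, function(s) eval(parse(text = s))) should then rebuild a list-column, assuming the column is named embedding. Evaluating strings this way is only reasonable for files you produced yourself.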

ofajardo avatar Oct 19 '24 22:10 ofajardo

Closed due to inactivity.

ofajardo avatar Jan 17 '25 16:01 ofajardo