When writing to RDS, complex datatype columns (e.g. list-columns) are written as character strings
I have a pandas DataFrame in which one column holds Python lists (i.e. what would be vectors in R). When I write this DataFrame to RDS and load the RDS file back into R, the column is loaded as a character column.
Example of what the pandas dataframe looks like:
Example of how it loads in R:
This is obviously an edge case, so I'm not totally surprised it doesn't work, but in this case the column is being used to represent text embeddings, so the alternative of trying to split it into hundreds of columns is not ideal. I can do a reprex if you'd like one but I think this demonstration is pretty trivial.
Hi, thanks for the report. Unfortunately, the underlying C library only supports writing columns of simple types (integer, double, character, and so on; see the complete list in the Readme), so this cannot be fixed in pyreadr.
In an attempt to do something useful with the data, when pyreadr sees an object of an unsupported type, it writes its character representation. That is why you see a character vector in R.
You can open an issue with librdata if you wish. If they eventually support it, I can implement it in pyreadr.
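To make the fallback concrete, here is a minimal sketch of what "character representation" means for a list cell. It assumes the conversion behaves roughly like calling Python's built-in str() on each cell; that is an assumption for illustration, not a description of pyreadr's internals, and the variable names are made up.

```python
# Hypothetical stand-in for a list-column of embeddings.
embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

# Assumption: an unsupported cell is turned into text much like str() would.
as_written = [str(cell) for cell in embeddings]

# Each cell is now a plain string such as "[0.1, 0.2, 0.3]",
# which is why R ends up with a character column.
print(as_written)
```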
As a workaround, in R you can rewrite each string into vector syntax (replace [ with c(, ] with ), etc.) and then evaluate it with eval.
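A variation on this suggestion is to do the string rewriting on the Python side before writing, so the column already contains R vector syntax. The sketch below is only an illustration under stated assumptions: the helper name to_r_vector_literal is hypothetical, it handles numeric values only, and it is not part of pyreadr.

```python
import math

def to_r_vector_literal(values):
    """Format a sequence of numbers as an R numeric vector literal, e.g. c(0.1, 0.2)."""
    parts = []
    for v in values:
        x = float(v)
        if math.isnan(x):
            parts.append("NaN")    # R spelling of not-a-number
        elif math.isinf(x):
            parts.append("Inf" if x > 0 else "-Inf")
        else:
            parts.append(repr(x))  # shortest string that round-trips the float
    return "c(" + ", ".join(parts) + ")"

# Hypothetical stand-in for a list-column of embeddings.
embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
r_literals = [to_r_vector_literal(e) for e in embeddings]
print(r_literals)
```

With pandas, something like df["embedding"].map(to_r_vector_literal) (column name assumed) could be applied before writing the file. In R, each string can then be turned back into a numeric vector with eval(parse(text = x)). Only evaluate strings you produced yourself, since eval runs arbitrary code.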
closed due to inactivity