index of pandas dataframe is lost when writing to Rds

Open julibeg opened this issue 4 years ago • 2 comments

Perhaps it's a known limitation, but I didn't find it in the README.

When writing a pd.DataFrame to .Rds, the index is lost.

Example: In Python

>>> import pandas as pd
>>> import numpy as np
>>> import pyreadr
>>> bla = pd.DataFrame(np.arange(12).reshape(4, 3), index=list('abcd'))
>>> bla
   0   1   2
a  0   1   2
b  3   4   5
c  6   7   8
d  9  10  11
>>> pyreadr.write_rds("bla.Rds", bla)

In R:

> bla = readRDS("bla.Rds")
> bla
  0  1  2
1 0  1  2
2 3  4  5
3 6  7  8
4 9 10 11

I'm on 64-bit Linux with Python 3.8.6 (Anaconda), R 4.0.3, and pyreadr 0.4.0 (installed from conda).

Expected behavior:

The row names of the R data frame should be the index of the pandas DataFrame (i.e. 'a', 'b', 'c', 'd').

julibeg, Mar 17 '21 23:03

Good catch! I think the API of the C library I am using under the hood currently does not allow setting the row names when writing a dataframe.

I will therefore add it to the list of known limitations and open a ticket in the C library to ask for this feature. It may get implemented some day =)
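Until row names can be written, one possible workaround is to move the index into an ordinary column before writing, so the labels at least survive the round trip. The following is only a minimal sketch: the helper names `index_to_column` and `write_rds_keeping_index` and the column name `rowname` are made up for illustration and are not part of pyreadr. The only pyreadr call used is `pyreadr.write_rds(path, df)`, as in the example above.

```python
import pandas as pd


def index_to_column(df, name="rowname"):
    """Return a copy of df with its index moved into a regular column.

    `name` is a hypothetical column name chosen for this sketch; it must
    not clash with an existing column, otherwise pandas raises ValueError.
    """
    # rename_axis labels the index, reset_index turns it into column 0
    return df.rename_axis(name).reset_index()


def write_rds_keeping_index(path, df, name="rowname"):
    """Hypothetical wrapper: write df to .Rds with the index kept as a column."""
    import pyreadr  # imported lazily so index_to_column works without pyreadr

    pyreadr.write_rds(path, index_to_column(df, name))


# Same data as in the report above, built without numpy
bla = pd.DataFrame(
    [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]],
    index=list("abcd"),
)
flat = index_to_column(bla)
print(flat)
```

After reading the file in R, the column can be turned back into row names if needed, for example with `rownames(bla) <- bla$rowname`.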

ofajardo, Mar 18 '21 15:03

That would be nice, fingers crossed :crossed_fingers:

julibeg, Mar 18 '21 15:03