Problem with saving DataFrames to root files when they contain dtype object

Open FerdinandEiteneuer opened this issue 8 years ago • 8 comments

Hi,

I have several text files with strings and floats in them. I can load them into a pandas DataFrame nicely. I also need to save them in ROOT format and wanted to use this library.

However, when I call the to_root function on my DataFrame I get the following:

UserWarning: converter for dtype('O') is not implemented skipping

And indeed, if I later load the ROOT file with read_root, the columns with strings in them are missing.

I then tried this:

>>> for c in df.columns:
...     if df[c].dtype == object:
...         df[c] = df[c].astype(str)

Now I did not get any error message from to_root. However, when I later load the ROOT file into a pandas DataFrame, the columns where the strings are supposed to be are still missing.

How can I fix this? Thank you very much.

FerdinandEiteneuer avatar Mar 03 '17 17:03 FerdinandEiteneuer

You need to convert the str columns to the numpy string type S.

However, there seems to be a bug in pandas which prevents that on reassignment. I filed an issue here: https://github.com/pandas-dev/pandas/issues/15575

In the meantime, you might need to copy your data:

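import pandas as pd

# Build a new frame instead of reassigning columns in place.
df2 = pd.DataFrame()
for name, column in df.items():
    if column.dtype == object:
        # Convert string columns to the numpy bytes string type.
        df2[name] = column.astype('S')
    else:
        df2[name] = column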

The S column type requires a maximum string length; by default it takes the length of the longest string in the series. If you want to set the maximum length yourself, you can use column.astype('S10') for a maximum length of 10.
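To illustrate the length rule just described, here is a minimal sketch on a plain numpy array rather than a DataFrame column. The sample strings and variable names are made up for the example, and it only shows how numpy sizes the S type; it does not touch root_pandas.

```python
import numpy as np

# Hypothetical sample data: an object array holding Python strings.
words = np.array(["Hello", "World!!"], dtype=object)

# Plain 'S' sizes the type to the longest string in the array.
auto = words.astype("S")

# An explicit width truncates anything longer, without a warning.
fixed = words.astype("S3")

print(auto.dtype, auto)
print(fixed.dtype, fixed)
```

Note that the conversion encodes to ASCII, so strings with non-ASCII characters would need to be encoded by hand first.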

maxnoe avatar Mar 05 '17 12:03 maxnoe

The pandas developers actually consider it a bug that the dtype on new assignments is not object.

maxnoe avatar Mar 05 '17 17:03 maxnoe

Hi,

Thanks for taking the time. Unfortunately, your way of circumventing this issue does not work for me :( Even if I create this new df2 and call .astype('S') or .astype('S10'), the column stays of type object. I tried what you did in https://github.com/pandas-dev/pandas/issues/15575, and there too my output of

import pandas as pd

df = pd.DataFrame({'a': ['Hello', 'World']})
df['a'] = df['a'].astype('S')
df['b'] = df['a'].astype('S')
print(df.dtypes)

is simply object for df['b'] as well.
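Since pandas keeps byte strings as object, one possible way around it is to do the conversion outside pandas, on the numpy side. The following is only a sketch under assumptions: the helper name to_structured and the sample frame are hypothetical, and whether the resulting array is accepted by root_numpy (for example via its array2root function) is not something confirmed in this thread.

```python
import numpy as np
import pandas as pd


def to_structured(df):
    """Copy a DataFrame into a numpy structured array.

    Hypothetical helper: object columns are converted to the
    fixed-width bytes type 'S', other columns keep their dtype.
    """
    columns = {}
    for name, column in df.items():
        values = column.to_numpy()
        if values.dtype.kind == "O":
            # numpy picks the width from the longest string.
            values = values.astype("S")
        columns[str(name)] = values

    dtype = [(name, values.dtype) for name, values in columns.items()]
    out = np.empty(len(df), dtype=dtype)
    for name, values in columns.items():
        out[name] = values
    return out


# Made-up example frame with one string and one float column.
df = pd.DataFrame({"a": ["Hello", "World"], "x": [1.0, 2.0]})
rec = to_structured(df)
print(rec.dtype)
```

Because the array never goes back into a DataFrame, the S dtype is not turned into object again.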

FerdinandEiteneuer avatar Mar 06 '17 14:03 FerdinandEiteneuer

Hi All,

Has this been resolved at all? I ran into the same issue when saving strings to a ROOT file.

UserWarning: converter for dtype('O') is not implemented (skipping)
  cobj = _librootnumpy.array2tree_toCObj(arr, name=name, tree=incobj)

Best, Mat

MatousVozak avatar Oct 30 '19 15:10 MatousVozak

@MatousVozak What are the contents of the column that is an object? Strings, arrays or something else? The issue is inside root_numpy, but it's unlikely to be fixed unless you're willing to make a pull request, as it has been effectively deprecated in favour of uproot.

That said, a better question is whether you need to save to a ROOT file at all. You might be better served by a file format natively supported by pandas, like HDF5.

chrisburr avatar Nov 01 '19 06:11 chrisburr

Hi @chrisburr, yes, it is a string, and I needed to save into a ROOT file. I simply wanted to change the entries of one branch which was of type char/string. As this was a hot fix and I couldn't find a quick workaround, I eventually turned to PyROOT to do the job.

Best, Mat

MatousVozak avatar Nov 02 '19 14:11 MatousVozak

Was a workaround ever found for this?

goi42 avatar Jul 01 '20 15:07 goi42

Kind ping to @chrisburr ... though you may want to look at the uproot package at this stage?

eduardo-rodrigues avatar Jul 07 '20 16:07 eduardo-rodrigues

As has been explicitly written in the README for a while, root_pandas, and root_numpy on which it depends, have been deprecated and effectively unmaintained for quite some time. We decided to close anything outstanding as "won't do" and archive the package at this point.

eduardo-rodrigues avatar Jan 09 '23 09:01 eduardo-rodrigues