RecursionError when multiple columns have same names

Open meain opened this issue 4 years ago • 3 comments

If multiple columns end up with the same name (usually after I trim the names to just 8 characters), I get a RecursionError: maximum recursion depth exceeded when running the following code.

import pandas as pd
import xport
import xport.v56


df = pd.read_csv("temp.csv")
ds = xport.Dataset(df, name="DATA", label="Wonderful data")
ds = ds.rename(columns={k: k.upper()[:8] for k in ds})  # <- can happen here
library = xport.Library({"DATA": ds})
with open("example.xpt", "wb") as f:
    xport.v56.dump(library, f)  # <- or here

We get the recursion error on the ds.rename line, but even if we use something like ds.columns = [c[:8] for c in ds.columns] instead, we get a similar error on the xport.v56.dump line. The CSV file that I am using is relatively simple.

BIRTHDAY_DTC,BIRTHDAY_MONTH
12/10/1999,SEPT
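To make the collision concrete, here is a minimal sketch in plain Python (no pandas or xport needed) that applies the same upper-case-and-truncate step to the two header names above. The variable names are only illustrative.

```python
from collections import Counter

# The two column names from the CSV header above.
columns = ["BIRTHDAY_DTC", "BIRTHDAY_MONTH"]

# Same transformation as in the rename call: upper-case, then keep 8 characters.
truncated = [c.upper()[:8] for c in columns]

# Names that occur more than once after truncation.
duplicates = [name for name, count in Counter(truncated).items() if count > 1]

print(truncated)   # ['BIRTHDAY', 'BIRTHDAY']
print(duplicates)  # ['BIRTHDAY']
```

Both columns collapse to BIRTHDAY, so the dataset ends up with two columns of the same name.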

pip freeze

pandas is not the latest version. I pinned it because otherwise the SAS viewer was not showing labels, but I was also hitting this issue with pandas==1.2.4.

click==8.0.1
importlib-metadata==4.0.1
numpy==1.20.3
pandas==1.0.3
python-dateutil==2.8.1
pytz==2021.1
PyYAML==5.4.1
six==1.16.0
typing-extensions==3.10.0.0
xport==3.2.1
zipp==3.4.1

And here are the full logs.

meain avatar May 25 '21 05:05 meain

[:8]

The variable names are not unique after slicing.

gaineleanor avatar Oct 29 '21 09:10 gaineleanor

Yup, I'm aware of that. Just thought that it should probably at least give a more descriptive error than the current one.

meain avatar Oct 29 '21 13:10 meain

Thanks for reporting this, @meain. RecursionError: maximum recursion depth exceeded in __instancecheck__ confuses me. The other recursion bug was caused by trying to log a helpful message. This looks different.

I agree we could have a better error message. Checking for uniqueness and raising a specific error for duplicates would be much better.
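As a rough sketch of what such a check might look like (the helper name check_unique_names is hypothetical and not part of xport's API), something along these lines could run on the column names before renaming or dumping:

```python
from collections import Counter
from typing import Iterable, List


def check_unique_names(names: Iterable[str]) -> List[str]:
    """Return the names as a list, or raise ValueError if any name repeats.

    Hypothetical helper for illustration; xport does not provide it.
    """
    names = list(names)
    duplicates = sorted(
        name for name, count in Counter(names).items() if count > 1
    )
    if duplicates:
        raise ValueError(
            "Duplicate variable names: " + ", ".join(duplicates)
        )
    return names


# Example: the truncated names from the report trigger the descriptive error.
try:
    check_unique_names(["BIRTHDAY", "BIRTHDAY"])
except ValueError as err:
    print(err)
```

Until something like this exists in the library, calling a check of this kind on the truncated names before building the Library should surface the duplicate names directly instead of a RecursionError.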

selik avatar Nov 01 '21 08:11 selik