Using chunksize gives `TypeError: 'TextFileReader' object does not support item assignment`

Open nigelcharman opened this issue 8 months ago • 4 comments

We've been using python-dwca-reader to load about 13k occurrences with no problems. We now need to scale up to about 3.25 million occurrences.

Changing the code from:

        core_df = dwca.pd_read('occurrence.txt', parse_dates=True)

to:

        for chunk in dwca.pd_read('occurrence.txt', parse_dates=True, chunksize=10):
            ...

causes the error:

    ...
        for chunk in dwca.pd_read('occurrence.txt', parse_dates=True, chunksize=10):
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/homebrew/opt/asdf/installs/python/3.11.7/lib/python3.11/site-packages/dwca/read.py", line 209, in pd_read
        df[shorten_term(field['term'])] = field_default_value
        ~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    TypeError: 'TextFileReader' object does not support item assignment

Looking at gbif-alert, I see that you're using enumerate(dwca) rather than reading it in chunks, so I'll give that a try.
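
For reference, the traceback suggests that `pd_read` ends up with a pandas `TextFileReader` (which is what `pandas.read_csv` returns when `chunksize` is set) and then tries to assign a default-value column to it as if it were a DataFrame. Here is a minimal sketch that reproduces the same error with plain pandas and applies the default to each chunk instead. It does not use python-dwca-reader; the sample data, the `country` column, the `"NZ"` default, and the `read_chunks_with_default` helper are all made up for illustration.

```python
import io

import pandas as pd

# Made-up stand-in for a tab-separated occurrence.txt.
SAMPLE = (
    "id\tscientificName\n"
    "1\tApis mellifera\n"
    "2\tBombus terrestris\n"
    "3\tVespa crabro\n"
)

# With chunksize set, read_csv returns a TextFileReader, not a DataFrame.
reader = pd.read_csv(io.StringIO(SAMPLE), sep="\t", chunksize=2)
error_message = None
try:
    # Same kind of item assignment as the failing line in the traceback.
    reader["country"] = "NZ"
except TypeError as exc:
    error_message = str(exc)
finally:
    reader.close()


def read_chunks_with_default(text, column, default, chunksize):
    """Yield DataFrame chunks, setting a constant column on each one."""
    chunk_reader = pd.read_csv(io.StringIO(text), sep="\t", chunksize=chunksize)
    for chunk in chunk_reader:
        # Each chunk is an ordinary DataFrame, so item assignment works here.
        chunk[column] = default
        yield chunk


chunks = list(read_chunks_with_default(SAMPLE, "country", "NZ", chunksize=2))
print(error_message)
print([len(chunk) for chunk in chunks])
```

If the default-value columns matter for your data, something along these lines (applied per chunk rather than to the reader object) might work as a stopgap when calling `pandas.read_csv` directly on the extracted file.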

nigelcharman, Jul 02 '24 10:07