DownloadConceptualCaptions
Negative size passed to PyBytes_FromStringAndSize
I get the following error when trying to download the validation split with the most recent version of the code. It seems to be related to the shelve library, and it may be a known problem on my platform (OS X).
Python 3.7.1 (default, Oct 23 2018, 14:07:42)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
python download_data.py
Opening Validation_GCC-1.1.0-Validation.tsv Data File...
Processing 15840 Images:
Generating parts... | 0/15840 [00:00<?, ?it/s]158 parts. 100 per part. Using 32 processes
Downloading: 100%|███████| 15840/15840 [07:46<00:00, 33.95it/s]
Finished Downloading.
Generating Dataframe from results...
Traceback (most recent call last):
File "download_data.py", line 131, in <module>
df = df_from_shelve(chunk_size=images_per_part, func=download_image, dataset_name=data_name)
File "download_data.py", line 119, in df_from_shelve
keylist = sorted([int(k) for k in results.keys()])
File "download_data.py", line 119, in <listcomp>
keylist = sorted([int(k) for k in results.keys()])
File "/Users/lvx122/miniconda3/lib/python3.7/_collections_abc.py", line 720, in __iter__
yield from self._mapping
File "/Users/lvx122/miniconda3/lib/python3.7/shelve.py", line 95, in __iter__
for k in self.dict.keys():
SystemError: Negative size passed to PyBytes_FromStringAndSize
I haven't hit this one yet. I'm on Python 3.6.4 on Ubuntu 16.04.
An error at that point will prevent the report from being written at the end, but the images should still have been downloaded (although some of the resume logic relies on the shelve too).
It might work if you delete the shelve files and see whether they are recreated without that error?
It's safe to delete the .bak, .dat, and .dir files associated with the shelve. When the download runs again it should skip images that already exist on disk and recreate the shelve for the report (downloaded_validation_report.tsv.gz will be overwritten too).
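The cleanup described above could be scripted roughly as follows. This is a minimal sketch, not part of download_data.py: the helper name remove_shelve_files is hypothetical, and the base name in the usage comment is the one that appears elsewhere in this thread, so adjust it if your files are named differently. It only touches files that share the shelve's base name, so downloaded images are left alone.

```python
import os

# Suffixes that the various dbm backends may append to the shelve base name.
# The empty suffix covers backends that store everything in one bare file.
SHELVE_SUFFIXES = ("", ".bak", ".dat", ".dir", ".db")


def remove_shelve_files(base_path):
    """Delete the on-disk files belonging to the shelve at base_path.

    Returns the list of paths that were actually removed.
    """
    removed = []
    for suffix in SHELVE_SUFFIXES:
        path = base_path + suffix
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return removed


# Example usage (assumed base name, run from the download directory):
# remove_shelve_files("validation_download_image_100_results.tmp")
```

After this, rerunning download_data.py should rebuild the shelve from scratch.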
I deleted the shelve files and ran it again. It gives the same error, so this looks like an upstream problem.
Ah good to know! Thanks!
I had the same error on OS X, but I found a workaround.
- Create two empty files, validation_download_image_100_results.tmp.dat and validation_download_image_100_results.tmp.dir.
- Run download_data.py.
- In this way, Python's shelve will generate *.dat, *.bak, and *.dir files rather than *.db.
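The steps above can be sketched in Python. The likely reason the trick works is that the standard library's dbm.whichdb reports an existing pair of .dat and .dir files (even empty ones) as the pure-Python dbm.dumb format, so shelve reuses that backend instead of creating a *.db file. The helper names precreate_dumb_files and backend_of are hypothetical, and this assumes no stale *.db or *.pag file with the same base name is present.

```python
import dbm
import os


def precreate_dumb_files(base_path):
    """Create empty <base>.dat and <base>.dir files if they do not exist yet."""
    for suffix in (".dat", ".dir"):
        # Append mode creates a missing file and leaves an existing one untouched.
        with open(base_path + suffix, "a"):
            pass


def backend_of(base_path):
    """Return the dbm module name Python detects for this shelve base name."""
    return dbm.whichdb(base_path)


# Example usage (run from the download directory before download_data.py):
# base = "validation_download_image_100_results.tmp"
# precreate_dumb_files(base)
# print(backend_of(base))
```

Checking backend_of before running the download is a quick way to confirm that the empty files were picked up.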
It is an error with shelve. My guess is that a 32-bit integer overflows somewhere, since downloading only the first 20 images worked fine.
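If the platform's native dbm backend is indeed the culprit (the root cause is not confirmed in this thread), another option is to request the pure-Python dbm.dumb backend explicitly rather than relying on pre-created files. This is only a sketch: open_dumb_shelf is a hypothetical helper, and download_data.py would need to be edited to call it in place of shelve.open.

```python
import dbm.dumb
import shelve


def open_dumb_shelf(base_path, flag="c"):
    """Open a shelf backed by dbm.dumb, bypassing automatic backend selection."""
    return shelve.Shelf(dbm.dumb.open(base_path, flag))


# Example usage (assumed base name):
# with open_dumb_shelf("validation_download_image_100_results.tmp") as results:
#     keylist = sorted(int(k) for k in results.keys())
```

dbm.dumb is slower than the native backends, so this trades speed for portability.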