vak
torch dataloader may be corrupting .npz files?
Occasionally, running `vak predict` fails with a "Bad CRC-32 for file 's.npy'" error. I have noticed that this happens with datasets on which I have already run `vak predict` multiple times (e.g., while testing for an unrelated bug that required repeated runs), and that once the error occurs, every subsequent run fails the same way.
My guess is that `torch.utils.data.DataLoader` is somehow corrupting the `.npz` files, perhaps because it uses multiprocessing and a file is not closed correctly by one of the worker processes?
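Since re-running always fails once the error first appears, one way to narrow this down would be to check whether the `.npz` files are actually corrupted on disk, independent of torch. A minimal diagnostic sketch (the function name and directory argument are mine, not part of vak's API):

```python
import zipfile
from pathlib import Path


def find_corrupt_npz(spect_dir):
    """Return (path, bad_member) pairs for .npz files that fail a CRC check.

    bad_member is the first member name with a bad CRC-32, or None when the
    file cannot even be opened as a zip archive.
    """
    bad = []
    for path in sorted(Path(spect_dir).glob("*.npz")):
        try:
            with zipfile.ZipFile(path) as zf:
                # testzip() reads every member and returns the name of the
                # first one whose CRC-32 does not match, or None if all pass
                first_bad = zf.testzip()
            if first_bad is not None:
                bad.append((path, first_bad))
        except zipfile.BadZipFile:
            bad.append((path, None))
    return bad
```

If this reports bad files even with no vak process running, the corruption is on disk (pointing at whatever wrote or last touched the files), rather than a transient read error in a DataLoader worker.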
Here's a complete traceback from one occurrence:
```
$ vak predict pk92r45_predict_190511_v2.toml
Logging results to /home/pimienta/Documents/data/vocal/avani-data/pk92r45_MMAN/vak_outputs/predict
loading SpectScaler from path: /home/pimienta/Documents/data/vocal/avani-data/pk92r45_MMAN/vaktrain/results_211108_134532/StandardizeSpect
loading labelmap from path: /home/pimienta/Documents/data/vocal/avani-data/pk92r45_MMAN/vaktrain/results_211108_134532/labelmap.json
loading dataset to predict from csv path: /home/pimienta/Documents/data/vocal/avani-data/pk92r45_MMAN/vak_outputs/predict/190511_v2_prep_211125_210922.csv
will save annotations in .csv file: /home/pimienta/Documents/data/vocal/avani-data/pk92r45_MMAN/vak_outputs/pk92r45_190511.csv
dataset has timebins with duration: 0.002
shape of input to networks used for predictions: torch.Size([1, 152, 88])
instantiating models from model-config map:/n{'TweetyNet': {'optimizer': {'lr': 0.001}, 'network': {}, 'loss': {}, 'metrics': {}}}
loading checkpoint for TweetyNet from path: /home/pimienta/Documents/data/vocal/avani-data/pk92r45_MMAN/vaktrain/results_211108_134532/TweetyNet/checkpoints/max-val-acc-checkpoint.pt
Loading checkpoint from:
/home/pimienta/Documents/data/vocal/avani-data/pk92r45_MMAN/vaktrain/results_211108_134532/TweetyNet/checkpoints/max-val-acc-checkpoint.pt
running predict method of TweetyNet
batch 301 / 557: 54%|███████████████████████████████████████████████████████████ | 302/557 [00:59<00:50, 5.04it/s]
Traceback (most recent call last):
  File "/home/pimienta/anaconda3/envs/vak040b3/bin/vak", line 8, in <module>
    sys.exit(main())
  File "/home/pimienta/anaconda3/envs/vak040b3/lib/python3.8/site-packages/vak/__main__.py", line 45, in main
    cli.cli(command=args.command, config_file=args.configfile)
  File "/home/pimienta/anaconda3/envs/vak040b3/lib/python3.8/site-packages/vak/cli/cli.py", line 30, in cli
    COMMAND_FUNCTION_MAP[command](toml_path=config_file)
  File "/home/pimienta/anaconda3/envs/vak040b3/lib/python3.8/site-packages/vak/cli/predict.py", line 42, in predict
    core.predict(
  File "/home/pimienta/anaconda3/envs/vak040b3/lib/python3.8/site-packages/vak/core/predict.py", line 227, in predict
    pred_dict = model.predict(pred_data=pred_data, device=device)
  File "/home/pimienta/anaconda3/envs/vak040b3/lib/python3.8/site-packages/vak/engine/model.py", line 478, in predict
    return self._predict(pred_data)
  File "/home/pimienta/anaconda3/envs/vak040b3/lib/python3.8/site-packages/vak/engine/model.py", line 347, in _predict
    for ind, batch in enumerate(progress_bar):
  File "/home/pimienta/anaconda3/envs/vak040b3/lib/python3.8/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/home/pimienta/anaconda3/envs/vak040b3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/home/pimienta/anaconda3/envs/vak040b3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "/home/pimienta/anaconda3/envs/vak040b3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/home/pimienta/anaconda3/envs/vak040b3/lib/python3.8/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
zipfile.BadZipFile: Caught BadZipFile in DataLoader worker process 2.
Original Traceback (most recent call last):
  File "/home/pimienta/anaconda3/envs/vak040b3/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/pimienta/anaconda3/envs/vak040b3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/pimienta/anaconda3/envs/vak040b3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/pimienta/anaconda3/envs/vak040b3/lib/python3.8/site-packages/vak/datasets/vocal_dataset.py", line 75, in __getitem__
    spect = spect_dict[self.spect_key]
  File "/home/pimienta/anaconda3/envs/vak040b3/lib/python3.8/site-packages/numpy/lib/npyio.py", line 253, in __getitem__
    return format.read_array(bytes,
  File "/home/pimienta/anaconda3/envs/vak040b3/lib/python3.8/site-packages/numpy/lib/format.py", line 763, in read_array
    data = _read_bytes(fp, read_size, "array data")
  File "/home/pimienta/anaconda3/envs/vak040b3/lib/python3.8/site-packages/numpy/lib/format.py", line 892, in _read_bytes
    r = fp.read(size - len(data))
  File "/home/pimienta/anaconda3/envs/vak040b3/lib/python3.8/zipfile.py", line 940, in read
    data = self._read1(n)
  File "/home/pimienta/anaconda3/envs/vak040b3/lib/python3.8/zipfile.py", line 1030, in _read1
    self._update_crc(data)
  File "/home/pimienta/anaconda3/envs/vak040b3/lib/python3.8/zipfile.py", line 958, in _update_crc
    raise BadZipFile("Bad CRC-32 for file %r" % self.name)
zipfile.BadZipFile: Bad CRC-32 for file 's.npy'
```
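If the files on disk turn out to be intact, a pattern worth trying in the dataset's `__getitem__` is to read each array fully into memory and close the archive immediately, so no open zip handle is ever shared with or inherited by DataLoader worker processes. A minimal sketch (the function name is mine; the default `"s"` key just mirrors the `'s.npy'` member seen in the traceback):

```python
import numpy as np


def load_spect_eagerly(npz_path, spect_key="s"):
    """Load one array from an .npz, forcing a full read before close.

    np.load on an .npz returns a lazy NpzFile that keeps the zip open;
    using it as a context manager and indexing inside the `with` block
    guarantees the handle is closed before the sample leaves this call.
    """
    with np.load(npz_path) as npz:
        return npz[spect_key]
```

This does not prove the multiprocessing hypothesis, but it removes one way a worker could be left reading from a stale or shared file object.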