clearml
Last data chunk does not get uploaded to GCP bucket
I created a dataset following: https://clear.ml/docs/latest/docs/clearml_data/data_management_examples/data_man_simple
When I upload it to my GCP bucket by:
(yolo555) ➜ yolov5 git:(master) ✗ clearml-data close --storage gs://xxx/clearml-test --chunk-size 128 --verbose
The last prompts are:
Uploading dataset changes (98 files compressed to 94.76 MiB) to gs://icm-data-lake/clearml-test
Uploading dataset changes (98 files compressed to 94.73 MiB) to gs://icm-data-lake/clearml-test
Uploading dataset changes (97 files compressed to 94.54 MiB) to gs://icm-data-lake/clearml-test
Uploading dataset changes (98 files compressed to 94.23 MiB) to gs://icm-data-lake/clearml-test
2022-09-08 09:55:31,054 - clearml.storage - ERROR - Failed uploading: HTTPSConnectionPool(host='storage.googleapis.com', port=443): Read timed out. (read timeout=60)
File compression and upload completed: total size 38.47 GiB, 320 chunk(s) stored (average size 123.11 MiB)
Dataset closed and finalized
Did the last chunk fail to upload, or is this just a false alarm?
Hi @mikel-brostrom,
That...shouldn't happen :) Does this persist, i.e. does it happen every time? Also, it might sound silly, but did you try downloading the dataset and checking that all the files are there? It should be easy to compare the original and downloaded files.
In the meantime I'll check internally if we defend somehow against partial upload issues.
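For the comparison suggested above, here is a minimal sketch that hashes every file in two directory trees and reports the differences. It is not part of ClearML; the function names `file_sha256`, `tree_digests`, and `compare_trees` are made up for illustration, and it assumes the downloaded dataset has already been extracted to a local folder.

```python
import hashlib
from pathlib import Path


def file_sha256(path, block_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def compare_trees(original_dir, downloaded_dir):
    """Return (missing, extra, changed) lists of relative paths, sorted."""
    original = tree_digests(original_dir)
    downloaded = tree_digests(downloaded_dir)
    # Files present in the original but absent from the download.
    missing = sorted(set(original) - set(downloaded))
    # Files present only in the download.
    extra = sorted(set(downloaded) - set(original))
    # Files present in both whose contents differ.
    changed = sorted(
        name
        for name in set(original) & set(downloaded)
        if original[name] != downloaded[name]
    )
    return missing, extra, changed
```

If the failed chunk really was dropped, its files should show up in `missing`; three empty lists would suggest the error was a false alarm. Hashing 38 GiB twice takes a while, so comparing file counts first is a cheaper sanity check.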
@mikel-brostrom, what SDK version are you using? In version 1.6.4 we added a retry mechanism for uploads, which should help in your case.
pip list gives me:
clearml 1.6.4
I don't know if it is persistent; I have only tried it once. I will report back when I try it again.
@mikel-brostrom, so it should have retried. I'll check whether something went wrong or we just don't print retry info. Thanks!
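To illustrate the general idea of an upload retry that also reports each attempt, here is a generic sketch with exponential backoff. This is not ClearML's actual implementation; `upload_with_retry`, its parameters, and the printed message are assumptions made up for this example.

```python
import time


def upload_with_retry(upload, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call upload() until it succeeds or `retries` attempts have failed.

    `upload` is any zero-argument callable (hypothetical here). The delay
    doubles after each failure: base_delay, 2 * base_delay, and so on.
    """
    for attempt in range(1, retries + 1):
        try:
            return upload()
        except Exception as error:  # a real client would catch timeouts only
            if attempt == retries:
                # Out of attempts: let the caller see the last error.
                raise
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({error}); retrying in {delay:.1f}s")
            sleep(delay)
```

Printing each failed attempt makes it clear from the log whether an error like the read timeout above was followed by a successful retry or was the final outcome.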
I ran the command again. No issue this time:
Uploading dataset changes (90 files compressed to 95.01 MiB) to gs://icm-data-lake/clearml-test
Uploading dataset changes (89 files compressed to 94.66 MiB) to gs://icm-data-lake/clearml-test
File compression and upload completed: total size 38.58 GiB, 320 chunk(s) stored (average size 123.46 MiB)
Dataset closed and finalized
I guess this is not an issue anymore, @erezalg :smile: