Vedant Roy
Quick question: if we were to use ESM, wouldn't it still use the "node" key in the "exports" object? And the "node" key refers to a CommonJS file, which is...
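For context on the question above, here is a simplified Python sketch of how Node matches conditional exports. It assumes the rule from Node's package documentation: keys in the "exports" object are tried in order, the first key in the active condition set wins, and "default" always matches. The function name `resolve_exports` and the file paths are hypothetical, and the condition set is reduced to `node` and `import`.

```python
def resolve_exports(target, conditions):
    """Return the first target path whose condition matches, or None."""
    if isinstance(target, str):
        return target
    # Keys are tried in the order they appear (dicts keep insertion order,
    # like the JSON object they come from); "default" always matches.
    for key, value in target.items():
        if key == "default" or key in conditions:
            resolved = resolve_exports(value, conditions)
            if resolved is not None:
                return resolved
    return None

# Simplified set of conditions active for an ES module import under Node.
esm_conditions = {"node", "import"}

node_first = {"node": "./dist/index.cjs", "import": "./dist/index.mjs", "default": "./dist/index.mjs"}
import_first = {"import": "./dist/index.mjs", "node": "./dist/index.cjs", "default": "./dist/index.mjs"}

print(resolve_exports(node_first, esm_conditions))    # ./dist/index.cjs
print(resolve_exports(import_first, esm_conditions))  # ./dist/index.mjs
```

In this model, an ESM import would still pick the "node" entry when it is listed before "import"; putting "import" first avoids that.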
Also running into this issue for `xonsh` version 0.13.3.
@hanlint If I provide a GitHub repository and a Dockerfile, would that be helpful? I've also filed an issue at https://github.com/pytorch/pytorch/issues/83824, since it might be a PyTorch issue.
@hanlint Also, to be clear, I can reliably reproduce this issue when training with multiple GPUs. It is somewhat inconsistent with 2 GPUs, but with 6 or more it happens every time....
OK, additional details. The error is happening because my process is receiving a SIGCHLD signal, which is causing the interruption. I can work around the error by doing a `sleep` before...
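The mechanism can be sketched in Python, assuming a POSIX system (SIGCHLD does not exist on Windows) and that the code runs in the main thread: when a child process exits, the kernel sends SIGCHLD to its parent, and any handler installed for that signal runs in the parent. The handler `on_sigchld` is hypothetical and only records the signal; this does not reproduce the PyTorch code path that was actually interrupted.

```python
import signal
import subprocess
import sys
import time

# Record every SIGCHLD delivered to this (parent) process.
received = []

def on_sigchld(signum, frame):
    # Hypothetical handler: it only records that the signal arrived.
    received.append(signum)

# Replace the default disposition (which ignores SIGCHLD) with our handler.
previous = signal.signal(signal.SIGCHLD, on_sigchld)

# Spawn a short-lived child; when it exits, the kernel sends SIGCHLD to us.
child = subprocess.Popen([sys.executable, "-c", "pass"])
child.wait()

# Python runs signal handlers in the main thread between bytecodes,
# so give the interpreter a moment to dispatch the pending signal.
deadline = time.monotonic() + 2.0
while not received and time.monotonic() < deadline:
    time.sleep(0.01)

# Restore the earlier handler (fall back to SIG_DFL if it was not set from Python).
signal.signal(signal.SIGCHLD, previous if previous is not None else signal.SIG_DFL)

print("SIGCHLD received:", bool(received))
```

Since Python 3.5 (PEP 475), blocking calls made from Python code are retried automatically after a handler runs, but native code may still see the interrupted call, which could explain why a short `sleep` changes the behavior.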
@kobindra
```
contrastive_train-contrastive_train-1 | Traceback (most recent call last):
contrastive_train-contrastive_train-1 |
contrastive_train-contrastive_train-1 | File "contrastive_train.py", line 63, in
contrastive_train-contrastive_train-1 | app()
contrastive_train-contrastive_train-1 |
contrastive_train-contrastive_train-1 | File "contrastive_train.py", line 52, in...
```
Oh, it's `bucket`, not `bucket_name`.
@kobindra Is there a way to specify the folder name for the checkpoints? For example, I don't really want it to be "some random integer + a word"; I would...
Doesn't work, see:
```
contrastive_train-contrastive_train-1 | Traceback (most recent call last):
contrastive_train-contrastive_train-1 | File "/root/miniconda3/envs/video-rec/lib/python3.8/site-packages/boto3/s3/transfer.py", line 288, in upload_file
contrastive_train-contrastive_train-1 | future.result()
contrastive_train-contrastive_train-1 | File "/root/miniconda3/envs/video-rec/lib/python3.8/site-packages/s3transfer/futures.py", line 103, in result...
```
Setting `num_concurrent_uploads=1` doesn't help.