Shauray Singh

Results 13 comments of Shauray Singh

Hi @adamkatav, I was not able to reproduce the error you mentioned in the messages above. I'm on the same versions (for the most part) as the ones you listed.

Yes, the issue you encountered with the RuntimeError could potentially be related to the Docker environment. If you are experiencing the same error in Colab as well, it suggests that...

@sgugger I don't particularly understand why this error occurs in examples_flax:

```
argparse.ArgumentError: argument --sharded_ddp: invalid typing.Union[str, bool, typing.List[transformers.trainer_utils.ShardedDDPOption], NoneType] value: ''
```

@sgugger `bool` breaks `--sharded_ddp`. I think we can still support Boolean arguments by passing them as strings, together with https://github.com/huggingface/transformers/blob/20d6b84613984f2497587a62774704882ccbeee6/src/transformers/hf_argparser.py#L168-L173. With this, `--sharded_ddp` and `--fsdp` default to strings.
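To illustrate the idea, here is a minimal sketch using plain `argparse` rather than the actual `HfArgumentParser` code. The helper `str_or_bool` and the function `build_parser` are hypothetical names made up for this example. It assumes the goal is to accept both Boolean-like strings and arbitrary option strings (including the empty string) for the same flag.

```python
import argparse


def str_or_bool(value: str):
    """Hypothetical helper: map Boolean-like strings to bool, pass others through."""
    lowered = value.lower()
    if lowered in ("true", "yes", "1"):
        return True
    if lowered in ("false", "no", "0"):
        return False
    # Anything else (including the empty string) stays a plain string.
    return value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # nargs="?" with const=True lets a bare `--sharded_ddp` act like a Boolean flag,
    # while the string default keeps `--sharded_ddp ""` valid.
    parser.add_argument(
        "--sharded_ddp", type=str_or_bool, nargs="?", const=True, default=""
    )
    return parser


if __name__ == "__main__":
    parser = build_parser()
    for argv in ([], ["--sharded_ddp"], ["--sharded_ddp", ""], ["--sharded_ddp", "simple"]):
        print(argv, "->", repr(parser.parse_args(argv).sharded_ddp))
```

Because the converter never raises on an unrecognized string, an empty value no longer triggers an `invalid ... value: ''` error in this sketch. Whether that is the right behavior for the real arguments is a separate design question.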

I'm not sure why, but with `GPU=0` it still trains on the CPU using just a single core. Could you tell me how to train on a GPU?

Thank you @wozeparrot, it looks like I don't have drivers for OpenCL. Does using `GPU=1` or `GPU=2` make a difference?

I'm using Ubuntu right now on an Intel i5

I do have a dedicated Nvidia GPU

Training on a relatively small dataset with a batch size of 8 (due to memory constraints). This should be reproducible with `CUDA=1 python3 ./examples/mlperf/model_train`.