parallelformers
Issue running parallelformers test script in a VM
How to reproduce
First of all, thanks for this great project!
I'm facing an issue running the test code provided here on Kubernetes.
This is what I'm running inside a Kubeflow pod:
```
python3 tests/seq2seq_lm.py --test-name=test --name=Helsinki-NLP/opus-mt-en-zh --gpu-from=0 --gpu-to=3 --use-pf
```
I'm using a g4dn.12xlarge AWS machine with four T4 GPUs.
The pod hangs when executing this line until I manually terminate it.
I suspected this change might be the culprit, so I ran the same code with parallelformers v1.2.4. This time, the pod exits while executing the same line, without printing any errors, which is odd.
Notably, if I run the same command without `--use-pf`, it runs fine.
I saw you've reported some problems when using Docker. However, memory should not be an issue here, since the Helsinki-NLP/opus-mt-en-zh model is relatively small.
I was wondering whether parallelformers has ever been tested on Kubernetes. I'd also appreciate it if you could look into this issue. Thanks!
Environment
- OS : Linux
- Python version : 3.8.3
- Transformers version : 4.17.0
- Whether to use Docker: Yes
- Misc.:
- branch: main
Can you try running that inside an `if __name__ == '__main__':` block?
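For context, parallelformers launches one worker process per GPU via Python multiprocessing, and under the `spawn` start method (which CUDA-based libraries must use) each child re-imports the launching script. Without the `__main__` guard, the top-level launch code runs again in every child, which can hang or exit silently. Below is a minimal stdlib-only sketch of the guard pattern; `worker` and the process count are illustrative stand-ins, not parallelformers APIs:

```python
# Illustrative sketch of why the `__main__` guard matters under `spawn`.
# Under `spawn`, each child process re-imports this module, so any code
# that starts new processes must sit behind the guard; otherwise children
# would recursively try to spawn more children.
import multiprocessing as mp


def worker(rank: int) -> int:
    # Stand-in for the per-GPU work a real worker would do.
    return rank * 2


def main() -> list:
    # Explicitly use the 'spawn' start method, as CUDA libraries require.
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        return pool.map(worker, range(2))


if __name__ == "__main__":
    # Safe: this runs only in the launching process, not in spawned children.
    print(main())
```

In the test script, the analogous fix would be moving the model loading and the parallelization call into a function invoked from an `if __name__ == '__main__':` block.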