Issue running parallelformers test script in a VM

Open Mehrad0711 opened this issue 2 years ago • 1 comment

How to reproduce

First of all, thanks for this great project!

I'm facing an issue running the test code provided here on Kubernetes.

This is what I'm running inside a Kubeflow pod:

python3 tests/seq2seq_lm.py --test-name=test --name=Helsinki-NLP/opus-mt-en-zh --gpu-from=0 --gpu-to=3 --use-pf

I'm using a g4dn.12xlarge AWS machine with four T4 GPUs.

The pod hangs when executing this line until I manually terminate it.

I suspected this change might be the culprit, so I ran the same code with v1.2.4 of parallelformers. This time the pod exits while executing the same line without printing any errors, which is odd.

Notably, if I run the same command without --use-pf it runs fine.

I saw you've reported some problems when using Docker. However, memory should not be an issue here, since I'm using the Helsinki-NLP/opus-mt-en-zh model, which is relatively small.

Has the parallelformers code ever been tested on Kubernetes? I would also appreciate it if you could look into this issue. Thanks!

Environment

  • OS : Linux
  • Python version : 3.8.3
  • Transformers version : 4.17.0
  • Whether to use Docker: Yes
  • Misc.:
  • branch: main

Mehrad0711, Mar 15 '22 00:03

Can you try running that inside an if __name__ == '__main__': block?

hyunwoongko, Jul 27 '22 20:07
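For readers unfamiliar with the suggestion above, here is a minimal sketch of the pattern using only the standard library. It assumes (this is not confirmed in the thread) that parallelformers starts its GPU workers as separate processes with the "spawn" start method. Under "spawn", each child re-imports the main script, so any process-launching code at module top level would run again in every child. The `worker` and `main` functions are hypothetical stand-ins, not part of parallelformers.

```python
import multiprocessing as mp


def worker(rank: int) -> None:
    # Hypothetical stand-in for the per-GPU work a library would run in a child process.
    print(f"worker {rank} running in {mp.current_process().name}")


def main(num_workers: int = 2) -> list:
    # "spawn" starts a fresh interpreter per child; each child re-imports this file.
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=worker, args=(rank,)) for rank in range(num_workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # Exit code 0 means the child finished without an error.
    return [p.exitcode for p in procs]


if __name__ == "__main__":
    # Model loading and the call into parallelformers would go inside this guard,
    # so that child processes re-importing the script do not launch workers again.
    print(main())
```

If the test script's model setup and parallelization call sit at module top level, moving them under the guard as shown is the change being asked for. Whether that resolves the hang on Kubernetes is not established in this thread.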