RESOURCE_EXHAUSTED: XLA:TPU compile permanent error

infocodiste opened this issue

Hi, I'm using a v3-8 TPU on GCP, and while loading the model I get the error below:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/deep_c/workspace/LWM/lwm/vision_chat.py", line 254, in <module>
    run(main)
  File "/home/deep_c/miniconda3/envs/large_vision_model/lib/python3.10/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/deep_c/miniconda3/envs/large_vision_model/lib/python3.10/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/home/deep_c/workspace/LWM/lwm/vision_chat.py", line 250, in main
    output = sampler(prompts, FLAGS.max_n_frames)[0]
  File "/home/deep_c/workspace/LWM/lwm/vision_chat.py", line 230, in __call__
    output, self.sharded_rng = self._forward_generate(
jaxlib.xla_extension.XlaRuntimeError: RESOURCE_EXHAUSTED: XLA:TPU compile permanent error. Ran out of memory in memory space hbm. Used 21.95G of 15.48G hbm. Exceeded hbm capacity by 6.47G.

Total hbm usage >= 22.47G:
    reserved        530.00M
    program          21.95G
    arguments            0B
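The figures in the error and in the usage breakdown agree with each other: the program needs 21.95G against a 15.48G limit (6.47G over), and reserved plus program plus arguments gives the 22.47G total, provided XLA's G and M are binary units (1G = 1024M). The sketch below parses both lines and checks that arithmetic; the binary-unit interpretation is an assumption, not something the error message states.

```python
import re

# Text from the error output above (breakdown joined onto one line).
ERROR_LINE = (
    "Ran out of memory in memory space hbm. Used 21.95G of 15.48G hbm. "
    "Exceeded hbm capacity by 6.47G."
)
BREAKDOWN_LINE = (
    "Total hbm usage >= 22.47G: reserved 530.00M program 21.95G arguments 0B"
)

# Assumption: XLA prints binary units here (1G = 1024M = 1024**3 bytes).
TO_GIB = {"B": 1 / 1024**3, "K": 1 / 1024**2, "M": 1 / 1024, "G": 1.0}


def parse_breakdown(text: str) -> dict[str, float]:
    """Map each breakdown component (reserved/program/arguments) to GiB."""
    pattern = r"(reserved|program|arguments)\s+([\d.]+)([BKMG])"
    return {name: float(num) * TO_GIB[unit] for name, num, unit in re.findall(pattern, text)}


# "Used X of Y": X is the program size, Y is the usable HBM per core.
used, capacity = (float(x) for x in re.search(r"Used ([\d.]+)G of ([\d.]+)G", ERROR_LINE).groups())
parts = parse_breakdown(BREAKDOWN_LINE)
total = sum(parts.values())

print(f"program overshoot: {used - capacity:.2f}G")
print(f"reserved + program + arguments: {total:.2f}G")
```

Running it prints an overshoot of 6.47G and a total of 22.47G, matching the "Exceeded hbm capacity" and "Total hbm usage" values, so almost all of the usage is the compiled program itself rather than its arguments.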

How can I fix this?
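As a rough first diagnostic, it can help to estimate how much of a single core's 15.48G the model weights alone would take. One possible reason a program overshoots per-core HBM is that every core ends up holding a full copy of the weights instead of a shard of them. The sketch below is a back-of-the-envelope estimate only: the 7B parameter count and the dtypes are hypothetical inputs, not values from this issue, and it ignores activations, KV cache, and XLA temporaries, all of which also need HBM.

```python
# Hypothetical sizing check: parameter count and dtype are illustrative
# assumptions, not values taken from this issue or from the LWM code.
HBM_PER_CORE_GIB = 15.48  # usable HBM per core, as reported in the error above

BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_gib_per_device(n_params: float, dtype: str, n_shards: int) -> float:
    """GiB of model weights per device when weights are split evenly over n_shards devices.

    n_shards=1 means every device holds a full (replicated) copy.
    Ignores activations, KV cache and XLA temporaries, which also need HBM.
    """
    return n_params * BYTES_PER_PARAM[dtype] / n_shards / 1024**3


N_PARAMS = 7e9  # hypothetical 7B-parameter model
for dtype in ("float32", "bfloat16"):
    for n_shards in (1, 8):  # replicated vs. split across the 8 cores of a v3-8
        gib = weight_gib_per_device(N_PARAMS, dtype, n_shards)
        # "fits" only refers to the weights; the rest of the program needs room too.
        verdict = "fits" if gib < HBM_PER_CORE_GIB else "does NOT fit"
        print(f"{dtype:>8}, {n_shards} shard(s): {gib:6.2f} GiB per core -> weights alone {verdict}")
```

With these inputs, a replicated float32 copy (about 26 GiB) cannot fit in 15.48G at all, and a replicated bfloat16 copy (about 13 GiB) fits but leaves little headroom for everything else, whereas splitting the weights across all 8 cores brings them down to a few GiB per core.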

Mar 12 '24 12:03 infocodiste