Bill Stewart comments

Results: 7 comments by Bill Stewart

Auto models get ray.exceptions.ActorDiedError when run on multi-GPU node

If I add _'devices': 1,_ to the nhits_config section of run_nhits.py, the code runs to completion correctly:

"val_check_steps": tune.choice([100]),  # Compute validation every 100 epochs
"random_seed": tune.randint(1, 10),
# "devices":...
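The workaround in this comment amounts to adding a fixed `devices` entry to the search-space dictionary. A minimal sketch of that idea follows, using only the standard library; the helper name `pin_single_device` and the placeholder values are hypothetical, and the plain integers stand in for the `tune.choice` / `tune.randint` entries in the real run_nhits.py.

```python
from typing import Any, Dict


def pin_single_device(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a search-space dict with 'devices' fixed to 1.

    Hypothetical helper for illustration; it is not part of any library.
    """
    pinned = dict(config)   # shallow copy so the caller's dict is untouched
    pinned["devices"] = 1   # a plain int, not a search dimension
    return pinned


# Placeholder values stand in for the tune.* search-space entries.
base_config = {"val_check_steps": 100, "random_seed": 1}
nhits_config = pin_single_device(base_config)
print(nhits_config)
```

Keeping `devices` as a constant rather than a sampled value means every trial is restricted to one device, which is the condition under which the run was reported to complete.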

Auto models get ray.exceptions.ActorDiedError when run on multi-GPU node

Thank you @marcopeix! I ran the same procedure on a vanilla g4dn.12xlarge instance (4 GPUs) and confirmed that the same issue exists as with the ml.g4dn.12xlarge, so I believe the...

Auto models get ray.exceptions.ActorDiedError when run on multi-GPU node

@elephaint Apologies, the below is a lot of detail to wade through! The short answer is, in this example I am installing NF 3.0.0 directly from the latest main branch....

Auto models get ray.exceptions.ActorDiedError when run on multi-GPU node

@elephaint Theorizing that there might be a version misalignment in key packages causing these problems, I have experimented with several alternative installation approaches and have reproduced the error every time....
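When checking for a version misalignment like the one theorized here, it can help to record the installed versions next to each reproduction attempt. The sketch below does that with the standard library only; the list of distribution names is an assumption about which packages matter and should be adjusted to the environment under test.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Assumed distribution names; not taken from the original comment.
PACKAGES = ("neuralforecast", "ray", "torch", "pytorch-lightning", "optuna")


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(PACKAGES)
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```

Pasting this output into each comment makes it easier to compare installation approaches side by side.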

Auto models get ray.exceptions.ActorDiedError when run on multi-GPU node

@elephaint I originally ran into this issue with my own datasets (both synthetic and real time series, with and without covariates, and using AutoDeepAR/AutoRNN rather than AutoNHITS as in the above),...

Auto models get ray.exceptions.ActorDiedError when run on multi-GPU node

@elephaint The run succeeded when I used **_backend='optuna'_**. Here's the run_nhits.py that I updated with Optuna in place of Ray as the backend. I upped the max_steps to 1000 and...
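The updated run_nhits.py is not reproduced in this listing, so the following is only a sketch of what an Optuna-style search space might look like. It assumes, as I understand the NeuralForecast Auto models, that with `backend='optuna'` the `config` argument is a function that takes a trial and returns a dict. The `input_size` choices and the `_StubTrial` class are hypothetical; the stub exists only so the function can be exercised without Optuna installed.

```python
from typing import Any, Dict, Sequence


def nhits_config(trial) -> Dict[str, Any]:
    """Optuna-style search space: receives a trial, returns a config dict."""
    return {
        "max_steps": 1000,        # value mentioned in the comment
        "val_check_steps": 100,
        # tune.randint(1, 10) excludes 10; suggest_int includes both bounds,
        # so 1..9 here covers the same range.
        "random_seed": trial.suggest_int("random_seed", 1, 9),
        # Illustrative choices, not from the original script.
        "input_size": trial.suggest_categorical("input_size", [24, 48]),
    }


class _StubTrial:
    """Hypothetical stand-in for an Optuna trial; always picks the first option."""

    def suggest_int(self, name: str, low: int, high: int) -> int:
        return low

    def suggest_categorical(self, name: str, choices: Sequence[Any]) -> Any:
        return choices[0]


cfg = nhits_config(_StubTrial())
print(cfg)

# Assumed usage with the real library (not executed here):
#   AutoNHITS(h=..., config=nhits_config, backend="optuna", num_samples=...)
```

Note that the function form replaces the `tune.*` dictionary entirely, so none of the Ray search-space objects are needed when the backend is switched.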

Auto models get ray.exceptions.ActorDiedError when run on multi-GPU node

@elephaint I thought I read in the Nixtla documentation a few weeks ago that Optuna only utilized 1 GPU, such that if one wants to utilize multiple GPUs one must...