
Possible unhandled error from worker: ray::ParallelIteratorWorker.par_iter_next_batch()

Open ConeyLiu opened this issue 5 years ago • 3 comments

The following errors are just error prints. It is a bug in Ray and will be fixed in the future.

2020-12-01 20:44:59,081	ERROR worker.py:977 -- Possible unhandled error from worker: ray::ParallelIteratorWorker.par_iter_next_batch() (pid=24362, ip=192.168.3.6)
  File "python/ray/_raylet.pyx", line 464, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 419, in ray._raylet.execute_task.function_executor
  File "/Users/xianyang/miniconda3/envs/torch/lib/python3.7/site-packages/ray/util/iter.py", line 1158, in par_iter_next_batch
    batch.append(self.par_iter_next())
  File "/Users/xianyang/miniconda3/envs/torch/lib/python3.7/site-packages/ray/util/iter.py", line 1152, in par_iter_next
    return next(self.local_it)
StopIteration

ConeyLiu avatar Dec 01 '20 12:12 ConeyLiu

Hi, I also got this error, is there a bug report for Ray?

valiantljk avatar Dec 17 '20 06:12 valiantljk

I am getting these errors too. Torch also complains about the input tensor size (I am running the NYC taxi fare prediction example). Any idea why this is happening?

(pid=56035) 2021-02-23 12:26:00,645 INFO distributed_torch_runner.py:58 -- Setting up process group for: tcp://9.1.44.100:55874 [rank=1]
(pid=56022) 2021-02-23 12:26:00,643 INFO distributed_torch_runner.py:58 -- Setting up process group for: tcp://9.1.44.100:55874 [rank=0]
(pid=56035) /home/guryaniv/anaconda3/envs/raydp/lib/python3.6/site-packages/torch/nn/modules/loss.py:822: UserWarning: Using a target size (torch.Size([256, 1])) that is different to the input size (torch.Size([256])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
(pid=56035) return F.smooth_l1_loss(input, target, reduction=self.reduction, beta=self.beta)
(pid=56022) /home/guryaniv/anaconda3/envs/raydp/lib/python3.6/site-packages/torch/nn/modules/loss.py:822: UserWarning: Using a target size (torch.Size([256, 1])) that is different to the input size (torch.Size([256])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
(pid=56022) return F.smooth_l1_loss(input, target, reduction=self.reduction, beta=self.beta)
Epoch-0: {'num_samples': 1737186, 'epoch': 1.0, 'batch_count': 3393.0, 'train_loss': 5.325447512030797, 'last_train_loss': 5.227337598800659}
(pid=56035) /home/guryaniv/anaconda3/envs/raydp/lib/python3.6/site-packages/torch/nn/modules/loss.py:822: UserWarning: Using a target size (torch.Size([241, 1])) that is different to the input size (torch.Size([241])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
(pid=56035) return F.smooth_l1_loss(input, target, reduction=self.reduction, beta=self.beta)
(pid=56022) /home/guryaniv/anaconda3/envs/raydp/lib/python3.6/site-packages/torch/nn/modules/loss.py:822: UserWarning: Using a target size (torch.Size([241, 1])) that is different to the input size (torch.Size([241])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
(pid=56022) return F.smooth_l1_loss(input, target, reduction=self.reduction, beta=self.beta)
2021-02-23 12:28:33,382 ERROR worker.py:1053 -- Possible unhandled error from worker: ray::ParallelIteratorWorker.par_iter_next_batch() (pid=56049, ip=9.1.44.100)
  File "python/ray/_raylet.pyx", line 480, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 432, in ray._raylet.execute_task.function_executor
  File "/home/guryaniv/anaconda3/envs/raydp/lib/python3.6/site-packages/ray/util/iter.py", line 1158, in par_iter_next_batch
    batch.append(self.par_iter_next())
  File "/home/guryaniv/anaconda3/envs/raydp/lib/python3.6/site-packages/ray/util/iter.py", line 1152, in par_iter_next
    return next(self.local_it)
StopIteration
2021-02-23 12:28:33,384 ERROR worker.py:1053 -- Possible unhandled error from worker: ray::ParallelIteratorWorker.par_iter_next_batch() (pid=56014, ip=9.1.44.100)
  File "python/ray/_raylet.pyx", line 480, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 432, in ray._raylet.execute_task.function_executor
  File "/home/guryaniv/anaconda3/envs/raydp/lib/python3.6/site-packages/ray/util/iter.py", line 1158, in par_iter_next_batch
    batch.append(self.par_iter_next())
  File "/home/guryaniv/anaconda3/envs/raydp/lib/python3.6/site-packages/ray/util/iter.py", line 1152, in par_iter_next
    return next(self.local_it)
StopIteration
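(Editor's note: the PyTorch UserWarning above is separate from the Ray error print. smooth_l1_loss is given a [256, 1] target and a [256] prediction, so broadcasting expands them to [256, 256] before the loss is computed, which corrupts the result. A minimal sketch of the usual fix, with made-up tensors standing in for the example's model output and labels:)

```python
import torch
import torch.nn.functional as F

pred = torch.randn(256)       # model output: shape [256]
target = torch.randn(256, 1)  # label column, e.g. from a DataFrame: shape [256, 1]

# Align the shapes explicitly before computing the loss; otherwise
# broadcasting silently compares every prediction with every target.
target = target.squeeze(-1)   # [256, 1] -> [256]
loss = F.smooth_l1_loss(pred, target)
print(loss.dim())  # 0 (scalar loss)
```

Calling `pred.unsqueeze(-1)` instead would work equally well; the point is that the two tensors must have identical shapes.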

yanivg10 avatar Feb 23 '21 20:02 yanivg10

Hi @yanivg10, it is just the exception print. The actual exception has been caught, so you can ignore it. The Ray community is working on fixing it.
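(Editor's note: the print is benign because `StopIteration` is how the parallel iterator signals the end of its data; the caller catches it to stop batching, but Ray's worker still logs the exception as "possibly unhandled". A minimal pure-Python sketch of that pattern, with `ray.util.iter.ParallelIteratorWorker` simplified to a plain class and illustrative names:)

```python
class IterWorker:
    """Simplified stand-in for ray.util.iter.ParallelIteratorWorker."""

    def __init__(self, items):
        self.local_it = iter(items)

    def par_iter_next(self):
        # Raises StopIteration once the underlying iterator is drained --
        # this is the exception Ray logs even though the caller handles it.
        return next(self.local_it)


def drain(worker):
    batch = []
    while True:
        try:
            batch.append(worker.par_iter_next())
        except StopIteration:
            break  # end of data: expected control flow, not an error
    return batch


print(drain(IterWorker([1, 2, 3])))  # [1, 2, 3]
```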

ConeyLiu avatar Feb 24 '21 01:02 ConeyLiu

Closing as stale.

kira-lin avatar Apr 14 '23 08:04 kira-lin