
StopIteration: Caught StopIteration in replica 0 on device 0.

Open octopusStar218 opened this issue 9 months ago • 0 comments

We ran into the same problem as issue #17, and we still get the error below even after commenting out "replace_llama_attn_with_flash_attn()".

  0%|          | 0/274797 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/accelerate/accelerator.py", line 1058, in accumulate
    yield
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/transformers/trainer.py", line 2216, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/transformers/trainer.py", line 3238, in training_step
    loss = self.compute_loss(model, inputs)
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/transformers/trainer.py", line 3264, in compute_loss
    outputs = model(**inputs)
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 185, in forward
    outputs = self.parallel_apply(replicas, inputs, module_kwargs)
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 200, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 108, in parallel_apply
    output.reraise()
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/torch/_utils.py", line 705, in reraise
    raise exception
StopIteration: Caught StopIteration in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in _worker
    output = module(*input, **kwargs)
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/graph_learning/GraphGPT-main/graphgpt/model/GraphLlama.py", line 325, in forward
    outputs = self.model(
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/graph_learning/GraphGPT-main/graphgpt/model/GraphLlama.py", line 202, in forward
    node_forward_out = graph_tower(g)
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/anaconda3/envs/graphgpt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/graph_learning/GraphGPT-main/graphgpt/model/graph_layers/graph_transformer.py", line 64, in forward
    device = self.parameters().__next__().device
StopIteration

  0%|          | 0/274797 [01:57<?, ?it/s]          
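A possible reading of the failure, offered as a sketch rather than a confirmed fix: the last frame calls `self.parameters().__next__()`, which raises `StopIteration` whenever the parameter iterator is empty. The traceback goes through `torch/nn/parallel/data_parallel.py`, and as far as we understand, replicas created by `nn.DataParallel` in recent PyTorch versions do not expose their parameters through `parameters()`. The snippet below reproduces the pattern without torch. `infer_device` and the `SimpleNamespace` stand-ins are hypothetical names for illustration and are not part of GraphGPT.

```python
from types import SimpleNamespace


def infer_device(params, fallback):
    """Return the device of the first parameter, or `fallback` if there are none."""
    # next(..., None) returns None on an empty iterator instead of raising.
    first = next(iter(params), None)
    return fallback if first is None else first.device


# Hypothetical stand-ins; real code would pass torch parameters and tensors.
param = SimpleNamespace(device="cuda:0")
graph_input = SimpleNamespace(device="cuda:1")

# An empty iterator, as we assume a DataParallel replica yields:
# the pattern from graph_transformer.py line 64 raises StopIteration.
try:
    iter([]).__next__()
    raised = False
except StopIteration:
    raised = True

device_with_params = infer_device([param], graph_input.device)
device_without_params = infer_device([], graph_input.device)
```

In `graph_transformer.py` the fallback would presumably be the device of a tensor taken from the input graph `g`; which attribute to use is an assumption we have not checked. Launching on a single visible GPU, or with a distributed launcher so that the Trainer does not wrap the model in `nn.DataParallel`, may also avoid this code path.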

octopusStar218 · May 23 '24 15:05