
loading dataset

Jee-King opened this issue 2 years ago • 9 comments

Hi, could you give me some suggestions about the following error? Thanks!

pydev debugger: process 224361 is connecting
Connected to pydev debugger (build 202.7660.27)
Training:  dimp  saot
3612
2964
2380
1860
1404
2022-04-30 20:07:48.186097: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
No matching checkpoint file found
Training crashed at epoch 1
Traceback for the error!
Traceback (most recent call last):
  File "/home/iccd/Documents/SAOT-main/ltr/trainers/base_trainer.py", line 70, in train
    self.train_epoch()
  File "/home/iccd/Documents/SAOT-main/ltr/trainers/ltr_trainer.py", line 80, in train_epoch
    self.cycle_dataset(loader)
  File "/home/iccd/Documents/SAOT-main/ltr/trainers/ltr_trainer.py", line 52, in cycle_dataset
    for i, data in enumerate(loader, 1):
  File "/home/iccd/miniconda3/envs/pytracking/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/home/iccd/miniconda3/envs/pytracking/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 475, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/iccd/miniconda3/envs/pytracking/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/home/iccd/Documents/SAOT-main/ltr/data/loader.py", line 105, in ltr_collate_stack1
    return TensorDict({key: ltr_collate_stack1([d[key] for d in batch]) for key in batch[0]})
  File "/home/iccd/Documents/SAOT-main/ltr/data/loader.py", line 105, in <dictcomp>
    return TensorDict({key: ltr_collate_stack1([d[key] for d in batch]) for key in batch[0]})
  File "/home/iccd/Documents/SAOT-main/ltr/data/loader.py", line 113, in ltr_collate_stack1
    return [ltr_collate_stack1(samples) for samples in transposed]
  File "/home/iccd/Documents/SAOT-main/ltr/data/loader.py", line 113, in <listcomp>
    return [ltr_collate_stack1(samples) for samples in transposed]
  File "/home/iccd/Documents/SAOT-main/ltr/data/loader.py", line 113, in ltr_collate_stack1
    return [ltr_collate_stack1(samples) for samples in transposed]
  File "/home/iccd/Documents/SAOT-main/ltr/data/loader.py", line 113, in <listcomp>
    return [ltr_collate_stack1(samples) for samples in transposed]
  File "/home/iccd/Documents/SAOT-main/ltr/data/loader.py", line 91, in ltr_collate_stack1
    if torch.utils.data.dataloader.re.search('[SaUO]', elem.dtype.str) is not None:
AttributeError: module 'torch.utils.data.dataloader' has no attribute 're'
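
This AttributeError comes from a PyTorch version difference: newer releases no longer expose the standard-library `re` module as `torch.utils.data.dataloader.re`, which the collate check in `ltr/data/loader.py` relies on. A minimal sketch of an equivalent check that imports `re` directly (the helper name is illustrative, not the actual loader code):

```python
import re

import numpy as np

def is_string_dtype(elem):
    # Same guard as the failing line in ltr/data/loader.py, but using the
    # standard-library `re` module directly instead of
    # torch.utils.data.dataloader.re, which newer PyTorch no longer exposes.
    return re.search('[SaUO]', elem.dtype.str) is not None

# Numeric arrays pass the guard; string/object arrays are flagged.
assert not is_string_dtype(np.zeros(3, dtype=np.float32))
assert is_string_dtype(np.array(['a', 'b']))
```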

Jee-King avatar Apr 30 '22 12:04 Jee-King

Hi, thanks for your attention to our work. We have not encountered this issue. Could you provide more info?

ZikunZhou avatar Apr 30 '22 13:04 ZikunZhou

It may be caused by the environment.

ZikunZhou avatar Apr 30 '22 13:04 ZikunZhou

Thank you for your reply! However, I've run into another error and don't know how to solve it. Could you give me some advice?

Restarting training from last epoch ...
No matching checkpoint file found
Training crashed at epoch 1
Traceback for the error!
Traceback (most recent call last):
  File "/home/saot-main/ltr/trainers/base_trainer.py", line 70, in train
    self.train_epoch()
  File "/home/saot-main/ltr/trainers/ltr_trainer.py", line 80, in train_epoch
    self.cycle_dataset(loader)
  File "/home/saot-main/ltr/trainers/ltr_trainer.py", line 61, in cycle_dataset
    loss, stats = self.actor(data)
  File "/home/saot-main/ltr/actors/tracking.py", line 24, in __call__
    target_scores, bboxes, cls = self.net(train_imgs=data['train_images'],
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/saot-main/ltr/models/tracking/dimpnet.py", line 69, in forward
    bboxes, cls = self.state_estimator(train_feat_se, test_feat_se,
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/saot-main/ltr/models/glse/state_estimation.py", line 52, in forward
    modulated_search = self.integrator(templates, subsearch_windows, graph_size)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/saot-main/ltr/models/glse/integration/integration.py", line 123, in forward
    modulated_search = self.fusiongcn(search, processed_xcorr_map, normed_saliency, peak_coords, graph_size)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/saot-main/ltr/models/glse/integration/fusiongcn.py", line 127, in forward
    coords_pair_kpoint, edge_weights = self.gen_kpoint_coords_pair(*graph_size, key_coords, saliency)
  File "/home/saot-main/ltr/models/glse/integration/fusiongcn.py", line 225, in gen_kpoint_coords_pair
    edge_weights[i, unique_key_index] = unique_key_saliency
IndexError: tensors used as indices must be long, byte or bool tensors
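
This IndexError means `unique_key_index` reaches the assignment with a non-integer dtype; PyTorch requires long, byte, or bool tensors for indexing. A minimal standalone illustration of the requirement and the cast (not SAOT code; the tensors here are made up):

```python
import torch

edge_weights = torch.zeros(2, 10)
vals = torch.tensor([0.2, 0.5, 0.7])

float_idx = torch.tensor([1.0, 3.0, 5.0])  # non-integer dtype
# edge_weights[0, float_idx] = vals        # raises: tensors used as indices must be long, byte or bool tensors
edge_weights[0, float_idx.long()] = vals   # cast to an integer dtype before indexing
```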

Jee-King avatar Jun 20 '22 09:06 Jee-King

And when I replace `edge_weights[i, unique_key_index] = unique_key_saliency` with `edge_weights[i, unique_key_index.long()] = unique_key_saliency`, it still does not work. A new error is reported as follows:

Restarting training from last epoch ...
No matching checkpoint file found
Training crashed at epoch 1
Traceback for the error!
Traceback (most recent call last):
  File "/home/saot-main/ltr/trainers/base_trainer.py", line 70, in train
    self.train_epoch()
  File "/home/saot-main/ltr/trainers/ltr_trainer.py", line 80, in train_epoch
    self.cycle_dataset(loader)
  File "/home/saot-main/ltr/trainers/ltr_trainer.py", line 61, in cycle_dataset
    loss, stats = self.actor(data)
  File "/home/saot-main/ltr/actors/tracking.py", line 24, in __call__
    target_scores, bboxes, cls = self.net(train_imgs=data['train_images'],
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/saot-main/ltr/models/tracking/dimpnet.py", line 69, in forward
    bboxes, cls = self.state_estimator(train_feat_se, test_feat_se,
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/saot-main/ltr/models/glse/state_estimation.py", line 52, in forward
    modulated_search = self.integrator(templates, subsearch_windows, graph_size)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/saot-main/ltr/models/glse/integration/integration.py", line 123, in forward
    modulated_search = self.fusiongcn(search, processed_xcorr_map, normed_saliency, peak_coords, graph_size)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/saot-main/ltr/models/glse/integration/fusiongcn.py", line 127, in forward
    coords_pair_kpoint, edge_weights = self.gen_kpoint_coords_pair(*graph_size, key_coords, saliency)
  File "/home/saot-main/ltr/models/glse/integration/fusiongcn.py", line 225, in gen_kpoint_coords_pair
    edge_weights[i, unique_key_index.long()] = unique_key_saliency
IndexError: index 328 is out of bounds for dimension 0 with size 324
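
The follow-up out-of-bounds error suggests the cast alone does not fix the root cause: after `.long()`, some computed index (328) exceeds the flattened dimension (324), so the index computation itself appears to go wrong under the newer PyTorch. A hypothetical diagnostic wrapper (the name `safe_assign` and its signature are illustrative, not the SAOT API) that surfaces the offending values instead of crashing:

```python
import torch

def safe_assign(edge_weights, row, idx, vals):
    # Hypothetical diagnostic helper (not part of SAOT): cast indices to
    # long, then report and drop any that fall outside the flattened graph
    # dimension, so the bad values can be traced back to their computation.
    idx = idx.long()
    valid = (idx >= 0) & (idx < edge_weights.shape[1])
    if not bool(valid.all()):
        print("out-of-range indices:", idx[~valid].tolist())
    edge_weights[row, idx[valid]] = vals[valid]
    return edge_weights

w = torch.zeros(1, 324)
safe_assign(w, 0, torch.tensor([10.0, 328.0]), torch.tensor([0.3, 0.9]))  # reports 328
```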

Jee-King avatar Jun 20 '22 09:06 Jee-King

Hi, I have not encountered this issue. Have you solved it already? Could you tell me your environment? I'll see if I can reproduce it.

ZikunZhou avatar Jun 24 '22 04:06 ZikunZhou

python3.7, cuda11.1, pytorch1.7: the same problem exists here; as he shows, IndexError: tensors used as indices must be long, byte or bool tensors

shang153284 avatar Jun 25 '22 07:06 shang153284

I trained SAOT with cuda11.1 and pytorch1.7.

Jee-King avatar Jun 25 '22 07:06 Jee-King

Hi, our code was tested on cuda10 and pytorch1.1. I'll test it on a higher pytorch version; if I can solve the issue, I'll update our code.

ZikunZhou avatar Jun 25 '22 14:06 ZikunZhou

I have reproduced this issue with a higher pytorch version; I will try to solve it this week.

ZikunZhou avatar Jun 26 '22 01:06 ZikunZhou