Issue with training baseline model

Open chaitanyamalaviya opened this issue 3 years ago • 1 comment

Hi,

I followed all the preprocessing steps and installed the required packages, but training a baseline model with this code fails with the error below. I am using exactly the same command as here. I believe it has something to do with the use of extra_train_data. Any suggestions for how to resolve this would be helpful.

Grad overflow on iteration 0
Using dynamic loss scale of 65536
Traceback (most recent call last):
  File "train.py", line 579, in <module>
    sys.exit(main(sys.argv[1:]))
  File "train.py", line 574, in main
    train(opt, shared, m, optim, train_data, val_data, extra_train, extra_val, unlabeled)
  File "train.py", line 410, in train
    train_perf, extra_train_perf, loss, num_ex = train_epoch(opt, shared, m, optim, train_data, i, train_idx, extra, extra_idx, unlabeled, unlabeled_idx)
  File "train.py", line 216, in train_epoch
    batch_ex_idx, batch_l, source_l, target_l, label, res_map) = data[batch_order[i]]
  File "/net/nfs.corp/alexandria/chaitanyam/consistency/data.py", line 264, in __getitem__
    batch_l, source_l, target_l, label) = self.batches[idx]
IndexError: list index out of range
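
For anyone reading this traceback later: the final frame is a plain list lookup, `self.batches[idx]`, so the index taken from `batch_order` must have been larger than the number of batches in the dataset being indexed. The sketch below is not the repository's code. `ToyData`, `out_of_range`, and the sample batches are hypothetical names made up for illustration, and the mismatch between a main and an extra dataset is only an assumed cause, in line with the guess about extra_train_data above.

```python
from typing import List, Sequence


class ToyData:
    """Hypothetical stand-in for a dataset that stores pre-built batches."""

    def __init__(self, batches: Sequence[object]) -> None:
        self.batches = list(batches)

    def __len__(self) -> int:
        return len(self.batches)

    def __getitem__(self, idx: int) -> object:
        # Same kind of lookup as the last frame of the traceback.
        return self.batches[idx]


def out_of_range(batch_order: Sequence[int], data: ToyData) -> List[int]:
    """Return the entries of batch_order that fall outside 0..len(data)-1."""
    n = len(data)
    return [i for i in batch_order if not 0 <= i < n]


main_data = ToyData(["b0", "b1", "b2"])
extra_data = ToyData(["e0", "e1", "e2", "e3", "e4"])

# Assumed bug pattern: the order is sized for one dataset but used on another.
batch_order = list(range(len(extra_data)))

bad = out_of_range(batch_order, main_data)
print(bad)  # indices that would raise IndexError on main_data
```

Running a check like this on the real `batch_order` just before the training loop would show whether the order and the dataset disagree about the number of batches.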

Thanks for your help!

chaitanyamalaviya avatar Mar 09 '21 21:03 chaitanyamalaviya

Hey, sorry for the late response. The issues panel is not actively monitored. For any further questions, please email me directly.

It seems the issue was caused by a later commit that was supposed to fix a similar issue. I have just reverted that change and done a quick run from scratch. It should be good now.

t-li avatar Jun 13 '21 09:06 t-li