Issue with training baseline model
Hi,
I followed all the preprocessing steps and installed the required packages.
However, I am facing an error in training a baseline model with this code.
I am using exactly the same command as here. I believe it has something to do with the use of extra_train_data. It would be helpful if you had any suggestions for how to resolve this.
Grad overflow on iteration 0
Using dynamic loss scale of 65536
Traceback (most recent call last):
File "train.py", line 579, in <module>
sys.exit(main(sys.argv[1:]))
File "train.py", line 574, in main
train(opt, shared, m, optim, train_data, val_data, extra_train, extra_val, unlabeled)
File "train.py", line 410, in train
train_perf, extra_train_perf, loss, num_ex = train_epoch(opt, shared, m, optim, train_data, i, train_idx, extra, extra_idx, unlabeled, unlabeled_idx)
File "train.py", line 216, in train_epoch
batch_ex_idx, batch_l, source_l, target_l, label, res_map) = data[batch_order[i]]
File "/net/nfs.corp/alexandria/chaitanyam/consistency/data.py", line 264, in __getitem__
batch_l, source_l, target_l, label) = self.batches[idx]
IndexError: list index out of range
Thanks for your help!
Hey, sorry for the late response. The issues panel is not actively monitored. For any further questions, please email me directly.
It seems the issue was due to a later commit that was supposed to fix a similar issue. I have just reverted that change and done a quick run from scratch. It should be good now.
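For readers hitting the same IndexError: a minimal sketch of the likely failure mode, assuming (hypothetically) that the batch order was sized to the main training set while extra_train_data contains fewer batches. The class and variable names below are illustrative and not taken from the repository's data.py.

```python
# Hypothetical illustration of the mismatch: an index built for a larger
# dataset is used to index a shorter extra_train_data batch list.

class Data:
    def __init__(self, batches):
        self.batches = batches  # list of packed batch tuples

    def __getitem__(self, idx):
        # Guard that makes the mismatch explicit instead of a bare IndexError.
        assert idx < len(self.batches), (
            f"batch index {idx} out of range for {len(self.batches)} batches")
        return self.batches[idx]

train = Data(batches=list(range(100)))  # e.g. 100 batches in the main set
extra = Data(batches=list(range(30)))   # e.g. only 30 batches in extra_train_data

batch_order = list(range(len(train.batches)))  # order sized to the main set
# extra[batch_order[57]]  # fails: index 57 out of range for 30 batches
```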