Towards-Automatic-Speech-to-SL
Dimension Error
Hello, author.
I'm currently trying to train a speech2sign model based on your official code.
However, I ran into dimension errors after saving my own audio, kpts (keypoint), and text files.
Here is the error message I get:
Traceback (most recent call last):
  File "__main__.py", line 36, in <module>
    main()
  File "__main__.py", line 28, in main
    train(cfg_file=args.config_path)
  File "/home/suyeong/Towards-Automatic-Speech-to-SL/training.py", line 690, in train
    trainer.train_and_validate(train_data=train_data, valid_data=dev_data)
  File "/home/suyeong/Towards-Automatic-Speech-to-SL/training.py", line 329, in train_and_validate
    batch = Batch(torch_batch=batch,
  File "/home/suyeong/Towards-Automatic-Speech-to-SL/batch.py", line 78, in __init__
    self.trg_input = trg.clone()[:, :-1, :] # original code
IndexError: too many indices for tensor of dimension 2
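For reference, the same failure can be reproduced outside the repository. The sketch below uses NumPy instead of PyTorch (the indexing rules are the same, only the error wording differs). It assumes, without confirmation from the code, that batch.py expects trg as (batch, frames, values per frame), since the [:, :-1, :] slice drops the last step along dimension 1. The sizes are illustrative only.

```python
import numpy as np

batch_size, num_frames, frame_size = 2, 5, 4  # illustrative sizes only

# Assumed layout expected by batch.py: (batch, frames, values per frame)
trg_3d = np.zeros((batch_size, num_frames, frame_size))
trg_input = trg_3d[:, :-1, :]  # drops the last frame along dimension 1
print(trg_input.shape)  # (2, 4, 4)

# Layout reported in this issue: (batch, 150), one flat vector per example
trg_2d = np.zeros((batch_size, 150))
try:
    trg_2d[:, :-1, :]  # three indices on a 2-D array
except IndexError as err:
    print("IndexError:", err)
```

With a 2-D input, three indices are one too many, which matches the traceback: the problem lies in the shape of the data reaching batch.py rather than in the slicing line itself.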
I think the error originates in data.py, in class SignProdDataset:
examples.append(data.Example.fromlist(
    [src[:], trg[:num_sec*trg_fps], nonreg_trg_line, file_paths], fields))
num_vids+=1
So trg has shape (num_sec*trg_fps,), i.e. it has only one dimension, right?
But batch.py needs a three-dimensional trg, (batch, frames, joints), since trg.clone()[:, :-1, :] drops the last step along dimension 1.
When I print trg.shape in batch.py, I get a shape like (batch, 150).
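If num_sec*trg_fps happens to equal 150, the (batch, 150) shape would be consistent with trg being a flat list of numbers: trg[:num_sec*trg_fps] then keeps the first num_sec*trg_fps values, not the first num_sec*trg_fps frames. A minimal sketch, assuming each keypoint line stores frames back to back with a fixed number of values per frame. The helper split_into_frames and the parameter frame_size are hypothetical names, not taken from the repository. It regroups the values into frames before that slice is applied:

```python
def split_into_frames(flat_values, frame_size):
    """Group a flat list of keypoint values into frames of `frame_size` values.

    Hypothetical helper: assumes every frame has exactly `frame_size` values,
    stored one frame after another on a single line.
    """
    if len(flat_values) % frame_size != 0:
        raise ValueError(
            f"{len(flat_values)} values cannot be split evenly into frames of {frame_size}"
        )
    return [flat_values[i:i + frame_size] for i in range(0, len(flat_values), frame_size)]


# Illustrative line: 3 frames, 4 values per frame
line = "0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2"
flat = [float(v) for v in line.split()]
max_frames = 2  # stands in for num_sec * trg_fps

print(len(flat[:max_frames]))  # 2 -> slicing the flat list keeps 2 numbers
frames = split_into_frames(flat, frame_size=4)
trg = frames[:max_frames]      # slicing the framed list keeps 2 frames
print(len(trg), len(trg[0]))   # 2 4
```

After grouping, each example is a list of frames, so a collated batch would have three dimensions (batch, frames, values per frame). All examples in a batch would also need the same number of frames (or padding) to be stacked into one tensor.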
My question is: which part of the code should I modify?
Thank you for your work.