Towards-Automatic-Speech-to-SL

Dimension Error

Open nstalways opened this issue 2 years ago • 0 comments

Hello, author.

I'm currently trying to train a speech2sign model using your official code.

However, I ran into a dimension error after I saved my own audio files, keypoints (kpts), and texts.

Here is the error message I got:

Traceback (most recent call last):
  File "__main__.py", line 36, in <module>
    main()
  File "__main__.py", line 28, in main
    train(cfg_file=args.config_path)
  File "/home/suyeong/Towards-Automatic-Speech-to-SL/training.py", line 690, in train
    trainer.train_and_validate(train_data=train_data, valid_data=dev_data)
  File "/home/suyeong/Towards-Automatic-Speech-to-SL/training.py", line 329, in train_and_validate
    batch = Batch(torch_batch=batch,
  File "/home/suyeong/Towards-Automatic-Speech-to-SL/batch.py", line 78, in __init__
    self.trg_input = trg.clone()[:, :-1, :] # original code
IndexError: too many indices for tensor of dimension 2

In my opinion, the error is caused by class SignProdDataset in data.py:

            examples.append(data.Example.fromlist(
                [src[:], trg[:num_sec*trg_fps], nonreg_trg_line, file_paths], fields))
            num_vids+=1

So trg's shape is (num_sec*trg_fps,); it has only one dimension, right?
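A small check of this assumption, as a sketch only: slicing along the first axis never removes the remaining axes, so the number of dimensions after `trg[:num_sec*trg_fps]` depends on how `trg` was stored. NumPy is used here as a stand-in for the real loader, and the values of `num_sec`, `trg_fps`, and the per-frame size of 3 are made up for illustration.

```python
import numpy as np

num_sec, trg_fps = 2, 25  # hypothetical values, not taken from the repo

# Case A: keypoints stored as one flat vector -> the slice stays 1-D.
trg_flat = np.zeros(num_sec * trg_fps * 3)
flat_slice = trg_flat[:num_sec * trg_fps]

# Case B: keypoints stored as (frames, features) -> the slice stays 2-D.
trg_frames = np.zeros((num_sec * trg_fps + 10, 3))
frame_slice = trg_frames[:num_sec * trg_fps]

print(flat_slice.shape)   # (50,)
print(frame_slice.shape)  # (50, 3)
```

If my saved keypoints look like Case A, the per-frame axis is already missing before the example reaches batch.py.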

But the code in batch.py needs three dimensions: (batch, joints, frames).

When I print trg.shape in batch.py, I get a shape like (batch, 150).
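The mismatch can be reproduced in isolation. The sketch below uses NumPy arrays in place of torch tensors (indexing a 2-D tensor with three indices fails the same way in both) and mirrors the slice from batch.py line 78. The batch size of 4, the frame count of 50, and reading 150 as the per-frame feature size are my assumptions, not values from the repository.

```python
import numpy as np

batch_size = 4  # hypothetical

# Shape reported above: (batch, 150). Three indices on a 2-D array fail.
trg_2d = np.zeros((batch_size, 150))
try:
    trg_2d.copy()[:, :-1, :]
    raised = False
except IndexError as err:
    raised = True
    print("IndexError:", err)

# Hypothetical 3-D target with a separate frame axis: (batch, frames, features).
trg_3d = np.zeros((batch_size, 50, 150))
trg_input = trg_3d.copy()[:, :-1, :]  # drops the last step along axis 1
print(trg_input.shape)  # (4, 49, 150)
```

So the slice in batch.py only works if each target example keeps a second axis when it is stored.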

The question is, which code should be modified?

Thank you for your commitment.

nstalways, Aug 03 '22 09:08