DeepMicrobes

Error occurred during model training.

Open XLOXL opened this issue 2 years ago • 13 comments

Hello, I didn't encounter any errors when training on small amounts of data, but when I tried to train on several gigabytes of data, I received the following error. Could you please advise me on how to fix it?

Error: tensorflow.python.framework.errors_impl.InvalidArgumentError: len(seq_lens) != input.dims(0), (22 vs. 32) [[node token_lstm/bidirectional_rnn/bw/ReverseSequence (defined at /anaconda3/envs/DeepMicrobes-master/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]

XLOXL avatar Jun 20 '23 09:06 XLOXL

Hi, sorry, I have not encountered this error before and have no idea how to fix it.

MicrobeLab avatar Jun 20 '23 10:06 MicrobeLab
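The cause of this error is not established in this thread. One general way a batch-dimension mismatch such as "22 vs. 32" can arise, offered here only as an unconfirmed hypothesis, is a final batch that holds fewer examples than the configured batch size while some other part of the graph still assumes the full size. The sketch below only illustrates the arithmetic of a partial last batch; the example count is hypothetical and nothing here is taken from the DeepMicrobes code.

```python
def batch_sizes(num_examples, batch_size):
    """Return the size of each batch when the remainder is kept, not dropped."""
    full_batches, remainder = divmod(num_examples, batch_size)
    sizes = [batch_size] * full_batches
    if remainder:
        # The last batch is smaller than batch_size
        sizes.append(remainder)
    return sizes


# Hypothetical example count, chosen so that the remainder is 22
sizes = batch_sizes(num_examples=32 * 3 + 22, batch_size=32)
print(sizes)  # [32, 32, 32, 22]
```

If this hypothesis applies, the error would appear only when the number of training examples is not a multiple of the batch size, which could explain why a small test set happened to train cleanly.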

Okay, thank you!

XLOXL avatar Jun 20 '23 10:06 XLOXL

Hello, I have another question. The default batch size for training is 32, while the default batch size for prediction is 8192. I'm wondering what the difference is between the two and why the values are so different.

XLOXL avatar Jun 20 '23 10:06 XLOXL

For prediction, the batch size has no effect on the results, so I tend to use the largest batch size that fits into memory. For training, batch size is an important hyper-parameter that users should carefully tune.

MicrobeLab avatar Jun 20 '23 10:06 MicrobeLab
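The point about prediction can be illustrated with a minimal sketch. It assumes a model that scores each input independently of the others in the batch; `toy_predict` is a hypothetical stand-in, not the DeepMicrobes model. Under that assumption, splitting the same inputs into batches of any size gives the same predictions, so the batch size only affects speed and memory use.

```python
def toy_predict(batch):
    # Hypothetical stand-in for a trained model: each read is scored on its own,
    # so the score does not depend on which other reads share the batch
    return [sum(ord(base) for base in read) % 7 for read in batch]


def predict_in_batches(reads, batch_size):
    """Run toy_predict over the reads in chunks of batch_size and concatenate the results."""
    predictions = []
    for start in range(0, len(reads), batch_size):
        predictions.extend(toy_predict(reads[start:start + batch_size]))
    return predictions


reads = ["ACGT", "GGCA", "TTAG", "CCGA", "ATAT"]
small = predict_in_batches(reads, batch_size=2)
large = predict_in_batches(reads, batch_size=8192)
print(small == large)  # True
```

During training the situation differs, because each gradient update is computed from one batch, so the batch size changes how the weights evolve.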

Okay, thank you!

XLOXL avatar Jun 20 '23 10:06 XLOXL

Hello, have you encountered the following error during prediction before? I have modified num_classes, vocab_size, and the k-mer length.

Error: Assign requires shapes of both tensors to match. lhs shape= [2505] rhs shape= [1] [[node save/Assign_7 (defined at /anaconda3/envs/DeepMicrobes-master/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]

XLOXL avatar Jun 20 '23 11:06 XLOXL

Hi, I have not encountered a similar error. It seems that the number of classes was left at the default of 2505. I am not sure why.

MicrobeLab avatar Jun 20 '23 11:06 MicrobeLab

Okay, thank you very much for your patient response.

XLOXL avatar Jun 20 '23 11:06 XLOXL

Hello, we have a GPU instance that meets the computational requirements mentioned in the paper. When we train the model on one reference, what should the num_classes parameter be? We keep getting an "InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape=[3000,1] rhs shape=[3000,2505]". By default it takes the total species count as num_classes. Can you also let us know what the number '3000' is?

Just to be sure, the num_classes parameter specifies the final number of predicted classes, right? So if we are training with 1 reference, should it be 1 or 120? We did not encounter this error earlier, when training was done on a non-GPU instance.

The DeepMicrobes.py --helpfull command is not working for us. Can you point us to an alternative, or just share its output so we can see all the parameters?

Thanks.

Gayathri142 avatar Sep 06 '24 04:09 Gayathri142

Yes, the num_classes parameter is the final number of predicted classes. The default value is 2505, and it should be changed to the exact number of output nodes. The '3000' is just a hyper-parameter of the model architecture and is not related to the data. If you are training on data with 120 possible classes, it should be 120. You can refer directly to the argparse code in the script to see the options.

MicrobeLab avatar Sep 06 '24 04:09 MicrobeLab
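The two shapes in the error message fit the explanation above: an output layer on top of 3000 hidden units has a weight matrix of shape [3000, num_classes]. The sketch below assumes that "lhs" is the variable in the newly built graph and "rhs" is the value stored in a checkpoint, which is how TensorFlow normally reports a failed restore; under that reading, the graph was built with num_classes=1 while an existing checkpoint was written with the default 2505. The variable names are hypothetical and are not taken from the DeepMicrobes code.

```python
def output_layer_shapes(hidden_units, num_classes):
    """Shapes of a dense output layer; the variable names are hypothetical."""
    return {
        "output/kernel": (hidden_units, num_classes),
        "output/bias": (num_classes,),
    }


def find_shape_mismatches(graph_shapes, checkpoint_shapes):
    """Return {name: (graph_shape, checkpoint_shape)} for variables whose shapes differ."""
    mismatches = {}
    for name, graph_shape in graph_shapes.items():
        saved_shape = checkpoint_shapes.get(name)
        if saved_shape is not None and saved_shape != graph_shape:
            mismatches[name] = (graph_shape, saved_shape)
    return mismatches


graph = output_layer_shapes(hidden_units=3000, num_classes=1)
checkpoint = output_layer_shapes(hidden_units=3000, num_classes=2505)
for name, (lhs, rhs) in find_shape_mismatches(graph, checkpoint).items():
    print(name, "lhs shape =", lhs, "rhs shape =", rhs)
```

If this is indeed the cause, the value of num_classes used when building the model has to match the value used when the checkpoint being restored was written.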

Thank you for your reply. Where should it be changed? Should it be changed internally in the scripts or passed as a parameter? We still get errors when we just set 'num_classes=1' to train on 1 reference. If it has to be changed internally, can you point us to the scripts that need to be changed?

Gayathri142 avatar Sep 06 '24 05:09 Gayathri142

It is set as a command-line parameter, for example --num_classes=120, not internally. By the way, I am not sure what classification means when there is only one possible category.

MicrobeLab avatar Sep 06 '24 06:09 MicrobeLab