UFold

Batch with different RNA lengths

Open giorgiobini opened this issue 2 years ago • 2 comments

Hello,

I am wondering whether you have a function to pad batches containing sequences of different lengths.

Thank you so much in advance!

giorgiobini, Sep 20 '22 14:09

Hi there,

Sorry about that. Because our framework handles sequences of varying length, we fixed the batch size to avoid out-of-memory issues. Our training code uses a batch size of 1 for all the data, so we do not currently support padding batches of different sizes.
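For readers who still want batches larger than 1 without padding, one option is to group sequences of identical length. The sketch below is not part of UFold; `batches_by_length` and `max_batch_size` are hypothetical names, and it assumes the sequences are plain strings that are encoded into model inputs later.

```python
from collections import defaultdict


def batches_by_length(seqs, max_batch_size):
    """Yield batches in which every sequence has the same length.

    Hypothetical helper, not a UFold function. Because all members of a
    batch share one length, no padding is required.
    """
    buckets = defaultdict(list)
    for seq in seqs:
        buckets[len(seq)].append(seq)
    # Visit lengths in ascending order so memory use grows predictably.
    for length in sorted(buckets):
        group = buckets[length]
        for start in range(0, len(group), max_batch_size):
            yield group[start:start + max_batch_size]


batches = list(
    batches_by_length(["AUGC", "GGCA", "AU", "CCGU", "UA"], max_batch_size=2)
)
# batches == [["AU", "UA"], ["AUGC", "GGCA"], ["CCGU"]]
```

Capping `max_batch_size` keeps memory bounded for long sequences. Lengths that occur only once still end up in batches of size 1.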

Thanks.

sperfu, Sep 21 '22 02:09

Hi, regarding your question about padding batches of different sizes: I'm afraid we don't have that function. Sequences vary widely in length (from 10 bp to over a thousand bp), so padding them to a common length would inevitably introduce uninformative positions, which would degrade performance. We therefore use a batch size of 1, with one sequence per input, to avoid padding.
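If padding is attempted anyway, a common way to limit the effect of the padded positions is to carry a mask alongside the batch so that they can be excluded from the loss. The following is a minimal sketch in plain Python, not UFold code: `pad_batch`, `pair_mask`, and the pad symbol `"N"` are all hypothetical, and it assumes the downstream model or loss can consume such a mask.

```python
def pad_batch(seqs, pad_char="N"):
    """Right-pad sequences to the longest length in the batch.

    Hypothetical helper, not a UFold function. Returns the padded
    sequences and a per-position mask (1 = real base, 0 = padding).
    """
    max_len = max(len(s) for s in seqs)
    padded = [s + pad_char * (max_len - len(s)) for s in seqs]
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]
    return padded, mask


def pair_mask(row_mask):
    """Expand a length-L mask into an L x L mask over base pairs.

    Entry (i, j) is 1 only when both positions i and j are real bases.
    """
    return [[a * b for b in row_mask] for a in row_mask]


padded, mask = pad_batch(["AUGC", "AU"])
# padded == ["AUGC", "AUNN"]
# mask == [[1, 1, 1, 1], [1, 1, 0, 0]]
pairs = pair_mask(mask[1])
# pairs == [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

The pair mask matters for models that predict an L x L contact map, since padding adds whole rows and columns of meaningless entries. Masking the loss does not stop convolutional layers from reading padded positions, so the concern raised above still applies to some extent.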

Thanks.

sperfu, Sep 21 '22 08:09