ExPecto

Sequences < 1,000bp

Open imk1 opened this issue 3 years ago • 9 comments

How does Beluga handle sequences < 1,000bp? Does it center on the input sequence and pad it with N's, or does it do something else? Thanks!

imk1 avatar Jan 14 '22 18:01 imk1

Beluga requires a 2kb sequence. Padding with Ns is not guaranteed to give meaningful results. If your sequence has flanking sequence in its genomic context, you can add that to both sides.
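As a rough illustration of the flanking suggestion, here is a minimal sketch that computes a 2,000bp genomic window centered on a shorter region. The function name `centered_window` and the 0-based, half-open coordinate convention are assumptions made for this example and are not part of ExPecto or Beluga. Fetching the bases for the window from a reference genome is left out.

```python
def centered_window(start, end, length=2000):
    """Return (new_start, new_end) for a `length`-bp window centered on [start, end).

    Coordinates are assumed to be 0-based and half-open. This helper is
    hypothetical; it only computes coordinates and does not read a genome.
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    region_len = end - start
    if region_len > length:
        raise ValueError("region is longer than the requested window")

    extra = length - region_len      # total flanking bases needed
    new_start = start - extra // 2   # half of the flank goes on the left
    if new_start < 0:
        new_start = 0                # clamp at the chromosome start
    return new_start, new_start + length


# A 1,000bp region gains 500bp of real flanking sequence on each side.
print(centered_window(10_000, 11_000))
```

The sketch clamps only at the start of the chromosome; a real script would also need the chromosome length to clamp at the far end.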

jzthree avatar Jan 19 '22 22:01 jzthree

I ran Beluga (using this site: https://humanbase.flatironinstitute.org/deepsea/) using sequences < 2kb, and Beluga ran to completion. Do you know how Beluga modified the sequences to convert them into 2kb sequences? Thanks!

imk1 avatar Jan 19 '22 22:01 imk1

Thanks for letting us know. It should actually only allow sequences of 2kb or longer. We are looking into this and will update here once it's fixed.

jzthree avatar Jan 31 '22 23:01 jzthree

Thanks in advance for keeping me posted!

imk1 avatar Jan 31 '22 23:01 imk1

I was wondering if you have an update on this. Thanks!

imk1 avatar Mar 02 '22 15:03 imk1

Sorry for the late update. Currently, if the input is shorter than 2kb, it will be padded with "N"s. I don't recommend using FASTA input shorter than 2kb unless it is very close to 2kb, say, only a few bp off. I would recommend adding any flanking sequence to your sequence of interest. We should update the input length instructions on the website (Beluga uses 2000bp, Sei uses 4096bp, and SeqWeaver uses 1000bp).
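The input lengths listed above can be turned into a quick pre-submission check. This is only a sketch: the dictionary `MODEL_INPUT_BP` and the function `bases_short` are hypothetical names for this example, and the lengths are the ones quoted in this comment.

```python
# Input lengths as stated in this thread (bp).
MODEL_INPUT_BP = {"Beluga": 2000, "Sei": 4096, "SeqWeaver": 1000}


def bases_short(seq, model):
    """Return how many bases `seq` falls short of the model's input length.

    A result of 0 means the sequence is already long enough; anything
    larger is the number of positions that would end up as N padding.
    """
    return max(0, MODEL_INPUT_BP[model] - len(seq))


# A 1,990bp sequence is "only a few bp off" for Beluga.
print(bases_short("A" * 1990, "Beluga"))
```

A small nonzero result matches the "very close to 2kb" case; a large one suggests adding flanking sequence instead.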

jzthree avatar Mar 02 '22 17:03 jzthree

Thanks! If I were to input, say, a 1kb sequence into Beluga, would it get padded with 500 Ns on either side, or would the input be the sequence I inputted followed by 1,000 Ns? Thanks!

imk1 avatar Mar 02 '22 18:03 imk1

It will be padded with 500 Ns on either side. How the Ns affect the model prediction is largely untested, so this is not recommended (in the training data, Ns appear only in assembly gaps and are very rare).
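For readers who want to see what that symmetric padding looks like, here is a minimal sketch. It is not the website's actual code; `center_pad` is a hypothetical helper, and placing the extra N on the right when the deficit is odd is an assumption, since the thread does not say how that case is handled.

```python
def center_pad(seq, length=2000, pad="N"):
    """Pad `seq` with `pad` on both sides until it is `length` bp long.

    Hypothetical helper illustrating symmetric padding. When the deficit
    is odd, the extra pad character goes on the right (an assumption).
    """
    deficit = length - len(seq)
    if deficit < 0:
        raise ValueError("sequence is longer than the target length")
    left = deficit // 2
    right = deficit - left
    return pad * left + seq + pad * right


padded = center_pad("ACGT" * 250)  # a 1,000bp input
print(len(padded), padded.count("N"))
```

For a 1kb input, half of the resulting 2,000 positions are Ns, which is far from what the model saw in training.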

jzthree avatar Mar 02 '22 18:03 jzthree

That makes sense. Thank you!

imk1 avatar Mar 02 '22 18:03 imk1