What does the "-m" option do?
According to the manual, the "-m" option controls the minimum length allowed for the predicted transcripts. So I expected the output GTF file to contain only transcripts at least as long as the threshold.
I used "-m 200" for one sample, and then I tried to calculate transcript length from the gtf file by using the following R code:
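library(data.table)  # provides fread()
gtf <- fread("transcripts.gtf")
# Genomic span of each record (GTF coordinates are 1-based and inclusive)
gtf$length <- gtf$V5 - gtf$V4 + 1
summary(gtf[gtf$V3 == "transcript", ]$length)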
Surprisingly, I got the following results:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     34     256     354    2831    1209  674207
So, what does the "-m" option actually mean?
The -m parameter only controls the length of novel assembled transcripts. If you used the -G parameter, all the transcripts in the reference annotation will be considered as well, no matter what their length. I suggest cleaning that reference file of the transcripts you are not interested in before giving it to StringTie.
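As a rough illustration of the clean-up suggested above, here is a minimal Python sketch that drops short transcripts from a reference GTF before it is passed to -G. It is not part of StringTie. The function names, the 200 bp cutoff, and the file names are placeholders, and it assumes that "length" means the spliced length (the sum of exon lengths) rather than the genomic span of the transcript row.

```python
import re
from collections import defaultdict

# Matches the transcript_id attribute in column 9 of a GTF record.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def spliced_lengths(gtf_lines):
    """Sum exon lengths per transcript_id (introns are not counted)."""
    lengths = defaultdict(int)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            # GTF coordinates are 1-based and inclusive.
            lengths[match.group(1)] += int(fields[4]) - int(fields[3]) + 1
    return dict(lengths)


def filter_gtf(gtf_lines, min_len=200):
    """Keep comment lines plus records of transcripts with spliced length >= min_len."""
    gtf_lines = list(gtf_lines)
    lengths = spliced_lengths(gtf_lines)
    kept = []
    for line in gtf_lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        match = TRANSCRIPT_ID.search(line)
        # Records without a transcript_id (e.g. gene rows) are dropped here.
        if match and lengths.get(match.group(1), 0) >= min_len:
            kept.append(line)
    return kept


# Example use (file names are placeholders):
# with open("reference.gtf") as src, open("reference.min200.gtf", "w") as dst:
#     dst.writelines(filter_gtf(src, min_len=200))
```

Note that end - start + 1 on a "transcript" row gives the genomic span, introns included, so it can only overestimate the spliced length. A span of 34 therefore does mean that transcript is shorter than 200 bp.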
Thank you so much! Now I get it! Is it the same in "--merge" mode? Does the -m parameter there control the minimum length of the novel input transcripts?