Unbelievably slow speed for --siteFile

Open jielab opened this issue 7 years ago • 4 comments

Please see my comment on #25: it took 43356 seconds to run a regression on 93 SNPs when I used --siteFile. But when I first used bcftools to extract those 93 SNPs into a new VCF, which takes a minute, the same analysis took only 109 seconds.

So I think there is something VERY WRONG with the --siteFile option. I just want to point this out so that others don't run into the same issue.

best regards, Jie

jielab avatar Apr 30 '17 05:04 jielab

I'm working on that. That does seem unreasonably slow. I will follow up on this.

Xiaowei

zhanxw avatar May 01 '17 14:05 zhanxw

Can you please remind me how many sites are specified by the --siteFile option? I just want to confirm, as my guess is that a very large number of sites slows down the analysis.

Thanks.

zhanxw avatar May 02 '17 20:05 zhanxw

93 sites

jielab avatar May 02 '17 20:05 jielab

@jiehuang001 I have optimized the --siteFile option to improve its speed. However, you may want to consider using --rangeFile instead.

Since you have only a small number of variants (93) to analyze, I would recommend --rangeFile. This option lets RVTESTS use the VCF index file, so it reads in and analyzes only those variants.

When you have lots of variants, --siteFile is more appropriate: RVTESTS will read in every variant but analyze only the variants specified in --siteFile.

zhanxw avatar Jun 15 '17 22:06 zhanxw
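
For readers who already have a site list and want to try the --rangeFile suggestion, a minimal sketch of a conversion step follows. It is not part of RVTESTS. It assumes the site file holds whitespace-separated chromosome and position in its first two columns, and that the range file takes one `chr:begin-end` entry per line; please check both formats against the RVTESTS documentation before relying on it. The function names `sites_to_ranges` and `convert_file` are hypothetical helpers introduced here for illustration.

```python
def sites_to_ranges(site_lines):
    """Turn 'chrom pos' lines into single-base 'chrom:pos-pos' ranges.

    Assumption: the first two whitespace-separated columns are
    chromosome and position; blank lines and '#' comments are skipped.
    """
    ranges = []
    for line in site_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        chrom = fields[0]
        pos = int(fields[1])  # raises ValueError on a non-numeric position
        ranges.append(f"{chrom}:{pos}-{pos}")
    return ranges


def convert_file(site_path, range_path):
    """Read a site file and write the matching range file.

    Returns the number of ranges written.
    """
    with open(site_path) as src:
        ranges = sites_to_ranges(src)
    with open(range_path, "w") as dst:
        dst.writelines(r + "\n" for r in ranges)
    return len(ranges)
```

The resulting file could then be passed to --rangeFile in place of --siteFile. As noted above, that route depends on the VCF index file being present, so the input VCF would need to be indexed.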