rvtests

Is using setFile slower than geneFile?

Open zx8754 opened this issue 5 years ago • 3 comments

Sorry, I haven't tested this thoroughly, but it just "feels" slower. Maybe you know the reason? (If not, I can create a reproducible example.)

I tried the standard file as input: --geneFile refFlat_hg19.txt.gz

Then I created a subset of the above file with custom filters. Using --setFile with my custom set file instead of --geneFile now seems slower: --setFile refFlat_hg19_customFilter.txt

Is this expected?
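One way to turn the "feels slower" impression into a number is to time both invocations under identical conditions. The following is a minimal sketch, not part of rvtests: `time_command` is a hypothetical helper that runs any command several times and reports the fastest wall-clock time, which reduces noise from disk caching.

```python
import subprocess
import time


def time_command(cmd, repeats=3):
    """Run `cmd` (a list of arguments) `repeats` times.

    Returns the fastest wall-clock time in seconds. Taking the minimum
    reduces the influence of cold caches and background load.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command fails,
        # so a broken run is never mistaken for a fast one.
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return min(timings)
```

To compare the two options, one would call `time_command` twice with otherwise identical rvtests argument lists, one ending in `["--geneFile", "refFlat_hg19.txt.gz"]` and the other in `["--setFile", "refFlat_hg19_customFilter.txt"]`. Note that the subset file contains fewer genes, so a fair comparison would use the same gene list in both formats.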

zx8754, Nov 13 '18 19:11

That depends on the content of the set file. Internally, the option --setFile uses the index file to read each variant specified, while the option --geneFile uses the index file to locate the gene regions and then processes each variant within them. In your case, maybe the set file lists lots of variants. Since each variant has to be looked up individually, the total computation time can be longer than with --geneFile.
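To check how many lookups a given set file implies, a small sketch like the one below can count its sets and ranges. It assumes each line has a set name in the first column and comma-separated `chrom:start-end` ranges (or single `chrom:pos` positions) in the second, with inclusive end coordinates; `summarize_set_file` and `parse_range` are hypothetical helpers, not rvtests functions.

```python
def parse_range(text):
    """Parse 'chrom:start-end' or 'chrom:pos' into (chrom, start, end)."""
    chrom, _, span = text.partition(":")
    start, _, end = span.partition("-")
    start = int(start)
    # A single position has no '-end' part; treat it as a 1-base range.
    end = int(end) if end else start
    return chrom, start, end


def summarize_set_file(lines):
    """Count sets, ranges, and total bases covered (ends assumed inclusive)."""
    n_sets = n_ranges = total_bases = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        n_sets += 1
        for item in fields[1].split(","):
            _, start, end = parse_range(item)
            n_ranges += 1
            total_bases += end - start + 1
    return {"sets": n_sets, "ranges": n_ranges, "bases": total_bases}
```

A file with many short ranges (for example, one entry per variant) would show a large `ranges` count relative to `bases`, which is the situation where per-entry lookups are most likely to dominate the run time.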

zhanxw, Dec 06 '18 06:12

To clarify, the file passed as --setFile refFlat_hg19_customFilter.txt is just a subset of the refFlat_hg19.txt.gz file. It contains no individual variants, just gene start and stop positions, e.g.:

A1BG	19:58858171-58864865	chr19	58858171	58864865
A1CF	10:52559168-52645435	chr10	52559168	52645435

zx8754, Dec 06 '18 07:12

Thanks. In this case, I would not expect --setFile to be much slower than --geneFile.

zhanxw, Dec 06 '18 23:12