
[Feature request] Score many sets of variants in a single pass

Open wavefancy opened this issue 5 years ago • 2 comments

Dear Christopher,

Thank you very much for developing and maintaining this great software. May I ask for a new feature for the --score function?

Currently, --score can apply several sets of weights to the same set of variants in a single pass, with each set of weights in its own column. Could you please also support scoring different sets of variants, each with its own weights, in a single pass? We could get the same result under the current setup by building a superset of variants and giving a weight of 0 to the variants that do not belong to a given score. However, if we want a gene-specific score for every gene, the file ends up with more than 20,000 columns, most of whose cells are zero, which looks very inefficient to compute. I would suggest instead a file with 20,000+ rows, where each row is a gene followed by its list of variants and their weights, so that scoring only needs to load the necessary variants.

I think this feature would be very helpful for rare-variant association tests since, as you know, the burden score of each gene or region is just a weighted sum over its variants. But calculating these scores is very time-consuming in many software packages, especially when the input is in VCF format. By leveraging plink and its file-format infrastructure, I think we could compute burden scores very fast.
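To illustrate the sparse layout suggested above, here is a minimal Python sketch of per-gene burden scoring. It is not plink's implementation: the function name `burden_scores`, the dictionary-based inputs, and the gene, variant, and sample IDs are all hypothetical, and the score is a plain weighted sum of dosages, which may differ from what --score reports by default.

```python
def burden_scores(dosages, gene_weights):
    """Compute one weighted-sum score per (sample, gene).

    dosages:      {sample_id: {variant_id: allele dosage}}  (hypothetical layout)
    gene_weights: {gene_id: {variant_id: weight}}           (sparse: one entry per
                  variant actually used by that gene, no zero padding)
    """
    scores = {}
    for sample, sample_dosages in dosages.items():
        scores[sample] = {
            # Only the variants listed for this gene are visited;
            # a variant missing from the sample is treated as dosage 0.
            gene: sum(w * sample_dosages.get(v, 0.0) for v, w in weights.items())
            for gene, weights in gene_weights.items()
        }
    return scores


# Toy data, made up for illustration only.
gene_weights = {
    "GENE_A": {"rs1": 0.5, "rs2": 1.0},
    "GENE_B": {"rs3": 2.0},
}
dosages = {
    "S1": {"rs1": 2, "rs2": 0, "rs3": 1},
    "S2": {"rs1": 1, "rs2": 1, "rs3": 0},
}

scores = burden_scores(dosages, gene_weights)
for sample, per_gene in scores.items():
    print(sample, per_gene)
```

The work per sample grows with the total number of (gene, variant) pairs rather than with genes times variants, which is the saving over a dense 20,000-column weight matrix.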

Thank you very much for your help.

Best regards, Wallace

wavefancy avatar Nov 09 '20 20:11 wavefancy

Yes, it shouldn't be difficult to add support for a sparse input format to --score; I'll look into implementing this over the next week.

chrchang avatar Nov 09 '20 22:11 chrchang

This is great, thank you very much for the support.

Best regards, Wallace

wavefancy avatar Nov 10 '20 08:11 wavefancy

It isn't that much better than a bash for-loop, but --score-list exists, so I'll close this issue.

chrchang avatar Jan 05 '24 08:01 chrchang