plink-ng
Possible to encode a bed file with no variants?
I have a quite large dataset that I have manually split into ~1000 bcf files. When I try to select (for example) a subset of the data that is above a given allele frequency I end up with some of those bcf files having 0 variants. This is no problem for the vcf/bcf format, but if I then try to convert the files to plink, I get "Error: No variants in .bcf file".
From reading the bim/bed spec, it seems like the spec should permit a file with no variants (i.e., an empty bim file, and a bed file with only the three magic bytes).
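To make the zero-variant fileset concrete, here is a minimal Python sketch of what such a pair of files would contain. The helper name `write_empty_fileset` is hypothetical and not part of plink; the three header bytes are the standard PLINK 1 .bed header (two magic bytes plus the variant-major mode byte). A .fam file with the sample list would still have to come from elsewhere, and whether a given plink build accepts these files is a separate question.

```python
from pathlib import Path

# Standard PLINK 1 .bed header: two magic bytes, then 0x01 for variant-major mode.
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])


def write_empty_fileset(prefix: str) -> None:
    """Write a zero-variant .bed/.bim pair (hypothetical helper, not a plink API)."""
    # Header only: with no variants there are no genotype blocks after the magic bytes.
    Path(prefix + ".bed").write_bytes(BED_MAGIC)
    # One line per variant in a .bim, so zero variants means an empty file.
    Path(prefix + ".bim").write_bytes(b"")
```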
It looks like --allow-no-vars was retired from plink2 with this commit from 2018. Would you be open to a pull request restoring this functionality?
In principle, yes, but realistically you are better off saving yourself several months of work and instead checking for plink2 error code 13 ("DegenerateData").
I am open to a much-more-manageable pull request that tweaks the VCF/BCF(/other?) import functions so that they return that error code on an empty input file; I'll try to take care of that myself this weekend.
I'm not too familiar with plink/plink2 internals, but it was my understanding that sample size is computed from the fam file (or equivalent), and the number of variants is computed from the bim file (or equivalent). If anything, it seems like it would take extra work to write code that doesn't work in the case where there are no variants.
I can check for error code 13, but then what do I do?
The issue is that 0 variants is an uninteresting edge case that nevertheless would need to be handled correctly by every single plink2 function when --allow-no-vars is reintroduced. Development of features that provide significant immediate value is simpler and faster when I don’t need to worry about this. I intend to get around to backfilling --allow-no-vars before I “retire” from plink, but I’ve mentally budgeted more than a month for it.
You can handle exit code 13 by removing that empty shard from subsequent pipeline steps. (Note that you might still need to do that in a world where --allow-no-vars exists; it does exist in plink 1.9.)
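As a rough illustration of dropping empty shards, here is a Python sketch of a conversion loop. It assumes plink2 is on the PATH and that the import step returns exit code 13 for an empty input, which per the discussion above depends on the import functions being tweaked to do so. `run_plink2` and `convert_shards` are hypothetical names for this example, and the `--bcf`/`--make-bed`/`--out` invocation is only one plausible way to do the conversion.

```python
import os
import subprocess
from typing import Callable, Iterable, List

DEGENERATE_DATA = 13  # plink2 exit code named in this thread


def run_plink2(bcf_path: str, out_prefix: str) -> int:
    """Convert one BCF shard to bed/bim/fam and return plink2's exit code."""
    cmd = ["plink2", "--bcf", bcf_path, "--make-bed", "--out", out_prefix]
    return subprocess.run(cmd).returncode


def convert_shards(
    shards: Iterable[str],
    runner: Callable[[str, str], int] = run_plink2,
) -> List[str]:
    """Return output prefixes of shards that converted; skip empty ones."""
    kept = []
    for bcf_path in shards:
        out_prefix = os.path.splitext(bcf_path)[0]
        code = runner(bcf_path, out_prefix)
        if code == 0:
            kept.append(out_prefix)
        elif code == DEGENERATE_DATA:
            # Empty shard: leave it out of every later pipeline step.
            continue
        else:
            # Any other failure is a real error and should not be hidden.
            raise RuntimeError(f"plink2 exited with code {code} on {bcf_path}")
    return kept
```

Later steps then iterate over the returned prefixes instead of the original shard list, so the empty shards never reach them.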