plink-ng
`Error: Non-concatenating --pmerge[-list] is under development.`
Hi,
Running PLINK (v2.00a4LM AVX2 Intel) fails with an error when merging multiple datasets.
$ ../plink2 --debug --memory 8000 --threads 6 --pmerge-list input_sources.txt --out merged
PLINK v2.00a4LM AVX2 Intel (9 Jan 2023) www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to merged.log.
Options in effect:
--debug
--memory 8000
--out merged
--pmerge-list input_sources.txt
--threads 6
Start time: Mon Jan 23 17:06:52 2023
385417 MiB RAM detected; reserving 8000 MiB for main workspace.
Using up to 6 compute threads.
--pmerge-list: 2 filesets specified.
--pmerge-list: 2 samples present.
--pmerge-list: Merged .psam written to merged.psam .
--pmerge-list: 2 .pvar files scanned, headers merged.
Error: Non-concatenating --pmerge[-list] is under development.
Contents of input_sources.txt:
$ cat input_sources.txt
test3
test4
test3 and test4 were generated from VCF files:
$ plink2 --vcf ../3.vcf.gz --out test3 --make-pgen
$ plink2 --vcf ../4.vcf.gz --out test4 --make-pgen
I'm new to PLINK and suspect I'm doing something wrong, but after some digging I haven't found any clues.
System specs: CentOS 7.9, Intel(R) Xeon(R) Silver 4210R
The error message means exactly what it says: this feature isn't implemented in plink2 yet. ("Concatenating" merge refers to the "bcftools concat" use case, though plink2's behavior differs a bit from bcftools's here.) Use e.g. bcftools or plink 1.9 to merge for now.
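For the original two-VCF case, here is a minimal sketch of the bcftools route. It assumes ../3.vcf.gz and ../4.vcf.gz are bgzip-compressed, that they hold different samples (the log reports 2 samples across the 2 filesets), and that bcftools and plink2 are on PATH. Because the goal is combining samples rather than appending variants, the sketch uses `bcftools merge`, not `bcftools concat`. The function name `merge_vcfs_to_pgen` and the output prefix are hypothetical.

```shell
# Hypothetical helper, not part of plink2 or bcftools: merge VCFs that hold
# different samples, then convert the result to a plink2 fileset.
# Assumes bgzip-compressed inputs and bcftools/plink2 on PATH.
merge_vcfs_to_pgen() {
  local out_prefix="$1"
  shift
  local vcf
  # bcftools merge requires every input to be indexed (.csi by default).
  for vcf in "$@"; do
    bcftools index -f "$vcf" || return 1
  done
  # Combine the per-file sample columns into one compressed BCF.
  bcftools merge -O b -o "${out_prefix}.bcf" "$@" || return 1
  # Import the merged BCF as .pgen/.pvar/.psam.
  plink2 --bcf "${out_prefix}.bcf" --make-pgen --out "$out_prefix"
}
```

Usage: `merge_vcfs_to_pgen merged ../3.vcf.gz ../4.vcf.gz`. Each step stops at the first failure, so a missing index or a bcftools error never produces a half-written plink2 fileset.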
Are you sure? As of March 13th, we were able to use plink2 to concatenate datasets.
Here is a log of a working example:
PLINK v2.00a3.7LM AVX2 Intel (24 Oct 2022)
Options in effect:
--out ukb24068_c5_merged_sample_filtered
--pfile ukb24068_c5_b1_merged_sample_filtered
--pmerge-list chr5_list
Hostname: 80b217465abd
Working directory: /home/ubuntu/exome_pgen
Start time: Mon Mar 13 15:49:53 2023
Random number seed: 1678722593
63628 MiB RAM detected; reserving 31814 MiB for main workspace.
Using up to 16 threads (change this with --threads).
--pmerge-list: 19 filesets specified (including main fileset).
--pmerge-list: 422625 samples present.
--pmerge-list: Merged .psam written to ukb24068_c5_merged_sample_filtered.psam
.
--pmerge-list: 19 .pvar files scanned, headers merged.
Concatenation job detected.
Concatenating... 747813/747813 variants complete.
Results written to ukb24068_c5_merged_sample_filtered.pgen +
ukb24068_c5_merged_sample_filtered.pvar .
End time: Mon Mar 13 15:51:11 2023
However, we see this same error for 2 of our chromosomes, and we're not sure why yet. The same code is run in a loop; the .pvar and .psam files are made, but the .pgen file is not produced. Any ideas?
PLINK v2.00a3.7LM AVX2 Intel (24 Oct 2022)
Options in effect:
--out ukb24068_c8_merged_sample_filtered
--pfile ukb24068_c8_b1_merged_sample_filtered
--pmerge-list chr8_list
Hostname: 80b217465abd
Working directory: /home/ubuntu/exome_pgen
Start time: Mon Mar 13 15:54:05 2023
Random number seed: 1678722845
63628 MiB RAM detected; reserving 31814 MiB for main workspace.
Using up to 16 threads (change this with --threads).
--pmerge-list: 15 filesets specified (including main fileset).
--pmerge-list: 422625 samples present.
--pmerge-list: Merged .psam written to ukb24068_c8_merged_sample_filtered.psam
.
--pmerge-list: 15 .pvar files scanned, headers merged.
Error: Non-concatenating --pmerge-list is under development.
End time: Mon Mar 13 15:54:10 2023
@gulumk for visibility
When two variants share a position, --pmerge-list uses the --sort-vars setting (https://www.cog-genomics.org/plink/2.0/data#sort_vars ) to determine their output order. In particular, if the end of one .pvar and the beginning of the next have variants at the same position, and their IDs are in the wrong order, --pmerge-list can no longer "concatenate".
I will update the online documentation today to spell this out.
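One way to find boundaries like this before running --pmerge-list is to compare the last variant of each .pvar with the first variant of the next one, in merge-list order. The awk-based function below is a hypothetical helper, not part of plink2. It assumes tab-delimited .pvar files that have a #CHROM header line naming the POS and ID columns, and it only compares boundaries that fall on the same chromosome. A SAME_POS line is not necessarily an error. It means the two IDs must already be in --sort-vars order, so those are the IDs to inspect. An OUT_OF_ORDER line means the next chunk starts at a lower position than the previous one ends.

```shell
# Hypothetical helper (not part of plink2): scan .pvar files in merge-list
# order and report file boundaries where the last variant of one file and
# the first variant of the next are on the same chromosome and either
#   SAME_POS     - share a position (IDs must then follow --sort-vars order)
#   OUT_OF_ORDER - the next file starts at a lower position.
# Output columns: tag, previous file, next file, position(s), ID(s).
# Assumes tab-delimited .pvar files with a "#CHROM" header naming POS and ID.
check_pvar_boundaries() {
  awk '
    BEGIN { FS = "\t" }
    FNR == 1 { seen_first = 0 }                # a new file starts here
    /^##/ { next }                             # skip meta-information lines
    /^#CHROM/ {                                # locate the POS and ID columns
      for (i = 1; i <= NF; i++) { if ($i == "POS") p = i; if ($i == "ID") d = i }
      next
    }
    {
      if (!seen_first) {
        seen_first = 1
        # Compare the first variant of this file with the last one seen before it.
        if (have_prev && $1 == last_chrom) {
          if ($p + 0 == last_pos + 0)
            print "SAME_POS\t" last_file "\t" FILENAME "\t" $1 ":" $p "\t" last_id "\t" $d
          else if ($p + 0 < last_pos + 0)
            print "OUT_OF_ORDER\t" last_file "\t" FILENAME "\t" $1 ":" last_pos "\t" $1 ":" $p
        }
      }
      # Remember the most recent variant; at end of file it is the last one.
      last_chrom = $1; last_pos = $p; last_id = $d; last_file = FILENAME; have_prev = 1
    }
  ' "$@"
}
```

Usage: run `check_pvar_boundaries ukb24068_c8_b1_merged_sample_filtered.pvar` followed by the remaining .pvar files in chr8_list order. No output means none of the same-chromosome boundaries share a position or go backwards.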
I see, thank you for the quick reply. Would you say that inspecting the heads and tails of the .pvar files is a good place to start? Is this issue strictly due to the .pvar file, or could issues in the .pgen file throw this error as well?
- Yes; if you don't want to resort to exporting to BCF and using "bcftools concat", one option is temporarily editing the offending leading/trailing variant IDs so that they no longer violate --sort-vars order.
- No, pgen file contents can't cause this.
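The first option above (exporting to BCF and using "bcftools concat") can be sketched as follows. This assumes plink2 and bcftools are on PATH, that the input prefixes are given in genomic order, and that all filesets list the same samples in the same order, which bcftools concat requires. The function name is hypothetical. The sketch does not request dosage export, so any dosages are likely dropped. It is also worth checking that the sample IDs in the re-imported .psam match the originals.

```shell
# Hypothetical helper, not part of plink2 or bcftools: concatenate plink2
# filesets that share the same samples by round-tripping through BCF.
# Arguments: output prefix, then input fileset prefixes in genomic order.
# Assumes plink2 and bcftools are on PATH; dosage export is not requested.
concat_pfiles_via_bcf() {
  local out_prefix="$1"
  shift
  local prefix
  for prefix in "$@"; do
    # Write <prefix>.bcf from <prefix>.pgen/.pvar/.psam.
    plink2 --pfile "$prefix" --export bcf --out "$prefix" || return 1
    # Rotate the argument list: drop this prefix and append its .bcf name,
    # so that "$@" holds only the .bcf files once the loop ends.
    set -- "$@" "$prefix.bcf"
    shift
  done
  # Without --allow-overlaps, bcftools concat does not need indexed inputs,
  # but every file must list the same samples in the same order.
  bcftools concat -O b -o "${out_prefix}.bcf" "$@" || return 1
  plink2 --bcf "${out_prefix}.bcf" --make-pgen --out "$out_prefix"
}
```

For the chr8 case, the arguments would be an output prefix, then the prefix passed to --pfile, followed by the prefixes in chr8_list. Rotating the positional parameters, rather than building a space-separated string, keeps prefixes that contain spaces intact without relying on bash arrays.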
Thanks @chrchang, we were able to resolve our issue.
@myz540 Hi Mike, would you mind sharing the code you used to address this issue? I'm running into the same problem, and I'd really appreciate your help.
Are there any updates on this?
Hey @123huynguyen, I would love to help, but this was at an old job, so I no longer have access to the code base or the context needed to give you a solution. I believe the issue was in the sorting: when we inspected the head and tail of each .pvar file, we saw that the chunks weren't sorted correctly. I can't be 100% sure that was the issue given how long it's been, but I hope this helps.