plink-ng
Logging bug in bmerge
See the two small datasets attached. I'm running --merge-mode 6, so I would expect the order of the datasets not to matter, but I get different results depending on which one comes first (compare the marker counts and the concordance lines in the two logs below).
```
PLINK v1.90b6.17 64-bit (28 Apr 2020) www.cog-genomics.org/plink/1.9/
(C) 2005-2020 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to PlusFirst.log.
Options in effect:
  --bfile Plus
  --bmerge Normed.bed Normed.bim Normed.fam
  --merge-mode 6
  --out PlusFirst

16340 MB RAM detected; reserving 8170 MB for main workspace.
4 people loaded from Plus.fam.
4 people to be merged from Normed.fam.
Of these, 0 are new, while 4 are present in the base dataset.
Warning: Multiple positions seen for variant 'MNV'.
Warning: Multiple chromosomes seen for variant 'PAR-X'.
30 markers loaded from Plus.bim.
29 markers to be merged from Normed.bim.
Of these, 0 are new, while 29 are present in the base dataset.
Warning: Variants '0monomorphic' and '00missing' have the same position.
Warning: Variants '2:103037578' and '0monomorphic' have the same position.
Warning: Variants '2:103037578:G:T' and '2:103037578' have the same position.
24 more same-position warnings: see log file.
Performing 1-pass diff (mode 6), writing results to PlusFirst.diff .
116 overlapping calls, 84 nonmissing in both filesets.
76 concordant, for a concordance rate of 0.904762.
```

```
PLINK v1.90b6.17 64-bit (28 Apr 2020) www.cog-genomics.org/plink/1.9/
(C) 2005-2020 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to NormedFirst.log.
Options in effect:
  --bfile Normed
  --bmerge Plus.bed Plus.bim Plus.fam
  --merge-mode 6
  --out NormedFirst

16340 MB RAM detected; reserving 8170 MB for main workspace.
4 people loaded from Normed.fam.
4 people to be merged from Plus.fam.
Of these, 0 are new, while 4 are present in the base dataset.
Warning: Multiple positions seen for variant 'MNV'.
Warning: Multiple chromosomes seen for variant 'PAR-X'.
27 markers loaded from Normed.bim.
30 markers to be merged from Plus.bim.
Of these, 3 are new, while 27 are present in the base dataset.
Warning: Variants '0monomorphic' and '00missing' have the same position.
Warning: Variants '2:103037578' and '0monomorphic' have the same position.
Warning: Variants '2:103037578:G:T' and '2:103037578' have the same position.
23 more same-position warnings: see log file.
Performing 1-pass diff (mode 6), writing results to NormedFirst.diff .
108 overlapping calls, 78 nonmissing in both filesets.
70 concordant, for a concordance rate of 0.897436.
```
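The summary numbers in the two logs can be reconciled with simple arithmetic. The sketch below is an illustration, not PLINK code: it assumes that "overlapping calls" equals overlapping variants times shared people (which matches both logs: 29 × 4 = 116 and 27 × 4 = 108), and that the reported concordance rate is concordant calls divided by calls nonmissing in both filesets. The helper name `diff_summary` is hypothetical.

```python
def diff_summary(overlapping_variants, shared_people, nonmissing_both, concordant):
    """Recompute the --merge-mode 6 summary lines from raw counts.

    Assumption (consistent with both logs, not taken from PLINK source):
    overlapping calls = overlapping variants * people present in both filesets,
    concordance rate = concordant calls / calls nonmissing in both filesets.
    """
    overlapping_calls = overlapping_variants * shared_people
    rate = concordant / nonmissing_both
    # PLINK prints the rate to 6 significant digits here, so round to match.
    return overlapping_calls, round(rate, 6)


# Plus as base: 29 overlapping variants; Normed as base: only 27.
plus_first = diff_summary(29, 4, 84, 76)
normed_first = diff_summary(27, 4, 78, 70)
print(plus_first, normed_first)
```

Under these assumptions, the two runs disagree because a different number of variants ends up in the overlap (29 versus 27), not because individual genotype comparisons differ.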
Thanks for reporting this.
--bmerge is not symmetric: when there are duplicates, the first appearance of a variant takes priority, and I can't change this behavior without breaking backward compatibility. That said, if you run the second command with PLINK 1.07, it actually errors out, instead of just reporting a different result, because of the completely duplicated .bim entries in Normed.bim. I will think about making PLINK 1.9 error out in the second case as well.
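To catch this before merging, one option is to scan each .bim file for repeated variant IDs and fully duplicated lines. The sketch below is a hypothetical pre-check, not part of PLINK: it assumes the standard six whitespace-separated .bim columns (chromosome, variant ID, centimorgan position, base-pair coordinate, allele 1, allele 2). The `first_wins` helper only models the "first appearance takes priority" rule described above; it does not reproduce PLINK's internal merge logic. The example rows and IDs (`varA`, `varB`, `varC`) are made up.

```python
from collections import Counter


def parse_bim(lines):
    """Parse .bim text lines into 6-field tuples, skipping blank lines."""
    records = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 6:
            raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
        records.append(tuple(fields))
    return records


def duplicate_report(records):
    """Return (variant IDs seen more than once, lines repeated verbatim)."""
    id_counts = Counter(r[1] for r in records)
    line_counts = Counter(records)
    dup_ids = sorted(v for v, n in id_counts.items() if n > 1)
    exact_dups = sorted(r for r, n in line_counts.items() if n > 1)
    return dup_ids, exact_dups


def first_wins(records):
    """Keep only the first record per variant ID (order-dependent by design)."""
    kept = {}
    for r in records:
        kept.setdefault(r[1], r)
    return kept


# Hypothetical .bim content, for illustration only.
example = parse_bim([
    "1 varA 0 1000 A G",
    "1 varA 0 1000 A G",  # fully duplicated line
    "1 varB 0 2000 C T",
    "1 varB 0 2005 C T",  # same ID, different position
    "2 varC 0 3000 G A",
])
dup_ids, exact_dups = duplicate_report(example)
kept = first_wins(example)
print(dup_ids, exact_dups, kept["varB"])
```

Because `first_wins` keeps whichever duplicate it sees first, feeding the same rows in a different order can keep a different record, which is the same kind of order dependence seen in the two logs.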
Thanks for looking into this. I thought my data might have some weird edge case but I hadn't noticed the duplicate identifiers in Normed.bim.