Heterogeneity analysis fails

Open lindabroer-g opened this issue 4 years ago • 8 comments

During meta-analysis of 18 cohorts the heterogeneity analysis fails with the following error message:

ERROR: Input file has changed since analysis started

When rerunning the same analysis with fewer cohorts it works fine. I do not touch the input files during analysis. They are gzipped and all have the exact same format (processed by EasyQC beforehand). Does anyone know what causes this error and how to fix it?

Thanks in advance

lindabroer-g avatar Jun 16 '20 05:06 lindabroer-g

Seems like that error message occurs when either:

  1. The header row has changed (seems unlikely?)
  2. The number of markers processed in the initial analysis pass does not match the number of markers re-processed during the second scan through the file for calculating heterogeneity statistics

Variants that are filtered out for QC issues are a possible culprit, in particular variants with ambiguous or missing alleles. The initial processing step seems to make some effort to fix and/or guess the alleles in those cases, but the re-processing step might not (?)

If you know that you have a smaller set of cohorts that work, maybe you could try adding cohorts one at a time until you find a problematic one. Then see if that cohort has any warnings about bad variants/alleles/strands that might be fixable.
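The "add cohorts one at a time" approach can be sketched in Python as below. This is only a sketch: `runs_ok` is a hypothetical callback you would write yourself (for example, one that generates a METAL script for the given files, runs it, and reports whether the heterogeneity step completed), and the file names are made up.

```python
from typing import Callable, List, Optional, Sequence


def first_failing_cohort(
    known_good: Sequence[str],
    candidates: Sequence[str],
    runs_ok: Callable[[List[str]], bool],
) -> Optional[str]:
    """Add candidate cohorts one at a time to a working set.

    Returns the first cohort whose addition makes the analysis fail,
    or None if every candidate can be added without a failure.
    """
    included: List[str] = list(known_good)
    for cohort in candidates:
        trial = included + [cohort]
        if runs_ok(trial):
            included = trial  # this cohort is fine, keep it in the set
        else:
            return cohort  # adding this cohort triggered the failure
    return None


# Stand-in for a real METAL run: pretend cohort_C is the broken one.
def fake_runs_ok(subset: List[str]) -> bool:
    return "cohort_C.txt.gz" not in subset


culprit = first_failing_cohort(
    ["cohort_A.txt.gz"],
    ["cohort_B.txt.gz", "cohort_C.txt.gz", "cohort_D.txt.gz"],
    fake_runs_ok,
)
print(culprit)
```

Once a culprit is found, its METAL log is the place to look for warnings about bad variants, alleles, or strands.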

welchr avatar Jun 16 '20 18:06 welchr

Thanks, I'll look into it and let you know if this fixes the problem.

lindabroer-g avatar Jun 17 '20 05:06 lindabroer-g

Again, thanks for the tips. Something weird was going on with the marker positions in one of the cohorts, resulting in positions like 1.2e+07. When I fixed this, everything worked fine. I do use TRACKPOSITIONS, though, so I'm not sure why this didn't already fail in the first processing step. In any case, the problem is fixed now.
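For anyone wanting to screen their inputs for the same problem, a minimal Python sketch follows. It assumes a tab-delimited, gzipped file with a header row and a position column named `POS`; both the column name and the delimiter are assumptions and should be adjusted to match your files.

```python
import csv
import gzip
import re
from typing import List, Tuple

# A valid position is taken to be a plain run of digits, so values such as
# "1.2e+07", "NA", or an empty field are all reported.
INTEGER_RE = re.compile(r"[0-9]+")


def find_bad_positions(
    path: str, pos_col: str = "POS", delimiter: str = "\t"
) -> List[Tuple[int, str]]:
    """Return (line_number, value) for every row whose position is not a plain integer."""
    bad: List[Tuple[int, str]] = []
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        # Line 1 is the header, so data rows start at line 2.
        for line_no, row in enumerate(reader, start=2):
            value = (row.get(pos_col) or "").strip()
            if not INTEGER_RE.fullmatch(value):
                bad.append((line_no, value))
    return bad
```

Running `find_bad_positions("cohort.txt.gz")` on each cohort file (the file name here is a placeholder) lists the rows that would need fixing before the meta-analysis.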

lindabroer-g avatar Jun 18 '20 05:06 lindabroer-g

Actually I think you found the problem right there. If TRACKPOSITIONS is ON, the initial processing step does some extra checking for bad chromosomes and/or positions:

https://github.com/statgen/METAL/blob/e2253cc3901df8403a331bd725d4d9fe1edfb19f/metal/Main.cpp#L1140-L1159

The continue statements there cause the rest of the loop body to be skipped, which means the count of processed markers isn't incremented.

However, the re-processing step for heterogeneity analysis does not do this same check. So it will try to analyze more markers than were originally processed.
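The mismatch described above can be illustrated with a toy Python model. This is not METAL's code and does not reproduce the actual checks at the linked lines; the function names, column names, and the `position_ok` rule are all assumptions made purely for illustration.

```python
from typing import Dict, List

Row = Dict[str, str]


def position_ok(row: Row) -> bool:
    # Hypothetical stand-in for the chromosome/position sanity checks.
    return row["CHR"] != "" and row["POS"].isdigit()


def first_pass_count(rows: List[Row], track_positions: bool) -> int:
    """Initial pass: rows with bad positions are skipped before being counted."""
    processed = 0
    for row in rows:
        if track_positions and not position_ok(row):
            continue  # skipped, so the counter below is never reached
        processed += 1
    return processed


def second_pass_count(rows: List[Row]) -> int:
    """Heterogeneity pass in this toy model: no position check, every row counts."""
    return len(rows)


def counts_disagree(rows: List[Row], track_positions: bool) -> bool:
    """True when the two passes see different marker counts."""
    return first_pass_count(rows, track_positions) != second_pass_count(rows)


rows = [
    {"SNP": "rs1", "CHR": "1", "POS": "12000000"},
    {"SNP": "rs2", "CHR": "1", "POS": "1.2e+07"},  # malformed position
]
print(counts_disagree(rows, track_positions=True))
print(counts_disagree(rows, track_positions=False))
```

In this model the counts only disagree when position tracking is on and at least one row has a malformed position, which matches the behaviour reported in this thread.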

welchr avatar Jun 18 '20 15:06 welchr

Is anyone working on this? It would be great to have this addressed w/o the user manually fixing data beforehand.

quattro avatar Jun 06 '22 20:06 quattro

I am also getting the same error when I use trackpositions and heterogeneity together. A fix for this would be most welcome!

oalavijeh avatar Jan 18 '23 10:01 oalavijeh

I think this issue is caused by the option GENOMICCONTROL ON. When this option is on, the files will be modified in the first run. While checking for heterogeneity in the second run, the files do not match each other. I tested by turning off the genomic control option, and it worked. So, I guess it is the issue.

yningvu avatar Jan 26 '24 10:01 yningvu

@yningvu I suspect that the GENOMICCONTROL flag is potentially one step above TRACKPOSITIONS here. Is it possible that when you disabled GENOMICCONTROL you also disabled TRACKPOSITIONS?

I say this because I have run two versions of my meta-analysis with METAL recently. The first had GENOMICCONTROL ON and no TRACKPOSITIONS (for posterity, this was actually run using the version of METAL you download from the METAL website, https://csg.sph.umich.edu/abecasis/Metal/download/ which is actually a version from 2011...). This version of the meta-analysis worked totally fine and produced heterogeneity values.

For my second run I added the TRACKPOSITIONS flag to otherwise the same analysis, and that seems to have resulted in the issue mentioned in this thread, because the first pass removed a bunch of SNPs with discordant positions from the analysis.

I was mostly including TRACKPOSITIONS just so I could get chromosome + position columns in my output, so it would actually be good if this could be fixed. As it stands I may have to re-run with TRACKPOSITIONS off and manually merge the two outputs.
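A rough Python sketch of that manual merge is below. It assumes both METAL outputs are tab-delimited and share a `MarkerName` column, and that the TRACKPOSITIONS run adds `Chromosome` and `Position` columns. Treat all three column names as assumptions and check them against your own output headers.

```python
import csv
from typing import Dict, List, Sequence

Row = Dict[str, str]


def read_table(path: str) -> List[Row]:
    """Read a tab-delimited file with a header row into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def merge_positions(
    het_rows: List[Row],
    pos_rows: List[Row],
    key: str = "MarkerName",
    pos_cols: Sequence[str] = ("Chromosome", "Position"),
) -> List[Row]:
    """Attach position columns from one run to the rows of another run.

    het_rows: output of the run without TRACKPOSITIONS (has heterogeneity stats).
    pos_rows: output of the run with TRACKPOSITIONS (has chromosome/position).
    Markers missing from pos_rows get "NA" in the position columns.
    """
    lookup = {row[key]: row for row in pos_rows}
    merged: List[Row] = []
    for row in het_rows:
        out = dict(row)
        source = lookup.get(row[key])
        for col in pos_cols:
            out[col] = source[col] if source is not None else "NA"
        merged.append(out)
    return merged
```

Markers that the TRACKPOSITIONS run dropped for discordant positions will come out with "NA" positions, which also makes it easy to see which SNPs were affected.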

Sabor117 avatar May 29 '24 16:05 Sabor117