polyfun
`true_divide` errors with `--compute-h2-bins` when re-estimating per-SNP heritabilities
Overview
We have processed the GWAS summary statistics for many dozens of traits using a pipeline that performs functionally-informed fine-mapping with PolyFun. Overall this has been very successful (thanks for creating and maintaining such great open source code!). However, a few traits have failed, and when this happens, the failure is typically a `true_divide` error in Step 4, which re-estimates per-SNP heritabilities via S-LDSC (`--compute-h2-bins`):
```
FloatingPointError: invalid value encountered in true_divide
```
I haven't been able to figure out the exact source of the issue. When I use `pdb.set_trace()` to debug interactively, I can confirm that the data at that point is problematic (lots of zeros!). But by following the traceback, I haven't been able to identify the upstream cause of the problem. I've also searched the input files for potentially problematic entries (`NA`, `NaN`, `Inf`, `-Inf`), but I haven't found any. Making it worse is that the `true_divide` error gets triggered in different parts of the code base, so it may not be one specific problem but several related ones.
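Getting a `FloatingPointError` (rather than a `RuntimeWarning`) means NumPy's floating-point error handling is set to raise somewhere in the run. The wording of the message also narrows things down: NumPy reports a nonzero value divided by zero as "divide by zero", whereas "invalid value" comes from divisions such as `0 / 0` or `inf / inf`. Since the inputs contain no `Inf`, a `0 / 0` seems the most likely candidate, which would fit the zeros seen in `pdb`. A minimal sketch that classifies a division this way (`classify_division` is a hypothetical helper, not part of PolyFun):

```python
import numpy as np


def classify_division(numer, denom):
    """Report which NumPy floating-point error (if any) numer / denom triggers."""
    numer = np.asarray(numer, dtype=float)
    denom = np.asarray(denom, dtype=float)
    try:
        # Temporarily make both error categories raise instead of warn.
        with np.errstate(divide="raise", invalid="raise"):
            _ = numer / denom  # result discarded; only the error state matters
    except FloatingPointError as exc:
        msg = str(exc)
        if "invalid value" in msg:
            return "invalid"  # e.g. 0/0 or inf/inf
        if "divide by zero" in msg:
            return "divide"  # nonzero / 0
        raise
    return "ok"


print(classify_division(0.0, 0.0))  # invalid
print(classify_division(1.0, 0.0))  # divide
print(classify_division(1.0, 2.0))  # ok
```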
Some more context on our pipeline: we use PolyFun approach 3 to compute prior causal probabilities non-parametrically. We use the baseline UKBB annotations you provide plus some custom annotations. Furthermore, we use DENTIST to remove any SNPs with large LD mismatches between the summary statistics and the UKBB LD reference panel (as we discussed in #115). Based on some experiments I have performed, it appears that the SNPs removed by DENTIST may be causing the problem (more details below in the section with the reproducible example).
Tracebacks
I can't share all of the data, so here are the two tracebacks from runs that failed with `true_divide` errors at Step 4. I've seen each of these tracebacks twice (i.e., two different traits have failed with each of the errors below):
```
Traceback (most recent call last):
  File "/path/to/polyfun/polyfun.py", line 848, in <module>
    polyfun_obj.polyfun_main(args)
  File "/path/to/polyfun/polyfun.py", line 779, in polyfun_main
    self.compute_h2_bins(args, constrain_range=True)
  File "/path/to/polyfun/polyfun.py", line 756, in compute_h2_bins
    self.run_ldsc(args, use_ridge=False, nn=True, evenodd_split=True, keep_large=False)
  File "/path/to/polyfun/polyfun.py", line 217, in run_ldsc
    hsqhat = regressions.Hsq(chisq,
  File "/path/to/polyfun/ldsc_polyfun/regressions.py", line 401, in __init__
    LD_Score_Regression.__init__(self, y, x, w, N, M, n_blocks, intercept=intercept,
  File "/path/to/polyfun/ldsc_polyfun/regressions.py", line 273, in __init__
    self._prop(jknife, M, Nbar, self.cat, self.tot)
  File "/path/to/polyfun/ldsc_polyfun/regressions.py", line 347, in _prop
    cat / tot, numer_delete_vals, denom_delete_vals)
FloatingPointError: invalid value encountered in true_divide
```
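The first traceback fails in `_prop` at `cat / tot`, i.e. while converting per-annotation heritability into proportions of the total. If the total is exactly zero and so are the per-annotation terms, that division is `0 / 0`. One way this could happen (a hypothesis I have not confirmed) is the non-negativity constraint suggested by `nn=True` in the `run_ldsc` call: if every coefficient ends up at zero, both `cat` and `tot` are zero. The toy sketch below assumes, as I understand the upstream LDSC code, that per-annotation heritability is `M * coef` and the total is their sum; `category_proportions` and its inputs are hypothetical:

```python
import numpy as np


def category_proportions(coef, M):
    """Toy version of the per-annotation h2 proportion (`cat / tot`).

    coef: per-SNP heritability coefficient for each annotation (hypothetical input)
    M:    number of SNPs (sum of annotation values) for each annotation
    """
    cat = np.multiply(np.asarray(M, dtype=float), np.asarray(coef, dtype=float))
    tot = np.sum(cat)  # assumption: total h2 is the sum of per-annotation h2
    if tot == 0:
        # E.g. all coefficients clamped at 0, so cat / tot would be 0 / 0.
        n_zero = int(np.count_nonzero(cat == 0))
        raise ValueError(f"total h2 is 0; {n_zero} of {cat.size} per-annotation terms are 0")
    with np.errstate(invalid="raise", divide="raise"):
        return cat / tot


print(category_proportions([1e-8, 2e-8], [10, 20]))  # approximately [0.2 0.8]
```

Calling `category_proportions([0.0, 0.0], [10, 20])` raises a `ValueError` that states the total is zero, rather than a bare `FloatingPointError`.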
```
Traceback (most recent call last):
  File "/path/to/polyfun/polyfun.py", line 848, in <module>
    polyfun_obj.polyfun_main(args)
  File "/path/to/polyfun/polyfun.py", line 779, in polyfun_main
    self.compute_h2_bins(args, constrain_range=True)
  File "/path/to/polyfun/polyfun.py", line 756, in compute_h2_bins
    self.run_ldsc(args, use_ridge=False, nn=True, evenodd_split=True, keep_large=False)
  File "/path/to/polyfun/polyfun.py", line 217, in run_ldsc
    hsqhat = regressions.Hsq(chisq,
  File "/path/to/polyfun/ldsc_polyfun/regressions.py", line 401, in __init__
    LD_Score_Regression.__init__(self, y, x, w, N, M, n_blocks, intercept=intercept,
  File "/path/to/polyfun/ldsc_polyfun/regressions.py", line 273, in __init__
    self._prop(jknife, M, Nbar, self.cat, self.tot)
  File "/path/to/polyfun/ldsc_polyfun/regressions.py", line 346, in _prop
    prop = jk.RatioJackknife(
  File "/path/to/polyfun/ldsc_polyfun/jackknife.py", line 530, in __init__
    self.pseudovalues = self.delete_values_to_pseudovalues(self.est,
  File "/path/to/polyfun/ldsc_polyfun/jackknife.py", line 564, in delete_values_to_pseudovalues
    (n_blocks - 1) * numer[j, ...] / denom[j, ...]
FloatingPointError: invalid value encountered in true_divide
```
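The second traceback fails one step later, inside `RatioJackknife`, at `(n_blocks - 1) * numer[j, ...] / denom[j, ...]`. Since `_prop` passes `cat / tot` together with `numer_delete_vals` and `denom_delete_vals`, `denom` presumably holds leave-one-block-out values of the total. If so, this error would be consistent with the total being nonzero on the full data but exactly zero when one particular jackknife block is left out. Below is a simplified re-implementation of the ratio-jackknife pseudovalue formula from that line (not PolyFun's code; `ratio_pseudovalues` is a hypothetical name) that reports which blocks have a zero denominator:

```python
import numpy as np


def ratio_pseudovalues(est, numer, denom):
    """Ratio-jackknife pseudovalues: n_blocks * est - (n_blocks - 1) * numer_j / denom_j.

    est:   full-data ratio estimate, shape (p,) or (1, p)
    numer: leave-one-block-out numerator values, shape (n_blocks, p)
    denom: leave-one-block-out denominator values, shape (n_blocks, p)
    """
    est = np.asarray(est, dtype=float)
    numer = np.asarray(numer, dtype=float)
    denom = np.asarray(denom, dtype=float)
    n_blocks = denom.shape[0]
    # Blocks whose removal leaves a zero denominator (would give 0/0 or x/0).
    bad_blocks = np.flatnonzero(np.any(denom == 0, axis=1))
    if bad_blocks.size > 0:
        raise ValueError(f"zero denominator when leaving out block(s) {bad_blocks.tolist()}")
    # Vectorized over blocks; equivalent to looping over j.
    return n_blocks * est - (n_blocks - 1) * numer / denom


print(ratio_pseudovalues([0.5], [[1.0], [1.0], [1.0]], [[2.0], [2.0], [2.0]]).ravel())  # [0.5 0.5 0.5]
```

With `denom=[[2.0], [0.0], [2.0]]`, the same call raises `ValueError: zero denominator when leaving out block(s) [1]`, which points at the offending block instead of stopping at the division.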
Reproducible example
As I hinted above, I think that the SNPs removed by DENTIST may be causing the problem. For one of our traits, DENTIST removed only a small percentage of SNPs, but this was sufficient to trigger the `true_divide` error. Unfortunately, I am unable to share that data set with you.
However, I was able to put together a reproducible example using the UC data from De Lange et al. 2017. For simplicity, I only use the baseline annotations. The PolyFun steps complete successfully for the full summary statistics, but fail with a true_divide error (the first one listed in the section above) for the DENTIST-filtered summary statistics.
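Since the only difference between the passing and failing UC runs is the DENTIST filter, one quick comparison (a hypothesis to check, not a diagnosis) is whether the removed SNPs carry a disproportionate share of the association signal. If they do, the remaining SNPs may have a mean chi-square close to 1, leaving S-LDSC little signal to work with, which could plausibly push the total heritability estimate to zero. A minimal sketch, assuming the SNP IDs and Z-scores of the full summary statistics and the SNP IDs retained by DENTIST are already loaded (`summarize_filtering` and its inputs are hypothetical):

```python
import numpy as np


def summarize_filtering(snps_full, z_full, snps_kept):
    """Compare chi-square statistics of all SNPs vs. those retained after filtering."""
    kept = set(snps_kept)
    chisq = np.asarray(z_full, dtype=float) ** 2
    kept_mask = np.array([snp in kept for snp in snps_full], dtype=bool)
    removed = chisq[~kept_mask]
    return {
        "n_total": int(chisq.size),
        "n_removed": int(removed.size),
        "mean_chisq_full": float(chisq.mean()),
        "mean_chisq_kept": float(chisq[kept_mask].mean()),
        # NaN when nothing was removed
        "max_chisq_removed": float(removed.max()) if removed.size else float("nan"),
    }


print(summarize_filtering(["rs1", "rs2", "rs3", "rs4"], [1.0, -1.0, 1.0, 5.0], ["rs1", "rs2", "rs3"]))
# {'n_total': 4, 'n_removed': 1, 'mean_chisq_full': 7.0, 'mean_chisq_kept': 1.0, 'max_chisq_removed': 25.0}
```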
I'm going to email you the two summary statistics files, a script to run the PolyFun steps, and a conda lock file so that you can recreate the exact same conda environment that I used. Any advice you can provide would be greatly appreciated!