goci

Associations download has missing rows

Open · ljwh2 opened this issue 10 months ago · 1 comment

A user reported that the associations download file has been missing rows since the latest data release (DR). I verified that the file has 508,403 rows, but the UI shows 605,831 associations.

From the user:

I notice that gwas_catalog_v1.0.2-associations_e111_r2024-04-16.tsv is much smaller than the earlier file. It is missing some 80,000 rows, including the entire Zhu et al. study (https://www.ebi.ac.uk/gwas/publications/34989438). Since that study is still available through the UI, I assume there is something wrong with the download file.

ljwh2 · Apr 17 '24 12:04

This was caused by a java.lang.NumberFormatException thrown when the GWAS UI downloads API tried to convert the distance field from a string to a number. The value was "--21154", which cannot be parsed as a number.

The root cause was that some values of the DISTANCE field in the GENOMIC_CONTEXT table were negative. They shouldn't be: two other fields, UPSTREAM and DOWNSTREAM, determine the direction, and we retrieve this information from ENSEMBL. The indexer constructs the SOLR value by prefixing DISTANCE with a "-" depending on the values of UPSTREAM and DOWNSTREAM, so a DISTANCE that is already negative ends up with a double minus sign. DISTANCE should therefore never be negative.

We reran the mapping pipeline, which removed all negative values, so the next DR should fix any affected files. We still haven't figured out why it happened, i.e. whether it was a mapping pipeline issue or an ENSEMBL issue.

ala-ebi · Apr 18 '24 13:04
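To illustrate the failure described in the comment above, here is a minimal Java sketch of how a stored negative DISTANCE turns into an unparseable Solr value. It assumes, hypothetically, that the indexer adds the "-" prefix for upstream genes (the comment only says the sign depends on UPSTREAM and DOWNSTREAM). The class and method names (`DistanceIndexSketch`, `toSolrDistance`, `parseDistance`, `toSolrDistanceSafe`) are illustrative and not taken from the goci code base, and `Long.parseLong` stands in for whatever conversion the downloads API actually performs.

```java
// Hypothetical reconstruction of the indexer / downloads API interaction
// described in the issue; names are illustrative, not goci's actual code.
public class DistanceIndexSketch {

    // ASSUMPTION: the indexer marks upstream genes with a leading "-".
    // The issue only says the sign depends on UPSTREAM and DOWNSTREAM.
    // If storedDistance is already negative, this yields "--<digits>".
    public static String toSolrDistance(long storedDistance, boolean upstream) {
        return upstream ? "-" + storedDistance : Long.toString(storedDistance);
    }

    // Download side: convert the Solr string back to a number.
    // Long.parseLong accepts at most one leading sign, so "--21154"
    // throws java.lang.NumberFormatException.
    public static long parseDistance(String solrValue) {
        return Long.parseLong(solrValue);
    }

    // Hypothetical defensive variant: use the magnitude of the stored value
    // so a negative DISTANCE can never produce a double sign.
    public static String toSolrDistanceSafe(long storedDistance, boolean upstream) {
        long magnitude = Math.abs(storedDistance);
        return upstream ? "-" + magnitude : Long.toString(magnitude);
    }

    public static void main(String[] args) {
        // Expected data: DISTANCE stored as a non-negative magnitude.
        System.out.println(parseDistance(toSolrDistance(21154L, true)));     // -21154

        // Faulty data: DISTANCE already negative in GENOMIC_CONTEXT.
        String bad = toSolrDistance(-21154L, true);                           // "--21154"
        try {
            parseDistance(bad);
        } catch (NumberFormatException e) {
            System.out.println("Rejected " + bad + ": " + e.getMessage());
        }

        // The defensive variant stays parseable even with bad input data.
        System.out.println(parseDistance(toSolrDistanceSafe(-21154L, true))); // -21154
    }
}
```

Normalising the sign at index time is only one possible guard; the fix actually applied was to rerun the mapping pipeline so that DISTANCE is non-negative in the database.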