goci
Associations download has missing rows
A user reported that the associations download file has been missing rows since the latest DR. I verified that the file contains 508,403 rows, while the UI shows 605,831 associations.
From the user:
I notice that gwas_catalog_v1.0.2-associations_e111_r2024-04-16.tsv is much smaller than the earlier file. It is missing some 80,000 rows, including the entire Zhu et al study https://www.ebi.ac.uk/gwas/publications/34989438 Since that study is still available through the UI I assume there is something wrong with the download file.
This was caused by a java.lang.NumberFormatException thrown when the GWAS UI downloads API tried to convert the distance field from a string to a number. The value was "--21154", which cannot be parsed as a number.

The underlying problem was that some values of the DISTANCE field in the GENOMIC_CONTEXT table were negative. They should not have been, because two other fields (UPSTREAM and DOWNSTREAM) determine the direction; we retrieve this data from ENSEMBL. The SOLR value is constructed by the indexer, which prepends a "-" to the DISTANCE depending on the values of UPSTREAM and DOWNSTREAM. A DISTANCE that is already negative therefore ends up with a double minus sign, which is why DISTANCE should never be negative.

We reran the mapping pipeline, which removed all negative values, so the next DR should fix any affected files. We still have not figured out why it happened, or whether it was a mapping pipeline issue or an ENSEMBL issue.
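The failure mode can be reproduced in isolation. The following is a minimal sketch, not the actual indexer or downloads API code: the class and method names are hypothetical, and the rule that decides from UPSTREAM and DOWNSTREAM whether a "-" is added is reduced to a single boolean because the exact rule is not stated in this report. The guard method is likewise only one possible way to catch a negative DISTANCE before it reaches SOLR.

```java
public class DistanceSketch {

    // Hypothetical stand-in for the indexer step: prefix "-" to DISTANCE
    // when the UPSTREAM/DOWNSTREAM flags call for it. The real rule is an
    // assumption here and is collapsed into the prefixMinus flag.
    static String buildIndexedDistance(long distance, boolean prefixMinus) {
        return prefixMinus ? "-" + distance : String.valueOf(distance);
    }

    // Hypothetical guard: refuse a negative DISTANCE instead of indexing it.
    static String buildIndexedDistanceChecked(long distance, boolean prefixMinus) {
        if (distance < 0) {
            throw new IllegalArgumentException("DISTANCE must not be negative: " + distance);
        }
        return buildIndexedDistance(distance, prefixMinus);
    }

    // Mirrors the string-to-number conversion that failed in the download.
    static boolean isParsable(String value) {
        try {
            Long.parseLong(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String good = buildIndexedDistance(21154, true);   // "-21154"
        String bad = buildIndexedDistance(-21154, true);   // "--21154"
        System.out.println(good + " parsable: " + isParsable(good));
        System.out.println(bad + " parsable: " + isParsable(bad));
    }
}
```

With a non-negative DISTANCE the indexed string parses cleanly, while a negative DISTANCE combined with the prefix yields "--21154", the value that triggered the exception.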