outbreak.info
Refactor API calls within LineageComparisonComponent
Right now, the API calls made by LineageComparisonComponent (outbreak.info/compare-lineages) transfer very large amounts of data, which makes the page slow. As a result, our API backend often crashes when this endpoint receives too many requests at once.
To create the heatmap on the page, the function `getLineagesComparison` calls `getCharacteristicMutations(apiurl, lineage, 0, true, includeSublineages)`, which returns all mutations within a lineage, and then filters the result to any mutation which appears in a lineage at a prevalence greater than the frequency threshold (default = 0.75). This step is necessary because if you set `frequency = 0.75` in the query itself, you would be missing data for mutations which exist in the lineage below the threshold:
Incorrect: missing cells for B.1.427 x A67V, B.1.427 x DEL69/70, B.1.427 x T95I, etc., which implies those mutations have not been found in the lineage, as opposed to "have been found, but at low prevalence":
Correct but super slow, since the `frequency=0` query is HUGE.
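The difference between the two results comes down to which records survive the query. Below is a minimal sketch of the client-side filtering described above. The record shape (`lineage`, `mutation`, `prevalence`) and the function name `filterForHeatmap` are assumptions for illustration, and the prevalence values are made up:

```javascript
// Hypothetical record shape: { lineage, mutation, prevalence }.
// Keep every record for any mutation that reaches the threshold in at
// least one lineage, so its low-prevalence cells are still drawn.
function filterForHeatmap(records, threshold = 0.75) {
  const keep = new Set(
    records.filter((r) => r.prevalence >= threshold).map((r) => r.mutation)
  );
  return records.filter((r) => keep.has(r.mutation));
}

// Illustrative values only, not real prevalence data.
const records = [
  { lineage: "BA.3", mutation: "s:a67v", prevalence: 0.95 },
  { lineage: "B.1.427", mutation: "s:a67v", prevalence: 0.001 },
  { lineage: "B.1.427", mutation: "s:l452r", prevalence: 0.97 },
  { lineage: "BA.3", mutation: "s:example", prevalence: 0.01 },
];
const cells = filterForHeatmap(records);
```

The B.1.427 x A67V cell is kept because A67V passes the threshold in BA.3. If the threshold were applied in the query instead, that record would never reach the client.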
To improve this, we could first get only the mutations which exist in any of the lineages at or above that threshold, then query the prevalence of each of those mutations in each lineage.
For instance, BA.3 and B.1.427 Comparison page:
- The initial API call should identify the mutations which occur in either of those lineages (BA.3 or B.1.427) at 75% or greater. This should identify the following set of mutations for each, just looking at `gene == "S"`:
BA.3: ['s:g142d', 's:n211i', 's:d614g', 's:h655y', 's:n679k', 's:a67v', 's:del69/70', 's:n969k', 's:q954h', 's:d796y', 's:p681h', 's:del143/145', 's:del212/212', 's:t95i', 's:n764k'],
B.1.427: ['s:d614g', 's:l452r', 's:s13i', 's:w152c']
- Then, you can call https://api.outbreak.info/genomics/mutations-by-lineage with `mutations` set to each of the mutations and `pangolin_lineage` set to each of the lineages (e.g. https://api.outbreak.info/genomics/mutations-by-lineage?mutations=S:A67V&pangolin_lineage=BA.3). You can combine mutations with `AND` to query several of them in one call. However, mutations that don't exist within the lineage (like S:S13I in BA.3) will cause the entire API call to fail with a status code of 500.
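The two steps above could be sketched as follows. This only builds the request URLs and makes no network calls. The helper names `unionMutations` and `buildPrevalenceUrls` are hypothetical, the mutation lists are abbreviated from the example above, and uppercasing the mutation names to match the example URL is an assumption about what the API expects:

```javascript
const BASE = "https://api.outbreak.info/genomics/mutations-by-lineage";

// Step 1 output, abbreviated from the S-gene lists above.
const characteristic = {
  "BA.3": ["s:g142d", "s:a67v", "s:d614g"],
  "B.1.427": ["s:d614g", "s:l452r", "s:s13i", "s:w152c"],
};

// Union of the mutations that pass the threshold in any lineage.
function unionMutations(byLineage) {
  return [...new Set(Object.values(byLineage).flat())];
}

// One URL per (lineage, mutation) pair, so a mutation that is absent
// from one lineage cannot take down a combined request.
function buildPrevalenceUrls(lineages, mutations) {
  const urls = [];
  for (const lineage of lineages) {
    for (const mutation of mutations) {
      const params = new URLSearchParams({
        mutations: mutation.toUpperCase(),
        pangolin_lineage: lineage,
      });
      urls.push(`${BASE}?${params}`);
    }
  }
  return urls;
}

const allMutations = unionMutations(characteristic);
const requestUrls = buildPrevalenceUrls(Object.keys(characteristic), allMutations);
```

Issuing one request per pair trades a larger number of calls for isolation of the 500 failures. Whether that is faster than the single `frequency=0` query is exactly what the profiling step should establish.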
First steps:
- Profile whether this approach would actually improve speed for a realistic set of lineages (for instance, the default set of lineages on outbreak.info/compare-lineages)
- If so, implement it in the front-end.
- Alternative approaches are welcome too.
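For the profiling step, a small timing harness may be enough to compare the two strategies. This is a sketch only: `profile` and `median` are hypothetical helpers, and `run` stands in for whatever function performs all the API calls for one strategy.

```javascript
// Median is less sensitive than the mean to a single slow request.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Run an async strategy several times and collect wall-clock timings.
// `run` is a hypothetical function that resolves once all calls finish.
async function profile(run, repeats = 5) {
  const timings = [];
  for (let i = 0; i < repeats; i++) {
    const start = Date.now();
    await run();
    timings.push(Date.now() - start);
  }
  return { medianMs: median(timings), timings };
}
```

Calling `profile` once with the current `frequency=0` query and once with the two-step approach, on the same set of lineages, would give directly comparable medians.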