
feat(web): add global limit on total number of displayed markers

Open ivan-aksamentov opened this issue 2 years ago • 4 comments

We currently have a per-sequence limit on the number of displayed sequence view markers. This works well if one or a few of the sequences have many markers. Users can tune the threshold.

However, if different sequences have very different numbers of markers (for a large virus it's not uncommon that the number of mutations differs 10- or 100-fold across the batch of sequences), then it's difficult to find a middle ground.

If the limit is set too high, the browser can hang during rendering or kill the tab.

Here I introduce a global threshold on markers, which applies to the sum of marker counts across all sequences. Upon crossing the threshold, no markers are displayed for any of the sequences. This is a more reliable safeguard against hangups and crashes.
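
A minimal sketch of the global check described above, not the code from this PR. The names `shouldRenderMarkers`, `markerCountsPerSequence` and `maxTotalMarkers` are hypothetical, and it assumes that "crossing" means the sum strictly exceeds the threshold:

```typescript
// Hypothetical helper: decides whether markers are rendered at all.
// `markerCountsPerSequence` holds one marker count per sequence.
function shouldRenderMarkers(markerCountsPerSequence: number[], maxTotalMarkers: number): boolean {
  // Sum the marker counts of all sequences in the batch.
  const total = markerCountsPerSequence.reduce((sum, count) => sum + count, 0)
  // Assumption: markers are hidden only when the total exceeds the threshold.
  return total <= maxTotalMarkers
}
```

With this all-or-nothing check, a single outlier sequence is enough to hide markers for the whole batch.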

Additionally, the word "Settings" is now clickable and opens the settings dialog.

The user experience, however, is not optimal. Consider this Monkeypox example:

[Screenshot: 01]

Two of the sequences have 20 times more markers than the average of the rest of the sequences.

It would be nice to figure out how to selectively enable the "safe" sequences and disable the "unsafe" ones.

ivan-aksamentov avatar Jun 15 '22 07:06 ivan-aksamentov

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Updated |
| --- | --- | --- | --- |
| nextclade | ✅ Ready (Inspect) | Visit Preview | Jun 28, 2022 at 5:34AM (UTC) |

vercel[bot] avatar Jun 15 '22 07:06 vercel[bot]

It's OK as a safety latch mechanism, but we really should display all sequences up to the point where the total marker count crosses the threshold. Maybe we can keep track of the marker counts of all sequences. Based on that list of marker counts, each sequence can locally decide whether it should display itself or not.

Example:

global counts at some point in time are: [100,1000,10000,4000] (order doesn't matter)
maxCount = 5000 (in Settings)
nucmarker count of this sequence: 1000

Function to determine if this sequence should be displayed or not:

1. Sort the list -> [100,1000,4000,10000]
2. Accumulate the running total up to each index -> [100,1100,5100,15100]
3. Find the last index whose running total is still below maxCount -> index 1 (1100)
4. Look up the count in the sorted list at index 1 -> 1000
5. Show this sequence if its nucmarker count is smaller than or equal to the result of (4) -> true

Alternative if nucmarker count of this sequence is 4000:
5. 4000 > 1000 -> false, don't show
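
The steps above can be sketched roughly as follows. This is only an illustration of the proposal, not existing code: `shouldShowSequence` and its parameter names are hypothetical, and it assumes a running total equal to `maxCount` still counts as within the limit.

```typescript
// Hypothetical helper implementing steps 1-5 for a single sequence.
function shouldShowSequence(allCounts: number[], maxCount: number, thisCount: number): boolean {
  // Step 1: sort a copy of the counts in ascending order.
  const sorted = allCounts.slice().sort((a, b) => a - b)

  // Steps 2-4: walk the running total and remember the largest count
  // that still fits within maxCount.
  let runningTotal = 0
  let largestAllowed = -1 // -1 means that not even the smallest sequence fits
  for (const count of sorted) {
    runningTotal += count
    if (runningTotal > maxCount) break
    largestAllowed = count
  }

  // Step 5: show this sequence only if its count does not exceed that value.
  return thisCount <= largestAllowed
}

// The example from the comment:
const counts = [100, 1000, 10000, 4000]
const showSmall = shouldShowSequence(counts, 5000, 1000) // true
const showLarge = shouldShowSequence(counts, 5000, 4000) // false
```

One caveat of deciding by count alone: sequences with identical counts get the same answer, so with counts [3000, 3000] and maxCount = 5000 both would be shown and the displayed total would exceed the limit.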

corneliusroemer avatar Jun 16 '22 16:06 corneliusroemer

When calculating the total number of markers, does a range of missing markers count as the length of the range or as a single marker?

rneher avatar Jun 25 '22 12:06 rneher

The count doesn't seem to change when one switches some markers off.

rneher avatar Jun 25 '22 12:06 rneher