webknossos icon indicating copy to clipboard operation
webknossos copied to clipboard

In Proofreading, Load Oversegmentation, Perform Merges Eagerly in Frontend

Open fm3 opened this issue 11 months ago • 5 comments

This PR adds support for a locally applied HDF5 mapping (instead of applying it by the back-end). This enables the user to do interactive proofreading where (a) super-voxels can be easily identified by hovering and (b) merge operations can be done without a round-trip to the back-end and (c) split operations can be done without reloading all buckets (but a roundtrip is necessary to find out which edges to delete).

The HDF5 mapping is only applied locally when the annotation has an editable mapping or is about to have one (i.e., the user is in proofreading mode). When this condition changes, the layer is reloaded automatically to swap the responsibility of mapping the segmentation between client and server.

URL of deployed dev instance (used for testing):

  • https://___.webknossos.xyz

Steps to test:

  • create a new annotation for a dataset with a segmentation layer that has a hdf5 mapping
  • enable the proofreading tool
  • hovering over the active segment should highlight super voxels
  • merge operations should feel very fast
  • split operations should also feel quick (no bucket reloading should be done)

TODOs:

  • Missing:

    • [ ] The frontend mapping functionality (formerly used for JSON mappings only) doesn't support uint64, but 64-bit IDs are needed for larger datasets.
      • [X] implement cuckoo hashing for that
      • [ ] properly test with a 64 bit dataset
        • [X] rendering and merger mode
        • [ ] proofreading with a 64 bit agglomerate mapping
          • [X] fix bigint to proto serialization packing
          • [ ] ...
    • [x] handleProofreadCutNeighbors and handleSkeletonProofreadingAction were not yet adapted to work with frontend mappings. Look at handleProofreadMergeOrMinCut for implementation guidance.
      • [X] handleProofreadCutNeighbors
      • [x] handleSkeletonProofreadingAction
    • [X] The wkstore_adapter currently hardcodes shouldUseDataStore to true, because the backend was not yet adapted to return unmapped segment IDs from the tracing store if an editable mapping is active. If that is changed, this part of the code can be reset and brushed volume data is loaded from the tracing store correctly again. /cc @fm3
      • this works now by always requesting from the datastore when the hdf5 mapping should be applied locally. if the segmentation data was changed (thus, the tracing store would need to be requested), the mapping will be applied by the back-end (proofreading is disabled in that case)
    • [x] There are quite a few typescript errors due to the existance of number and bigint mappings. At some points I had success using generics (T extends Map<number,number> | Map<bigint,bigint>) to let typescript understand that the two won't be mixed, but I'm not sure whether it's possible to get it to understand that for all of the code :thinking:
    • [x] skeleton-based proofreading
    • [x] reload segmentation layer feature does not work correctly
    • [ ] the semantics of segment.id has to be fixed (mapped vs unmapped); see discussion
  • Bugs:

    • [x] Reloading after save does not properly initialize the mapping
    • [x] sometimes, hovering super-voxels in the data viewports doesn't work as expected
    • [x] sometimes, the segment id is not an integer apparently and then jsConvertCellIdToRGBA fails
    • [X] ~Mesh loading after proofreading actions currently triggers errors, because the getDataValue method used to determine the agglomerate IDs after the action uses the mapping from before.~ should be fixed thanks to #7742
    • [x] sometimes, when splitting, the new supervoxels are shown in the data viewports and then the mapping is reset so that the old supervoxels are shown
    • [x] sometimes, splitting [and/or?] merging produces corrupt mappings
    • ~[ ] double-check that mapping texture is correctly attached/detached when switching between modes~ doesn't seem important to me (the opposite: unloading and re-uploading the texture when switching back and forth seems wasteful)
    • [X] cuckoo table's capacity cannot be used up to 90%
      • [X] fix suboptimal hash function for cuckootables with a single numeric key
      • [X] fix iteration threshold constant
      • [X] fix recursive rehashing
    • [x] toggling segmentation layer with "3" won't get you back to proofreading mode (annoying when no proofreading action has been done yet)
    • [x] merging/splitting produces corrupt annotations? the tracing store route needed sorted ids (for the datastore route it is irrelevant)
  • Performance: ~I did not rigorously evaluate the performance yet, but browsing through the data feels laggier than before.~ See https://github.com/scalableminds/webknossos/pull/7654#issuecomment-2182560524.

    • ~A quick profiling showed that 10% of the time is spent in the bucket's getValueSet methods with invocations of getValueSets taking up to 250 ms depending on the number of cache misses (note, union'ing the sets afterwards is quick in comparison, usually 5 ms)~ see comments below
      • ~Writing the mapping texture is also quick in comparison ~5-10 ms~ (with cuckoo, updates are below 1ms typically)
    • [X] The mapping lookup in the shader uses a binary search approach. For mapping sizes of ~50-100k this results in up to 18 texture lookups to locate the key and another lookup for the value. I certainly hope using cuckoo hashing would increase performance.
    • [x] avoid roundtrip for new agglomerate ids before calling refreshAffectedMeshes
    • [x] the mapping could be incrementally written thanks to the new cuckoo table which should help with performance
    • [X] offload value set computation ~to web worker~ (didn't help much) and do it as soon as buckets arrive to distribute the workload (this is only done now when the value sets are really needed)
    • ~[ ] maintain cached value union in data cube~ didn't help (tested in c7bb508be7)
    • [x] improve diffing of mappings
      • [X] avoid redundant work with caching
      • [x] doublecheck that diffing in saga doesn't do redundant work

Issues:

  • fixes #7021

(Please delete unneeded items, merge only when none are left open)

  • [ ] Updated changelog
  • [ ] Updated migration guide if applicable
  • [ ] Updated documentation if applicable
  • [ ] Adapted wk-libs python client if relevant API parts change
  • [ ] Removed dev-only changes like prints and application.conf edits
  • [ ] Considered common edge cases
  • [ ] Needs datastore update after deployment

fm3 avatar Feb 27 '24 09:02 fm3

  • Current state:
    • Mappings are applied in the frontend. The mapping information is bulk-requested from the server once a mapping is enabled. If buckets change, the mapping information for the new segment IDs is requested from the server while old segment IDs are evicted from the maintained frontend mapping (currently throttled to 1s). The frontend mapping is updated when performing segment-based proofreading actions, making a server-round-trip obsolete in the merge case. In the split case the server is still needed to perform the min cut and to assign segments to one of the two resulting agglomerates.

moved todo-items to PR description

daniel-wer avatar Apr 08 '24 16:04 daniel-wer

the backend was not yet adapted to return unmapped segment IDs from the tracing store if an editable mapping is active

I’m not entirely sure which route this concerns. Is it about segmentation data loading? If so, I’d suggest to use the datastore indeed, as this will save a hop, compared to loading the same unmapped data through the tracingstore. Brushing is not allowed for editable mappings anyway, right?

In fact, if the tracingstore does not have to support editable-mapping-applied data loading anymore, we may be able to throw out quite a bit of backend code.

fm3 avatar Apr 09 '24 08:04 fm3

I’m not entirely sure which route this concerns. Is it about segmentation data loading? If so, I’d suggest to use the datastore indeed, as this will save a hop, compared to loading the same unmapped data through the tracingstore. Brushing is not allowed for editable mappings anyway, right?

In fact, if the tracingstore does not have to support editable-mapping-applied data loading anymore, we may be able to throw out quite a bit of backend code.

Yes, this is about segmentation loading. You are correct that brushing is not allowed for editable mappings, so in that case we can simply use the datastore. However, the user is allowed to brush in a mapping when no proofreading was done yet (this locks the mapping and proofreading cannot be done afterwards). Since the frontend always applies the mapping locally with this PR, it will need the unmapped data from the tracing store. Alternatively, we could make the frontend support both locally and remotely applied mappings. It's probably best to talk about this in person. The scenario is quite complex..

philippotto avatar May 02 '24 15:05 philippotto

@MichaelBuessemeyer As discussed, feel free to do a rough review of this rough PR. I managed to finish a first version of cuckoo hashing for applying the mapping on the gpu. however, I didn't test uint64 bit support yet (I need organize a test dataset for that). also, performance and general clean up are open todos.

philippotto avatar May 30 '24 15:05 philippotto

A note about performance: Before I introduced the cuckoo hashing for mapping the data, the FPS value was at ~22 when flying through a certain DS. With the cuckoo hashing and the newest optimizations, the value is now at ~49 fps. The same benchmark within the Move tool (so, no local mapping is happening) shows ~ 51 fps. Therefore, I think, performance is okay now. Profiling also doesn't show any obvious culprits.

philippotto avatar Jun 21 '24 11:06 philippotto

@daniel-wer I just finished testing and fixing the 53 bit support. I would love to also get some PR feedback from you :pray:

@MichaelBuessemeyer I incorporated most of your feedback, but deferred some suggestions, as some more pressing issues came up. Let's wait for Daniel's feedback and then re-evaluate the next steps.

philippotto avatar Jul 15 '24 13:07 philippotto

My notes from testing:

  • [x] Activating a moderately large JSON mapping (astrocyte in ROI2017_wkw), the "Activating Mapping" notification at the top of the page hides very early, although the mapping takes quite a bit longer to be actually active. The console is also still logging things like sizeOfOnlyB 1532353 which is why I think it should be possible to hide the notification at a more accurate point in time.
    • [x] Now, it doesn't disappear.
  • [x] Activating the astrocyte mapping and then switching to the small JSON mapping mini-mapping-with-colors in ROI2017_wkw makes the page freeze for half a minute for me which seems fishy.
  • ~[ ] Disabling mini-mapping-with-colors again does not show the original colors~ (bug already present on master; very niche-y because it only happens for json mappings with custom colors)
  • [x] As already suspected, the selective segment visibility (opacity adjustment) does not work as expected if users configured another segmentation opacity value than 0.2. I suspect the absolute segmentation opacity needs to be used during the adjustment calculation.
  • [x] If an annotation has an editable mapping, the supervoxels within the active mesh are always highlighted when hovering in the 3d view, even if the proofreading tool is not active. (I thought you fixed that, but I'm still able to reproduce)
  • [X] The "Hide unmapped segments" switch appears when an HDF5 mapping is active and the proofreading tool is activated. This used to be shown only for JSON mappings (and I'd suggest to keep it that way, because it doesn't make much sense the way our HDF5 mapping format works) so I think the check in the mapping_settings_view needs to be adapted slightly.
    • [x] Maybe while you are at it you can also fix that mapping is used to decide whether a mapping or no mapping was locked to the annotation, leading to a wrong hint on the disabled mapping switch if an HDF5 mapping was locked to the annotation (see screenshot). image
  • [x] Using the "Split from all neighbors" functionality in the l4dense_motta_et_al_dev dataset for the node at 2873, 4539, 1777 a crash was triggered. Seems to happen for other locations, too. image
  • [x] When switching between the proofreading tool and other tools the mapping location is switched from frontend to backend and vice versa. This works well, but takes a while on my system (backend -> frontend). I see segmentation data during that time, but the view is essentially frozen for 5-10s. Maybe a notification similar to the one when activating a mapping could be shown during the transition period? Or is this instant on your system?

Everything else proofreading and mapping related I tested worked brilliantly :100:

daniel-wer avatar Jul 17 '24 14:07 daniel-wer

Thank you for the review and the testing report :pray:

  • [x] If an annotation has an editable mapping, the supervoxels within the active mesh are always highlighted when hovering in the 3d view, even if the proofreading tool is not active. (I thought you fixed that, but I'm still able to reproduce)

This is the only thing I still have to look into. As this shouldn't be a big deal (impact-wise), it would be cool if you could already give the branch another test?

  • [x] Using the "Split from all neighbors" functionality in the l4dense_motta_et_al_dev dataset for the node at 2873, 4539, 1777 a crash was triggered. Seems to happen for other locations, too.

For the record: I changed it so that these errors become soft errors (the user can retry then). there are two causes for this issue: 1) a race condition where the mapped id is not available yet (should happen very rarely; let's re-evaluate in production). 2) the dev dataset you used is cropped and the agglomerate file isn't. for that reason, the back-end serves neighbors that the front-end cannot look up. this should only be a dev-specific and therefore not critical. ideally, we would crop the agglomerate file, too.

  • [x] When switching between the proofreading tool and other tools the mapping location is switched from frontend to backend and vice versa. This works well, but takes a while on my system (backend -> frontend). I see segmentation data during that time, but the view is essentially frozen for 5-10s. Maybe a notification similar to the one when activating a mapping could be shown during the transition period? Or is this instant on your system?

I added a simple "Reloading segmentation layer..." message that only stays for one second (everytime the layer is reloaded). Could you check again whether this feels okay now? For me, the issue with the tool switching is now more pleasant, but it's also faster on my system. Hiding the message when the (entire? or most of the?) mapping has been loaded to the gpu is probably a bit more tricky (especially, because it's hard for me to reproduce).

philippotto avatar Jul 23 '24 16:07 philippotto

If an annotation has an editable mapping, the supervoxels within the active mesh are always highlighted when hovering in the 3d view, even if the proofreading tool is not active. (I thought you fixed that, but I'm still able to reproduce)

Luckily it was easy to fix :tada:

philippotto avatar Jul 24 '24 13:07 philippotto

Awesome, code LGTM and no issues at all during retesting, except for one possible performance issue.

I tested how fast one could go merging segments and the performance was a bit underwhelming, although merging happens entirely in the frontend. See this gif:

merging_is_slow

with this console output from two different merge actions that were comparably slow:

image

Can you reproduce this and if so do you think that could be sped up?

daniel-wer avatar Jul 30 '24 12:07 daniel-wer