webknossos
Use WKW header as groundtruth for dtype of layers
Not sure whether we can get rid of the dtype in the datasource-properties.json altogether for now? Might be necessary for knossos datasets? Either way, the wkw header should have precedence in my opinion. Alternatively, at least show a warning if the two dtypes are different.
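To make the proposal concrete, here is a minimal sketch of "header wins, warn on mismatch". It is only an illustration: `resolve_element_class` and its parameters are hypothetical names, not existing webknossos code, and it assumes both dtypes have already been read and normalized to the same string form.

```python
import warnings


def resolve_element_class(layer_name, header_dtype, json_dtype):
    """Return the dtype to use for a layer, preferring the header.wkw value.

    Hypothetical helper: header_dtype is what header.wkw reports, json_dtype
    is what datasource-properties.json reports (or None if absent).
    """
    if json_dtype is not None and json_dtype != header_dtype:
        # The two sources disagree: tell the user, but keep going
        # with the header value.
        warnings.warn(
            f"Layer {layer_name!r}: datasource-properties.json says "
            f"{json_dtype!r}, but header.wkw says {header_dtype!r}; "
            "using the header value."
        )
    return header_dtype
```

Calling `resolve_element_class("color", "uint8", "uint16")` would return `"uint8"` and emit one warning; with matching dtypes it stays silent.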
Knossos support has been disabled in the meantime, so this is not an issue.
However, several of the fields of datasource-properties.json are originally designed as a cache for information that would otherwise be available via a file walk / header.wkw reads.
These are layer name, resolutions, cubeLength, elementClass, mappings, dataFormat. Even the largestSegmentId could be computed by reading all files.
Strictly speaking, only the bounding box per layer and the scale are user-supplied information.
It would be very slow to infer all the other properties from the file system every time, though.
The wK import view is designed to infer what can be inferred from the files and to suggest updating the datasource-properties.json accordingly.
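Roughly, the "infer and suggest" step could look like the sketch below. This is not the actual import-view code: `suggest_layer_update`, `INFERABLE_KEYS`, and the flat per-layer dict are simplifying assumptions for illustration, and the real JSON layout may nest these fields differently. The point is only that inferable fields get overwritten by what the files say, while user-supplied ones (such as the bounding box) are left alone.

```python
# Per-layer fields that, per the discussion, could be re-derived from the
# files. The flat key layout is an assumption made for this sketch.
INFERABLE_KEYS = {"resolutions", "cubeLength", "elementClass", "mappings", "dataFormat"}


def suggest_layer_update(stored, inferred):
    """Return (suggested_layer, changed_keys) for one layer.

    stored:   the layer entry as currently written in datasource-properties.json
    inferred: whatever could be derived from the files for that layer
    """
    suggested = dict(stored)  # keep user-supplied fields untouched
    changed = []
    for key in sorted(INFERABLE_KEYS):
        if key in inferred and inferred[key] != stored.get(key):
            suggested[key] = inferred[key]
            changed.append(key)
    return suggested, changed
```

The list of changed keys is what would be shown to the user for confirmation before anything is written back.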
I don’t really have a better idea of how to handle this. Do you have thoughts?
Maybe we could cache the metadata somewhere else so that it isn't user-accessible? We would need to find a good cache invalidation strategy, though.
> It would be very slow to infer all the other properties from the file system every time, though.
Do you think there is a big difference between reading the dtype from the datasource-properties.json and reading it from header.wkw (per layer)? If so, I'd still advocate for a separate cache which is hidden from the user.
> It would be very slow to infer all the other properties from the file system every time, though.
I think we can/should focus on the properties that are included in header.wkw (i.e., dtype and cubeLength). In my opinion, they are especially "dangerous" since they are among the harder concepts to grasp as a user. If a layer is missing, this is at least somewhat easy to spot from the datasource-properties.json.
Regarding cache invalidation: A very simplistic approach would be to just wait for the user to hit "Reload" in the dashboard for a dataset. If the user changed the data on disk, that step is way more intuitive (at least imo) than clicking "Edit" and then confirming the new inferred data which is merged into the user-supplied metadata.
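A sketch of what I mean by the simplistic approach, assuming some function that reads the header-derived metadata for a dataset is available. `HeaderMetadataCache` and `read_headers` are hypothetical names, not existing webknossos code; the cache is in-memory here only to keep the example short.

```python
class HeaderMetadataCache:
    """Hidden cache for header-derived metadata, refreshed only on reload."""

    def __init__(self, read_headers):
        # read_headers(dataset_name) -> metadata; assumed to be the slow
        # part (file walk / header.wkw reads).
        self._read_headers = read_headers
        self._entries = {}

    def get(self, dataset_name):
        # Normal viewing path: hit the file system only on a cache miss.
        if dataset_name not in self._entries:
            self._entries[dataset_name] = self._read_headers(dataset_name)
        return self._entries[dataset_name]

    def reload(self, dataset_name):
        # Called when the user hits "Reload" in the dashboard:
        # drop the cached entry and read the headers again.
        self._entries.pop(dataset_name, None)
        return self.get(dataset_name)
```

With this, opening a dataset repeatedly costs one header read, and changed data on disk is only picked up after an explicit reload.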
I believe that, with the introduction of Zarr/N5 and native Neuroglancer precomputed file format support, WKW headers might not be the right solution for saving a layer's datatype. (Unless I misunderstand the issue to begin with.)
The “explore” route inspects WKW headers (or Zarr/N5/precomputed metadata files) and sets the value in datasource-properties.json. This only happens once during explore, but we might want to do this every time the dataset settings page is opened (similar to the current suggestDatasourceJson codepath, which only works for WKW). That is tracked in #7474.
However, this issue is about doing this directly when opening/viewing the dataset, and I think that’s something we do not currently want to do (perf impact + hard to propagate the error to the user). So I’m closing this issue. Feel free to reopen if you disagree.