
Use WKW header as groundtruth for dtype of layers

Open philippotto opened this issue 4 years ago • 3 comments


Not sure whether we can get rid of the dtype in the datasource-properties.json altogether for now? Might be necessary for knossos datasets? Either way, the wkw header should have precedence in my opinion. Alternatively, at least show a warning if the two dtypes are different.
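A minimal sketch of the precedence rule suggested above, written in Python for brevity. The function name and the plain-string dtypes are hypothetical and not part of webknossos; the sketch assumes both values have already been read from disk.

```python
import warnings
from typing import Optional


def resolve_dtype(header_dtype: Optional[str], json_dtype: Optional[str]) -> str:
    """Pick the dtype for a layer, preferring the value from header.wkw.

    Both arguments are plain strings such as "uint8"; how they are read
    from disk is out of scope for this sketch.
    """
    if header_dtype is None and json_dtype is None:
        raise ValueError("no dtype available for this layer")
    if header_dtype is None:
        # No header value available: fall back to the JSON value.
        return json_dtype
    if json_dtype is not None and json_dtype != header_dtype:
        # The two sources disagree: warn, but let the header win.
        warnings.warn(
            f"dtype mismatch: header.wkw says {header_dtype!r}, "
            f"datasource-properties.json says {json_dtype!r}; "
            "using the header value"
        )
    return header_dtype
```

With this, `resolve_dtype("uint8", "uint16")` returns `"uint8"` and emits a warning, which covers both options mentioned above (precedence and a visible mismatch warning).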

philippotto, Feb 27 '20 12:02

Knossos support has been disabled in the meantime, so this is not an issue.

However, several of the fields of datasource-properties.json are originally designed as a cache for information that would otherwise be available via a file walk / header.wkw reads.

These are layer name, resolutions, cubeLength, elementClass, mappings, dataFormat. Even the largestSegmentId could be computed by reading all files.

Strictly speaking, only the bounding box per layer and the scale are user-supplied information.
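To make the distinction concrete, here is a rough Python sketch that splits a properties dictionary into the cached (inferable) part and the user-supplied part. The key names are illustrative assumptions based on the list above and may not match the real datasource-properties.json schema exactly; the function is hypothetical.

```python
from typing import Any, Dict, Tuple

# Assumed key names, taken from the fields listed above (not verified
# against the actual JSON schema).
INFERABLE_KEYS = {
    "name",
    "resolutions",
    "cubeLength",
    "elementClass",
    "mappings",
    "dataFormat",
    "largestSegmentId",
}


def split_properties(props: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (inferable, user_supplied) views of a properties dict.

    Keys not known to be inferable are treated as user-supplied so that
    nothing is silently dropped.
    """
    inferable = {k: v for k, v in props.items() if k in INFERABLE_KEYS}
    user_supplied = {k: v for k, v in props.items() if k not in INFERABLE_KEYS}
    return inferable, user_supplied
```

Only the second dictionary would need to survive a re-inference from the files; the first one is effectively a cache.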

It would be very slow to infer all the other properties from the file system every time, though.

The wK import view is designed to infer what can be inferred from the files and to suggest updating the datasource-properties.json accordingly.

I don’t really have a better idea of how to handle this. Do you have thoughts?

fm3, Mar 15 '21 10:03

Maybe we could cache the metadata somewhere else so that it isn't user-accessible? We would need to find a good cache invalidation strategy, though.

> It would be very slow to infer all the other properties from the file system every time, though.

Do you think there is a big difference between reading the dtype from the datasource-properties.json and reading it from header.wkw (per layer)? If so, I'd still advocate for a separate cache which is hidden from the user.

> It would be very slow to infer all the other properties from the file system every time, though.

I think we can/should focus on the properties which are included in header.wkw (i.e., dtype and cubeLength). In my opinion, they are especially "dangerous" since they are among the harder concepts to grasp as a user. If a layer is missing, this is at least somewhat easy to spot from the datasource-properties.json.

Regarding cache invalidation: A very simplistic approach would be to just wait for the user to hit "Reload" in the dashboard for a dataset. If the user changed the data on disk, that step is way more intuitive (at least imo) than clicking "Edit" and then confirming the new inferred data which is merged into the user-supplied metadata.
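A rough Python sketch of that "invalidate on Reload" idea. The class, its method names, and the shape of the cached values are hypothetical; the actual header reader is injected as a callable, so nothing here depends on a real webknossos API.

```python
from typing import Callable, Dict, Tuple

# Assumed shape of the cached values, e.g. {"dtype": "uint8", "cubeLength": 32}.
HeaderInfo = Dict[str, object]


class HeaderCache:
    """Hidden cache for header-derived metadata, keyed by (dataset, layer)."""

    def __init__(self, read_header: Callable[[str, str], HeaderInfo]) -> None:
        self._read_header = read_header
        self._entries: Dict[Tuple[str, str], HeaderInfo] = {}

    def get(self, dataset: str, layer: str) -> HeaderInfo:
        key = (dataset, layer)
        if key not in self._entries:
            # Only touch the file system on a cache miss.
            self._entries[key] = self._read_header(dataset, layer)
        return self._entries[key]

    def reload(self, dataset: str) -> None:
        """Drop all entries of one dataset, e.g. when the user hits "Reload"."""
        for key in [k for k in self._entries if k[0] == dataset]:
            del self._entries[key]
```

Normal dataset views would then only call `get`, while the dashboard's "Reload" action would call `reload` so that the next access re-reads the headers.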

philippotto, Mar 15 '21 10:03

I believe that, with the introduction of Zarr/N5 and native Neuroglancer precomputed file format support, WKW headers might not be the right solution for saving a layer's datatype. (Unless I misunderstand the issue to begin with.)

hotzenklotz, Oct 21 '22 11:10

The “explore” route inspects wkw headers (or zarr/n5/precomputed metadata files) and sets the value for the datasource-properties.json. This only happens once during explore, but we might want to do this every time the dataset settings page is opened (similar to the current suggestDatasourceJson codepath that only works for wkw). That is tracked in #7474.

However, this issue is about doing this directly when opening/viewing the dataset. And I think that’s something that we do not currently want to do (perf impact + hard to propagate the error to the user). So I’m closing this issue. Feel free to reopen if you disagree.

fm3, Jan 23 '24 13:01