mesh.get: return vertex->L2 ID map

Open schlegelp opened this issue 3 years ago • 15 comments

Hi. This is a question that might lead into a feature/pull request.

For various operations it would be useful to get a mapping from mesh vertices to L2 chunk IDs. That way we could use e.g. the L2 graph to heal fragmented meshes or derived skeletons. Naively, I assume that the filenames in fragments (line) correspond to the L2 IDs?

If that's the case, would you be open to a pull request that implements optionally returning such a mapping? At first glance this looks fairly straightforward as long as we don't need to worry about deduplicating vertices (which is not implemented for FlyWire meshes anyway). @ceesem might be interested in this too.

schlegelp avatar Apr 20 '21 14:04 schlegelp

Hi Philipp,

This diagram might be helpful in understanding the relationship between filename and ID: https://github.com/seung-lab/cloud-volume/wiki/Graphene#directory-structure

The level 2 meshes are only fetched if level=2 is set; otherwise arbitrary meshes may be used. Since the filenames can come from the sharded initial meshes OR the per-label dynamic meshes, the get_fragment_labels part of the call tree is much more reliable. It should be possible to calculate the filename for each label to get the mapping. Going backwards is (much) harder for the initial meshes.

The level of the segid is encoded in the segid and can be retrieved via: cv.meta.decode_layer_id(SEGID)

The idea of this mapping seems pretty useful! Can you show me an example of the function's usage in pseudocode? What data structure are you thinking of?

Will

william-silversmith avatar Apr 20 '21 22:04 william-silversmith

Just to add to what Will said: in order to speed up mesh downloading, higher level fragments are stitched, so level 2 mesh chunks are typically not used when downloading meshes (it would take far longer than downloading higher level meshes). I think the feature you want, knowing which L2 chunk a mesh vertex came from while still reducing the number of mesh fragments being downloaded, is a tough one.

One flow you could go down is, for each mesh vertex, to determine which chunk it came from, and do the same for the L2 IDs you are interested in. For the subset of chunks where there is a 1-1 mapping, you can label those mesh vertices. For another subset there will be an ambiguity, and you will have to download the level 2 fragments from that chunk to find out which vertices belong to which ID. For another unfortunate subset, the level 2 fragment will be missing (or doesn't exist because it's too small to mesh). There you will have to fall back to downloading the segmentation in that chunk (and perhaps discovering that no voxels exist at the resolution you downloaded).
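The first step of that flow can be sketched as below. This is only an illustration under stated assumptions: `label_vertices_by_chunk` is a hypothetical helper, the chunk grid is assumed to start at the origin with a fixed chunk size in the same units as the vertices, and the chunk coordinate of each L2 ID is assumed to be known already. Vertices in ambiguous or empty chunks are left unlabeled so they can be resolved by the slower fallbacks.

```python
from collections import defaultdict

def label_vertices_by_chunk(vertices, chunk_size, l2_chunks):
    """Label each vertex with an L2 ID when its chunk holds exactly one.

    vertices   : iterable of (x, y, z) positions
    chunk_size : (sx, sy, sz) in the same units as the vertices (assumed)
    l2_chunks  : dict mapping L2 ID -> (cx, cy, cz) chunk grid coordinate
    Returns a list with one entry per vertex: an L2 ID, or None when the
    chunk holds zero or several candidate IDs.
    """
    ids_in_chunk = defaultdict(list)
    for l2_id, chunk in l2_chunks.items():
        ids_in_chunk[tuple(chunk)].append(l2_id)

    labels = []
    for vertex in vertices:
        # Assumes the chunk grid origin is (0, 0, 0); a real dataset may
        # need its voxel offset subtracted first.
        chunk = tuple(int(c // s) for c, s in zip(vertex, chunk_size))
        candidates = ids_in_chunk.get(chunk, [])
        labels.append(candidates[0] if len(candidates) == 1 else None)
    return labels

# Toy data: chunk (0,0,0) has one L2 ID, chunk (1,0,0) has two, (2,0,0) none.
toy_l2 = {101: (0, 0, 0), 102: (1, 0, 0), 103: (1, 0, 0)}
toy_vertices = [(1, 2, 3), (15, 1, 1), (25, 0, 0)]
print(label_vertices_by_chunk(toy_vertices, (10, 10, 10), toy_l2))
# -> [101, None, None]
```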

fcollman avatar Apr 20 '21 23:04 fcollman

Oh I see - nothing is ever easy, is it? Thanks both! Hmmm, I was hoping that there was a closer correspondence.

Looking at this example neuron (720575940619059120), I can see that it has 252 fragments but 1264 L2 IDs - i.e. each fragment consists on average of ~5 L2 chunks.

Screenshot 2021-04-21 at 22 05 37

Do you reckon there might be a way to get a "best guess" using some heuristic? For example the second screenshot shows the chunk positions of the L2 IDs for this neuron. If I knew which L2 chunks were combined into a given fragment, I could use e.g. some k-means clustering to assign vertices to an L2 ID.
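One possible form of such a heuristic is a plain nearest-centroid assignment (not k-means proper), sketched here. `assign_to_nearest_l2` is a hypothetical helper, and the representative position of each L2 ID (for example its chunk center) is an assumed input. Note that this cannot separate several L2 IDs that share one chunk if they all use the same position.

```python
import math

def assign_to_nearest_l2(vertices, l2_positions):
    """Assign each vertex to the L2 ID with the closest representative point.

    vertices     : iterable of (x, y, z) positions
    l2_positions : dict mapping L2 ID -> (x, y, z) representative point,
                   e.g. a chunk center (assumed input, same units)
    """
    labels = []
    for vertex in vertices:
        nearest = min(
            l2_positions,
            key=lambda l2_id: math.dist(vertex, l2_positions[l2_id]),
        )
        labels.append(nearest)
    return labels

# Toy example with two L2 IDs located at neighbouring chunk centers.
toy_positions = {201: (5.0, 5.0, 5.0), 202: (15.0, 5.0, 5.0)}
toy_vertices = [(1.0, 4.0, 6.0), (14.0, 5.0, 5.0), (9.0, 5.0, 5.0)]
print(assign_to_nearest_l2(toy_vertices, toy_positions))
# -> [201, 202, 201]
```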

Screenshot 2021-04-21 at 22 13 56

schlegelp avatar Apr 21 '21 21:04 schlegelp

You can use get_leaves to find out the level2ids underneath each of the fragments.

fcollman avatar Apr 21 '21 22:04 fcollman

OK cool! I think I need one more nudge in the right direction:

For this neuron, for example, the first two fragment filenames are:

['180222076190198150:0:32768-33024_18176-18432_3072-3584',
 '469500261153458003:0:32768-36864_16384-20480_0-8192']

I am assuming that 180222076190198150 and 469500261153458003 are the IDs of those fragments?
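For reference, splitting such a filename into its parts could look like the sketch below. `parse_fragment_filename` is a hypothetical helper; treating the middle field as a level of detail and the last field as an x_y_z bounding box is an assumption based on how these names look, not something confirmed in this thread. It also assumes non-negative coordinates, since `-` is used as the range separator.

```python
from typing import NamedTuple, Tuple

class FragmentName(NamedTuple):
    segid: int                        # ID the fragment was meshed for
    lod: int                          # middle field, assumed level of detail
    bbox: Tuple[Tuple[int, ...], ...]  # ((x0, x1), (y0, y1), (z0, z1))

def parse_fragment_filename(name: str) -> FragmentName:
    """Split 'SEGID:LOD:x0-x1_y0-y1_z0-z1' into typed fields."""
    segid, lod, bbox = name.split(":")
    ranges = tuple(
        tuple(int(value) for value in axis.split("-"))
        for axis in bbox.split("_")
    )
    return FragmentName(int(segid), int(lod), ranges)

frag = parse_fragment_filename(
    "180222076190198150:0:32768-33024_18176-18432_3072-3584"
)
print(frag.segid)  # -> 180222076190198150
print(frag.bbox)   # -> ((32768, 33024), (18176, 18432), (3072, 3584))
```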

So it turns out that the first, 180222076190198150, is already an L2 ID but 469500261153458003 is not. vol.get_leaves(469500261153458003, vol.meta.bounds(0), 0) then gives me only the supervoxel IDs, right? So then I have to use vol.get_roots(svoxel_id, stop_layer=2) to go back up?

In this particular case, 469500261153458003 maps to 747 supervoxels, which in turn map to 13 L2 IDs. That get_roots call is actually fairly expensive (~0.5 s here), and it seems that about half the fragments (134) are not L2 chunks themselves, which makes the whole operation rather expensive. Is there a better way than going down to supervoxels and then going back up to L2 IDs?

schlegelp avatar Apr 22 '21 08:04 schlegelp

For this particular case, those are dynamic meshes, so the segment ID is in the name of the file like you indicated. You can extract the level of the mesh from the segment ID and avoid the get_leaves call in these redundant cases.
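To illustrate what extracting the level amounts to, here is a standalone sketch that does not need a CloudVolume instance. It assumes the layer is stored in the top 8 bits of the 64-bit ID; the actual bit width is dataset-specific, so in practice cv.meta.decode_layer_id(SEGID) is the safer route. `decode_layer` is a hypothetical name.

```python
# Assumption: the layer occupies the top 8 bits of the 64-bit ID.
# The real width depends on the dataset, so treat this as illustrative.
LAYER_BITS = 8

def decode_layer(segid: int, layer_bits: int = LAYER_BITS) -> int:
    """Return the chunk graph layer stored in the high bits of an ID."""
    return int(segid) >> (64 - layer_bits)

# IDs quoted in this thread: two fragment IDs and the example neuron.
for segid in (180222076190198150, 469500261153458003, 720575940619059120):
    print(segid, "-> layer", decode_layer(segid))
# 180222076190198150 -> layer 2
# 469500261153458003 -> layer 6
# 720575940619059120 -> layer 10
```

Under that assumption, only fragments whose ID decodes to a layer above 2 need the more expensive lookup.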

william-silversmith avatar Apr 22 '21 15:04 william-silversmith

But for non-L2 dynamic meshes I can't avoid get_leaves() into get_roots(.., stop_layer=2)?

schlegelp avatar Apr 22 '21 20:04 schlegelp

I suspect it would be possible to implement a more efficient call on the server side, but I think that's probably your best bet at the moment. Generally speaking, these objects are extremely large so the strategy of the PyChunkGraph has been to do as much work at a high level as possible. Requiring L2 labels across the entire object runs counter to that strategy unfortunately. Low level operations are very useful, but we're still figuring out how to do it.
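Put together, the approach discussed so far could be sketched roughly as follows. `fragment_to_l2_ids` is a hypothetical helper; the three calls on `vol` are used exactly as they are written earlier in this thread, and their return types (iterables of integer IDs) are an assumption.

```python
def fragment_to_l2_ids(vol, fragment_id):
    """Return the sorted, unique L2 IDs beneath one dynamic mesh fragment.

    `vol` is expected to be a graphene CloudVolume; only calls quoted in
    this thread are used, and their return types are assumed.
    """
    # Shortcut: the fragment is already an L2 chunk, so no lookup is needed.
    if vol.meta.decode_layer_id(fragment_id) == 2:
        return [int(fragment_id)]

    # Otherwise go down to supervoxels over the full bounds at mip 0 ...
    supervoxels = vol.get_leaves(fragment_id, vol.meta.bounds(0), 0)
    # ... and back up, stopping at layer 2 (the expensive step).
    l2_ids = vol.get_roots(supervoxels, stop_layer=2)
    return sorted({int(l2_id) for l2_id in l2_ids})
```

Since many supervoxels share one L2 parent, the result is deduplicated before it is returned. Caching results per fragment ID would avoid repeating the expensive call for fragments shared between neurons.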

william-silversmith avatar Apr 22 '21 21:04 william-silversmith

we need to push the stop_layer functionality to master and deploy it... i'll ask @sdorkenw about it.

fcollman avatar Apr 22 '21 21:04 fcollman

i mean the stop_layer on get_leaves

fcollman avatar Apr 22 '21 21:04 fcollman

@fcollman We'll need to coordinate that with cloud-volume. Is the interface: ?stop_layer=2? Will it be enabled on v1 and 1.0?

william-silversmith avatar Apr 22 '21 22:04 william-silversmith

it will be on the v1, it's already there on the microns version of the chunkedgraph so you can test it there.

fcollman avatar Apr 23 '21 02:04 fcollman

https://github.com/seung-lab/PyChunkedGraph/pull/299

fcollman avatar Apr 23 '21 02:04 fcollman

Okay, this is now fixed and deployed. For id=720575940619059120 I see 1264 L2 IDs. One other thing to note: you can download the level 2 fragments instead of the largest fragments available.

https://github.com/seung-lab/cloud-volume/blob/master/cloudvolume/datasource/graphene/mesh/unsharded.py#L114

You can see here that the 'level' parameter ends up being passed to the server to tell it to pull manifests that don't use fragments above this level. CloudVolume doesn't expose this functionality right now, but you can follow the pattern in the function above, passing level=2 instead, and you'll get L2 mesh fragments.

fcollman avatar Apr 24 '21 15:04 fcollman

Thanks, that works! I was thinking of keeping this issue for the time being, so I can report back if having this mapping proves useful for working with meshes?

schlegelp avatar Apr 26 '21 14:04 schlegelp