
Error fetching sharded data

Open schlegelp opened this issue 4 years ago • 4 comments

Hi,

I'm trying to load some of the Janelia hemibrain EM image data but run into this issue:

>>> vol = CloudVolume('precomputed://gs://neuroglancer-janelia-flyem-hemibrain/emdata/clahe_yz/jpeg', fill_missing=True)
>>> vol[6300:6500, 20400:20600:, 14000]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-59-f9e19f62db12> in <module>
----> 1 vol[6300:6500, 20400:20600:, 14000]

~/.local/lib/python3.7/site-packages/cloudvolume/frontends/precomputed.py in __getitem__(self, slices)

~/.local/lib/python3.7/site-packages/cloudvolume/frontends/precomputed.py in download(self, bbox, mip, parallel, segids, preserve_zeros, agglomerate, timestamp, stop_layer)

~/.local/lib/python3.7/site-packages/cloudvolume/datasource/precomputed/image/__init__.py in download(self, bbox, mip, parallel, location, retain, use_shared_memory, use_file, order)

~/.local/lib/python3.7/site-packages/cloudvolume/datasource/precomputed/sharding.py in from_dict(cls, vals)

TypeError: __init__() missing 1 required positional argument: 'data_encoding' 

Any idea what I'm doing wrong here?

Thanks! :)

schlegelp avatar Jun 23 '20 14:06 schlegelp

Forgot to add: this is with cloudvolume 1.18.1

schlegelp avatar Jun 23 '20 14:06 schlegelp

Here's a fuller traceback:

<ipython-input-2-dbdeda5c0ce0> in <module>
----> 1 vol[6300:6500, 20400:20600:, 14000]

~/.pyenv/versions/3.7.5/lib/python3.7/site-packages/cloudvolume/frontends/precomputed.py in __getitem__(self, slices)
    526     requested_bbox = Bbox.from_slices(slices)
    527 
--> 528     img = self.download(requested_bbox, self.mip)
    529     return img[::steps.x, ::steps.y, ::steps.z, channel_slice]
    530 

~/.pyenv/versions/3.7.5/lib/python3.7/site-packages/cloudvolume/frontends/precomputed.py in download(self, bbox, mip, parallel, segids, preserve_zeros, agglomerate, timestamp, stop_layer)
    568       parallel = self.parallel
    569 
--> 570     img = self.image.download(bbox, mip, parallel=parallel)
    571 
    572     if segids is None:

~/.pyenv/versions/3.7.5/lib/python3.7/site-packages/cloudvolume/datasource/precomputed/image/__init__.py in download(self, bbox, mip, parallel, location, retain, use_shared_memory, use_file, order)
    113     scale = self.meta.scale(mip)
    114     if 'sharding' in scale:
--> 115       spec = sharding.ShardingSpecification.from_dict(scale['sharding'])
    116       return rx.download_sharded(
    117         bbox, mip,

~/.pyenv/versions/3.7.5/lib/python3.7/site-packages/cloudvolume/datasource/precomputed/sharding.py in from_dict(cls, vals)
    123     vals['type'] = vals['@type']
    124     del vals['@type']
--> 125     return cls(**vals)
    126 
    127   def to_dict(self):

TypeError: __init__() missing 1 required positional argument: 'data_encoding'

schlegelp avatar Jun 23 '20 14:06 schlegelp

I got it to work by manually setting the data_encoding:

vol.scale['sharding']['data_encoding'] = 'raw' 
m = vol[6300:6500, 20400:20600:, 14000]

It is rather slow for a 200x200 cutout though:

%timeit m = vol[6300:6500, 20400:20600:, 14000]
12.8 s ± 5.04 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

Is there a way I can speed it up?

schlegelp avatar Jun 23 '20 14:06 schlegelp

It looks like Neuroglancer defaults to using RAW for data encoding, so who am I to judge?

https://github.com/google/neuroglancer/blob/1768c2271a3623264063173f4ba96b2013f8129d/src/neuroglancer/datasource/precomputed/frontend.ts#L344-L347

I'm fixing this in PR #356.
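The gist of the fix is to default `data_encoding` to `'raw'` when the info file's sharding spec omits it, matching Neuroglancer's own default. A minimal standalone sketch of that normalization (the function name `parse_sharding_spec` is illustrative, not the actual code in the PR):

```python
def parse_sharding_spec(vals):
    """Normalize a 'sharding' dict from a precomputed info file.

    Renames '@type' to 'type' and fills in 'data_encoding' with
    Neuroglancer's default ('raw') when it is missing.
    """
    vals = dict(vals)  # don't mutate the caller's dict
    vals['type'] = vals.pop('@type')
    vals.setdefault('data_encoding', 'raw')
    return vals

# A sharding spec like the hemibrain one, with no 'data_encoding' key:
spec = parse_sharding_spec({
    '@type': 'neuroglancer_uint64_sharded_v1',
    'hash': 'identity',
    'minishard_bits': 6,
    'shard_bits': 15,
})
print(spec['data_encoding'])  # raw
```

With that default in place, the `__init__() missing 1 required positional argument` error no longer triggers, and the manual `vol.scale['sharding']['data_encoding'] = 'raw'` workaround becomes unnecessary.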

Is it still slow for you? The same code is taking 2.7 sec for me.

william-silversmith avatar Jul 09 '20 04:07 william-silversmith