
H5PYDataset returns ndarray instead of list

Open gyglim opened this issue 9 years ago • 2 comments

When using variable-length data, get_data returns an ndarray of dtype object whose elements are the ndarrays holding the data itself; for images, it is an object ndarray containing the image arrays. Other code does not handle this format. For example, RandomFixedSizeCrop requires a list of images or a 4D array:

if isinstance(source, list) and all(isinstance(b, numpy.ndarray) and
                                        b.ndim == 3 for b in source):
....
raise ValueError("uninterpretable batch format; expected a list "
                         "of arrays with ndim = 3, or an array with "
                         "ndim = 4")

The ServerDataStream also gave me problems. I think we should make this consistent, probably by returning lists in H5PYDataset.get_data for this case instead of ndarrays.
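The mismatch described above can be reproduced without Fuel. The following is a minimal sketch: `make_object_batch` and `is_list_of_images` are hypothetical names introduced here for illustration, and the object array is only assumed to resemble what get_data returns for variable-length sources. The check itself is copied from the condition quoted above.

```python
import numpy


def make_object_batch(arrays):
    # Hypothetical helper: build a 1-D object array whose elements are the
    # given arrays, similar in shape to the batch described in this issue.
    batch = numpy.empty(len(arrays), dtype=object)
    for i, a in enumerate(arrays):
        batch[i] = a
    return batch


def is_list_of_images(source):
    # Same condition as the check quoted from RandomFixedSizeCrop.
    return isinstance(source, list) and all(
        isinstance(b, numpy.ndarray) and b.ndim == 3 for b in source)


images = [numpy.zeros((3, 4, 5)), numpy.zeros((3, 6, 7))]
batch = make_object_batch(images)

print(is_list_of_images(images))       # True
print(is_list_of_images(batch))        # False: an ndarray, not a list
print(batch.ndim)                      # 1, so the ndim == 4 case fails too
print(is_list_of_images(list(batch)))  # True after converting to a list
```

Calling `list(batch)` on the caller's side is enough to get past the check, which is the workaround until the formats are made consistent.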

gyglim, Dec 21 '15 17:12

@vdumoulin What's your thought on this? I'm running into this as well.

In general, I think transformers should be agnostic as to which one of the two they're getting (like e.g. MinimumImageDimensions), but there's still the question as to whether we should prefer one format over the other. I guess H5PYDataset returns NumPy objects because fancy indexing was used, and support for fancy indexing is a big advantage of dealing with NumPy arrays. On the other hand, transformers can have simpler code if they can just use return [f(x) for x in batch], although I guess we could write a helper function map_likewise that applies a function over a list or NumPy array and returns the same type of object it got as input.
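One possible shape for such a helper, as a sketch only: `map_likewise` does not exist in Fuel, and the argument order and the handling of non-object arrays below are assumptions made for illustration.

```python
import numpy


def map_likewise(func, batch):
    """Apply func to each example and return the same container type."""
    results = [func(example) for example in batch]
    if isinstance(batch, numpy.ndarray):
        if batch.dtype == object:
            # Fill element by element so NumPy never tries to merge the
            # results into a single multi-dimensional array.
            out = numpy.empty(len(results), dtype=object)
            for i, result in enumerate(results):
                out[i] = result
            return out
        # Regular arrays: assumes all results share one shape.
        return numpy.asarray(results)
    return results


images = [numpy.ones((3, 4, 5)), numpy.ones((3, 6, 7))]
object_batch = numpy.empty(len(images), dtype=object)
for i, image in enumerate(images):
    object_batch[i] = image

doubled_list = map_likewise(lambda x: 2 * x, images)
doubled_array = map_likewise(lambda x: 2 * x, object_batch)
print(type(doubled_list).__name__, type(doubled_array).__name__)
```

The element-by-element fill is deliberate: building the object array directly from a list of arrays can raise or silently produce a multi-dimensional array when the shapes happen to be compatible.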

bartvm, Jan 27 '16 19:01

I think having something like map_likewise would be a good idea. Maybe we could even make it a decorator so that it's easy to enable the behaviour by default?
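A decorator version might look like the sketch below. The name `likewise` and the example function `flip_horizontal` are hypothetical, not part of Fuel; the decorator turns a per-example function into a batch function that returns a list for list input and an ndarray for ndarray input.

```python
import functools

import numpy


def likewise(func):
    """Lift a per-example function to batches, preserving the batch type."""
    @functools.wraps(func)
    def wrapper(batch, *args, **kwargs):
        results = [func(example, *args, **kwargs) for example in batch]
        if isinstance(batch, numpy.ndarray) and batch.dtype == object:
            out = numpy.empty(len(results), dtype=object)
            for i, result in enumerate(results):
                out[i] = result
            return out
        if isinstance(batch, numpy.ndarray):
            # Assumes all results share one shape.
            return numpy.asarray(results)
        return results
    return wrapper


@likewise
def flip_horizontal(image):
    # Per-example logic only; assumes a 3-D image with width as last axis.
    return image[:, :, ::-1]


images = [numpy.arange(24).reshape(2, 3, 4), numpy.arange(30).reshape(2, 3, 5)]
flipped = flip_horizontal(images)
```

With this approach a transformer author writes only the per-example logic, and the batch handling lives in one place.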

We should also have something like ToNumpy or ToList transformers, should this behaviour become the norm, so that people can explicitly force changing type.
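The conversion logic that such transformers would need is small. The sketch below shows it as plain functions; `to_list` and `to_object_array` are hypothetical names, and wrapping them in actual Fuel Transformer subclasses is left out because that interface is not discussed in this thread.

```python
import numpy


def to_list(batch):
    # Works for lists, object arrays, and regular arrays: iterating over
    # the first axis yields one example per element.
    return list(batch)


def to_object_array(batch):
    # Always produce a 1-D object array, even when every example has the
    # same shape and NumPy could otherwise stack them into one array.
    out = numpy.empty(len(batch), dtype=object)
    for i, example in enumerate(batch):
        out[i] = example
    return out


images = [numpy.zeros((3, 4, 5)), numpy.zeros((3, 6, 7))]
as_array = to_object_array(images)
as_list = to_list(as_array)
print(as_array.shape, len(as_list))
```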

vdumoulin, Jan 28 '16 17:01