Mike Boss

19 comments by Mike Boss

This may be a solution that only changes `Image.cast_storage`. However, I'm not entirely sure that the assumptions it makes about the `ListArray` hold. ```python elif pa.types.is_list(storage.type): from...
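The open question is which assumptions about the list-typed storage actually hold. Below is a minimal sketch that checks those assumptions explicitly before any encoding happens. It assumes each list value is an image laid out as rows of pixels (height × width, optionally × channels) with 8-bit integer samples. `infer_image_shape` is a hypothetical helper, not part of `datasets`.

```python
from numbers import Integral


def infer_image_shape(value):
    """Return (height, width, channels) for a nested-list image, or raise ValueError.

    Assumed layout (hypothetical, not taken from `datasets`):
    rows -> pixels -> optional per-pixel channel list, 8-bit integer samples.
    """
    if not isinstance(value, list) or not value:
        raise ValueError("expected a non-empty list of rows")
    if not all(isinstance(row, list) and row for row in value):
        raise ValueError("every row must be a non-empty list")
    width = len(value[0])
    if any(len(row) != width for row in value):
        raise ValueError("rows have different lengths (ragged image)")

    first_pixel = value[0][0]
    if isinstance(first_pixel, list):
        # Height x width x channels layout.
        channels = len(first_pixel)
        if channels not in (1, 3, 4):
            raise ValueError(f"unsupported channel count: {channels}")
        pixels = [p for row in value for p in row]
        if any(not isinstance(p, list) or len(p) != channels for p in pixels):
            raise ValueError("pixels have inconsistent channel counts")
        samples = [s for p in pixels for s in p]
    else:
        # Height x width (grayscale) layout.
        channels = 1
        samples = [p for row in value for p in row]

    # Every sample must be a real integer (not bool) in 0..255.
    for s in samples:
        if isinstance(s, bool) or not isinstance(s, Integral) or not 0 <= s <= 255:
            raise ValueError(f"sample {s!r} is not an 8-bit integer")
    return len(value), width, channels
```

Values that violate an assumption (ragged rows, an unexpected channel count, floats, or out-of-range values) fail early with a specific message. Without the check, the failure would surface later, somewhere inside the cast.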

This actually applies to all arrays (NumPy arrays or tensors, e.g. in PyTorch), not only to those loaded from external files. ```python import numpy as np import datasets ds = datasets.Dataset.from_dict( {"image": [np.random.randint(0, 255,...

I think this is due to the difference in decoding speed between `png` and `jpg` images in Pillow. Notably, the same is true for `tiff`; it is even...
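To check this on your own files, you could time the decoding of the same images stored as `png`, `jpg`, and `tiff`. Below is a minimal sketch of such a harness. `benchmark_decode` and the byte lists in the commented usage are hypothetical. The harness times whatever decode function you pass in. With Pillow, `Image.open(...).load()` forces the actual decode, since `Image.open` alone is lazy.

```python
import statistics
import time
from typing import Callable, Sequence


def benchmark_decode(decode: Callable[[bytes], object],
                     payloads: Sequence[bytes],
                     repeats: int = 5) -> float:
    """Return the median wall-clock seconds needed to decode every payload once."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for payload in payloads:
            decode(payload)
        timings.append(time.perf_counter() - start)
    # The median is less sensitive to one-off hiccups than the mean.
    return statistics.median(timings)


# Hypothetical usage with Pillow (not run here):
# import io
# from PIL import Image
# decode = lambda b: Image.open(io.BytesIO(b)).load()
# png_time = benchmark_decode(decode, png_bytes_list)
# jpg_time = benchmark_decode(decode, jpg_bytes_list)
```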

This is caused by the formatter (`torch` in this case), which defaults to `float32`. You can load the data as `float16` with `dataset.set_format("torch", dtype=torch.float16)`.

I just ran into this issue. Setting a larger `buffer_size` fixes it for me for now. Is there currently a better way to solve this?

Hey! I haven't had time to look into this, but I just stumbled upon another problem. While my fix made it more or less usable, I have now pre-embedded the images and...

Thanks, I have looked into this and have a working solution, at least for my specific case. However, I ran into quite a few issues along the way that are not...

I have run into some issues. Notably, I don't think `FixedShapeTensorArray` is fully supported by `pandas` and `polars`. It does seem to work with `pandas`, but one loses the actual...

So I guess a workaround is to remove all columns except the ones to cache, and then add the others back with `concatenate_datasets(..., axis=1)`.