Mike Boss
I think the `intersect` call can be removed from `RandomGeoSampler` (and the others), since the size check can be performed directly on `self.index`, which was already initialized in `GeoSampler`. So...
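To illustrate the idea, here is a minimal, library-free sketch. It assumes the base class has already restricted the index to the region of interest, so the subclass only needs to filter by size. `Box` and `hits_large_enough` are hypothetical names used for illustration, not part of any library API, and `size` is assumed to be `(height, width)`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical stand-in for an index entry's bounding box."""
    minx: float
    maxx: float
    miny: float
    maxy: float


def hits_large_enough(index_entries: List[Box], size: Tuple[float, float]) -> List[Box]:
    """Keep only entries that can contain a patch of the requested size.

    Assumption: index_entries were already clipped to the ROI when the
    index was built, so no second intersection query is needed here.
    """
    height, width = size
    return [
        box
        for box in index_entries
        if box.maxx - box.minx >= width and box.maxy - box.miny >= height
    ]
```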
Splitting the dataset directly does not seem feasible at the moment, as the `dataset` does not allow for multiple `roi`s (this would also be a good feature, for example using...
Yes, it is a grid, but the tiles overlap (I am not sure whether there is a good reason for that). At least they did when I checked on `https://earthexplorer.usgs.gov/`...
Thanks, that's exactly what I was wondering. So there is a good case for overlap of the images. A possible solution may be to load the `hit` the `query` originates...
Hmm, I see, that does make it more difficult in the general case. One option might be to add something like a `collate_fn` for selecting the `filepaths` (so...
Fixed the conversion from `pyarrow` to `python` `Sequence` features. There is still an issue: if `features` are passed, the `Sequence` always forces conversion to `ListArray`. This probably causes issues...
The default `writer_batch_size` is 1000 (see [map](https://huggingface.co/docs/datasets/v2.16.1/en/package_reference/main_classes#datasets.Dataset.map)). The "tmp1335llua" is probably the temp file it creates while writing to disk. Maybe try lowering the `writer_batch_size`. For multi-processing you should...
This seems to be hardcoded behavior in table.py `array_cast`.

```python
if (
    not allow_number_to_str
    and pa.types.is_string(pa_type)
    and (pa.types.is_floating(array.type) or pa.types.is_integer(array.type))
):
    raise TypeError(
        f"Couldn't cast array of type {array.type} to...
```
I'll gladly create a PR, but I'm not sure what the behavior should be. Should a value returned from map be cast to the current feature? At the moment this seems...
Would just `allow_primitive_to_str` work? This should include all `numeric`, `boolean` and `temporal` formats. Note that at least in the [C++ implementation](https://arrow.apache.org/docs/cpp/api/utilities.html#_CPPv410is_numericRK8DataType) `numeric` seems to exclude `boolean`.