Chitral Verma
@dominikpeter this issue will also happen for `read_delta`
Let me check it out today
> Hmm.. I see that `deltalake.DeltaTable` can convert to a `pyarrow` dataset. Maybe we should instantiate via those?

That's the `scan_delta` functionality; it works lazily over a PA dataset.

> Before this new...
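A rough sketch of that route (assuming the `deltalake` package is installed; the table URI is a placeholder, and the polars scan helper has been renamed across versions, older releases call it `scan_ds`):

```python
import polars as pl
from deltalake import DeltaTable

# Placeholder URI; any local path or object-store URI that the deltalake
# package understands should work the same way.
table_uri = "s3://my-bucket/path/to/delta-table"

# Open the Delta table and expose it as a pyarrow dataset.
dt = DeltaTable(table_uri)
ds = dt.to_pyarrow_dataset()

# Scan the dataset lazily from polars; nothing is read until .collect().
lf = pl.scan_pyarrow_dataset(ds)
df = lf.limit(5).collect()
```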
I have also opened an [issue](https://github.com/delta-io/delta-rs/issues/1015) on the delta-rs side for this; if they fix it, then it should be quite straightforward.
> @chitralverma maybe we can pickle a function that imports the `DeltaFileSystemHandler`?
>
> If that does not work, we can pickle a string that we can run with `eval`...
> Does it cloudpickle? We could also add a cloudpickle version of `_deser_and_exec` and `_scan_ds_impl`.

Nope

```
>>> cloudpickle.dumps(dsh)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ...
```
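For context, a rough sketch of the first suggestion above: pickle a plain function that performs the import and rebuilds the handler, rather than pickling the pyo3 object itself. The `deltalake._internal` import path and the constructor arguments are assumptions here, not the exact fix that landed:

```python
import pickle


def _make_delta_fs_handler(table_uri, options=None):
    # Import inside the function so that only this plain function gets
    # pickled; the pyo3-backed handler is constructed fresh after
    # unpickling, on the receiving side.
    from deltalake._internal import DeltaFileSystemHandler  # assumed import path

    return DeltaFileSystemHandler(table_uri, options)  # assumed signature


# The function plus its plain-Python arguments round-trip through pickle
# even though a DeltaFileSystemHandler instance itself does not.
payload = pickle.dumps((_make_delta_fs_handler, "s3://my-bucket/table", None))
factory, uri, opts = pickle.loads(payload)
handler = factory(uri, opts)
```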
So, I got some updates from the delta team regarding this [here](https://github.com/delta-io/delta-rs/issues/1015). Apparently, this is not natively supported by pyo3 modules: https://github.com/PyO3/pyo3/issues/100. I've already fixed things for `read_delta`, will look into...
@ritchie46 it should be closed as I have tested this, but let's keep it open for a while till someone else confirms?
@winding-lines AFAIK, `scan_parquet` and `read_parquet` can rely on `fsspec` filesystems to read data directly from S3, GCS and many others, so do we really need a separate `read_parquet_cloud` for this?...
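For illustration, the kind of call I mean (placeholder bucket and credentials; `storage_options` is handed to the fsspec filesystem, here `s3fs`):

```python
import polars as pl

# Read a parquet file straight from object storage; the s3:// URI is
# resolved by fsspec/s3fs, with credentials passed via storage_options.
df = pl.read_parquet(
    "s3://my-bucket/data/file.parquet",  # placeholder path
    storage_options={"key": "<ACCESS_KEY>", "secret": "<SECRET_KEY>"},
)
```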
> > @winding-lines AFAIK, `scan_parquet` and `read_parquet` can rely on `fsspec` filesystems to read data directly from S3, GCS and many more. so do we really need a separate `read_parquet_cloud`...