KrzysztofDoboszInpost comments


Pandas DataFrame index not preserved with pandas.DeltaTableDataset

Actually, there might be more than one index level, and at some point `deltalake` might start handling `pandas_metadata`, so:

```python
# Collect every column that was written out as a pandas index level.
index_cols = delta_table.columns[
    delta_table.columns.str.match(r"__index_level_\d+__")
].tolist()
if index_cols:
    delta_table = delta_table.set_index(index_cols)
```
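A slightly more defensive variant of the same idea, as a sketch only: it assumes index levels are stored as `__index_level_<N>__` columns, matches the whole column name, and orders the levels numerically so that level 10 does not sort before level 2. The helper name `find_index_columns` is hypothetical and not part of `deltalake` or kedro.

```python
import re

# Assumption: unnamed pandas index levels are stored as "__index_level_<N>__".
_INDEX_LEVEL = re.compile(r"__index_level_(\d+)__")


def find_index_columns(columns):
    """Return the index-level column names, ordered by level number."""
    matches = []
    for name in columns:
        # fullmatch rejects names that merely start with or contain the pattern.
        m = _INDEX_LEVEL.fullmatch(str(name))
        if m:
            matches.append((int(m.group(1)), str(name)))
    # Sort on the numeric level, not on the string.
    return [name for _, name in sorted(matches)]
```

With a DataFrame, this would be used as `index_cols = find_index_columns(delta_table.columns)` followed by the same `set_index` call as above. Named index levels are not covered here, since they are stored under their own names rather than under this pattern.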

Pandas DataFrame index not preserved with pandas.DeltaTableDataset

I changed my mind. I needed a dataset that I'd use as a local counterpart of `databricks.ManagedTableDataset` (or actually something derived from it), and that dataset ignores the pandas index...

Support for remote spark sessions and databricks-connect

How about trying to import `DatabricksSession` and falling back to `SparkSession` if the import fails? `DatabricksSession` is available only in `databricks-connect-v2`. Anyway, kedro currently constrains its Spark dependency to `pyspark>=2.2,
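The import-with-fallback idea could look roughly like this. It is a sketch, not kedro's actual implementation: the helper names `first_importable` and `get_session_class` are hypothetical, and the module paths `databricks.connect` and `pyspark.sql` are assumed to be where the two session classes live.

```python
import importlib


def first_importable(candidates):
    """Return the first attribute that can be imported from (module, attr) pairs."""
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Package not installed; try the next candidate.
            continue
        if hasattr(module, attr):
            return getattr(module, attr)
    raise ImportError(f"none of {candidates!r} could be imported")


def get_session_class():
    """Prefer DatabricksSession, fall back to the plain SparkSession."""
    return first_importable(
        [
            ("databricks.connect", "DatabricksSession"),  # assumed module path
            ("pyspark.sql", "SparkSession"),
        ]
    )
```

Callers would then build the session from whichever class `get_session_class()` returns, so the rest of the code does not need to know which package is installed.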

New dataset databricks.ExternalTableDataset

Sure, as soon as I'm able to :) In the meantime: would you rather create a separate `ExternalTableDataset` that shares a lot of code with `ManagedTableDataset` (possibly through inheritance?), or...