
Delta Lake and filesystem helper methods

Results 14 jodie issues

We faced a production issue with some of our pipelines while using Delta 1.0.1, and as a follow-up we raised an [issue](https://github.com/delta-io/delta/issues/2455) on Delta core. While it seems that this...

See here for the API: https://github.com/MrPowers/mack/#append-data-with-constraints

good first issue

As mentioned in this [post](https://lakefs.io/blog/how-to-implement-write-audit-publish/#delta-lake), OSS Delta does not support the WAP (write-audit-publish) pattern. I think this is something we can implement here in Jodie. If you don't know...
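To make the three WAP steps concrete, here is a minimal plain-Scala sketch. It assumes a hypothetical in-memory `Table` standing in for a Delta table, plus hypothetical `WriteAuditPublish.run` and `Audit` names; none of these are Delta Lake or Jodie APIs. New rows go into a staged copy, audits run against that copy, and the staged copy is published only if every audit passes.

```scala
// Hypothetical in-memory stand-in for a table; this is not a Delta Lake or Jodie API.
final case class Table[A](rows: Vector[A])

object WriteAuditPublish {
  // An audit check inspects candidate rows and returns an error message if they fail.
  type Audit[A] = Vector[A] => Option[String]

  /** Write: build a staged copy of `main` with `newRows` appended (main is untouched).
    * Audit: run every check against the staged rows.
    * Publish: return the staged table only if no check failed; otherwise return the failures.
    */
  def run[A](main: Table[A], newRows: Vector[A], audits: Seq[Audit[A]]): Either[Seq[String], Table[A]] = {
    val staged = Table(main.rows ++ newRows)
    val failures = audits.flatMap(check => check(staged.rows))
    if (failures.isEmpty) Right(staged) else Left(failures)
  }
}
```

A Delta-backed version would have to map "staged copy" and "publish" onto real table operations, which this sketch deliberately leaves out.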

I think it would be useful to add an optional parameter, `PathRejects`, for writing out the rows removed during deduplication, in case we need to analyze data quality when the source contains duplicated rows...
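The proposal amounts to splitting the input into kept rows and rejected duplicates instead of discarding the duplicates. Below is a plain-Scala sketch of that split. It models "duplicate" as equality on a key function and uses a hypothetical `dedupWithRejects` that is not part of Jodie. A `PathRejects`-style option could then write the second collection to the given path.

```scala
object DedupWithRejects {
  /** Split `rows` into (kept, rejected): the first row seen for each key is kept and
    * every later row with the same key is rejected. Both outputs preserve input order.
    */
  def dedupWithRejects[A, K](rows: Seq[A])(key: A => K): (Vector[A], Vector[A]) = {
    val (kept, rejected, _) =
      rows.foldLeft((Vector.empty[A], Vector.empty[A], Set.empty[K])) {
        case ((keptAcc, rejectedAcc, seen), row) =>
          val k = key(row)
          if (seen.contains(k)) (keptAcc, rejectedAcc :+ row, seen) // duplicate: send to rejects
          else (keptAcc :+ row, rejectedAcc, seen + k)               // first occurrence: keep
      }
    (kept, rejected)
  }
}
```

Passing `identity` as the key treats only fully identical rows as duplicates.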

good first issue

We need to modify the Remove Duplicates function so that it removes duplicates from a Delta table or Parquet file while keeping the latest record (sorted by a timestamp column).
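"Keep the latest record" means that for each key, only the row with the greatest timestamp survives. The sketch below shows that rule on plain Scala collections. It assumes the timestamp is available as epoch milliseconds (`Long`), and `keepLatest` is a hypothetical helper, not Jodie's implementation. In Spark the same rule is commonly expressed with `row_number()` over a window partitioned by the key and ordered by the timestamp descending, keeping rank 1.

```scala
object KeepLatest {
  /** For each key, keep only the record with the greatest timestamp (epoch millis).
    * Output order is unspecified because it follows `groupBy`.
    */
  def keepLatest[A, K](rows: Seq[A])(key: A => K)(timestamp: A => Long): Seq[A] =
    rows
      .groupBy(key)                             // one group per key
      .values
      .map(group => group.maxBy(timestamp))     // latest record in each group
      .toSeq
}
```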

Add a function to delete rows from a Delta table where they exist in a DataFrame + update README

Hello, an interesting function would be one that deletes rows from a table when the values of some columns exist in a DataFrame. I searched for a function like that but haven't found it...
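The requested behaviour is a delete by key: a table row is removed when its key columns match a key in the other dataset, so the rows that remain form a left anti-join. Here is a collections-only sketch of that semantics, using a hypothetical `deleteWhereExists` helper that Jodie does not provide. On a real Delta table the usual way to get this effect is a `MERGE` whose matched clause deletes, i.e. `whenMatched().delete()` in the Delta Lake Scala API.

```scala
object DeleteWhereExists {
  /** Keep only the `table` rows whose key does not occur among the keys of `toDelete`
    * (a left anti-join on the key), which is what "delete where exists" leaves behind.
    */
  def deleteWhereExists[A, B, K](table: Seq[A], toDelete: Seq[B])(
      tableKey: A => K,
      deleteKey: B => K
  ): Seq[A] = {
    // Collect the keys to delete once so each table row is a constant-time lookup.
    val keysToDelete: Set[K] = toDelete.iterator.map(deleteKey).toSet
    table.filterNot(row => keysToDelete.contains(tableKey(row)))
  }
}
```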

```
val duplicates = df
  .select()
  .withColumn("__file_path", col("_metadata.file_path"))
  .withColumn("__row_index", col("_metadata.row_index"))
  .withColumn(
    "rank",
    row_number().over(
      Window()
        .partitionBy()
        .orderBy()))
  .filter("rank > 1")
  .drop("rank")
```

And then:

```
df.alias("old")
  .merge(
    duplicates.alias("new"),
    "old. = new....
```
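The ranking step identifies rows to delete by physical location: within each partition, rows are numbered in sort order, and every row with rank > 1 is a duplicate whose `_metadata.file_path` and `_metadata.row_index` are kept for the later merge. The plain-Scala model of that step below is only a sketch. `LocatedRow` and `duplicateLocations` are hypothetical names, and ties in the sort column keep input order here, whereas Spark's `row_number()` makes no such guarantee.

```scala
object DuplicateLocations {
  // Hypothetical model of a row tagged with its source file path and row index.
  final case class LocatedRow[K](key: K, orderValue: Long, filePath: String, rowIndex: Long)

  /** Emulate `row_number() over (partition by key order by orderValue)` and return the
    * (filePath, rowIndex) of every row ranked > 1, i.e. every duplicate after the first.
    */
  def duplicateLocations[K](rows: Seq[LocatedRow[K]]): Set[(String, Long)] =
    rows
      .groupBy(_.key)                    // the window's partitionBy
      .values
      .flatMap { group =>
        group
          .sortBy(_.orderValue)          // the window's orderBy (stable sort)
          .drop(1)                       // rank 1 is the row that is kept
          .map(row => (row.filePath, row.rowIndex))
      }
      .toSet
}
```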

Add all jodie-related blogs to the project README.

We want to let developers know about the public interface of this project on social media:

* `Type2Scd.upsert`
* `DeltaHelpers.removeDuplicateRecords` (remove all occurrences)
* ~~`DeltaHelpers.removeDuplicateRecords` (leave one occurrence)~~
* `DeltaHelpers.latestVersion`...