Boris Lublinsky
## Why are these changes needed? After several conversations with Constantin, it appears useful to have a transform for processing folders rather than files. ## Related issue number (if any)....
## Why are these changes needed? Initial implementation of pipeline transform ## Related issue number (if any).
## Why are these changes needed? This significantly simplifies the Fuzzy dedup implementation and creates a Python version of it. ## Related issue number (if any).
### Search before asking - [x] I searched the [issues](https://github.com/IBM/data-prep-lab/issues) and found no similar issues. ### Component transforms/Other ### Feature In the current implementation of DPK, each transform reads all...
### Search before asking - [x] I searched the [issues](https://github.com/IBM/data-prep-lab/issues) and found no similar issues. ### Component Ray Runtime, Spark Runtime, Python Runtime ### Feature In the current DPK implementation,...
### Search before asking - [x] I searched the [issues](https://github.com/IBM/data-prep-lab/issues) and found no similar issues. ### Component Ray Runtime, Python Runtime, Spark Runtime ### Feature In the current implementation of...
## Why are these changes needed? The AI Alliance is planning to use Data Prep Kit to validate data stored in HF datasets. ## Related issue number (if any). https://github.com/IBM/data-prep-kit/issues/964