data-prep-kit

Refactoring runtime to eliminate most of the code duplication between different runtimes.

Open · blublinsky opened this issue 8 months ago • 4 comments

Search before asking

  • [x] I searched the issues and found no similar issues.

Component

Ray Runtime, Spark Runtime, Python Runtime

Feature

In the current DPK implementation, there is a lot of code duplication between the runtimes. This makes the code harder to maintain and the execution very hard to customize. Moving the common code into a shared Python class with well-defined override methods will improve overall maintainability and make it far easier to customize execution.

An example of such an implementation can be found here: https://github.com/The-AI-Alliance/dpk/commit/a29e802513ffb59936b9fbc5e3519eeb8a8e8ea2
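The shared-class approach described above is essentially the template method pattern. Below is a minimal sketch under stated assumptions. All class and method names (`BaseRuntime`, `execute`, `get_inputs`, `process`, `compute_statistics`) are hypothetical and are not taken from DPK or the linked commit. The orchestration in `execute()` is written once. Each runtime overrides only `process()`, and execution is customized by overriding an optional hook such as `get_inputs()`. A thread pool stands in for a distributed backend.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable


class BaseRuntime(ABC):
    """Hypothetical shared runtime: the driver logic is written once here."""

    def __init__(self, transform: Callable[[str], Any]) -> None:
        self.transform = transform

    def execute(self, files: Iterable[str]) -> tuple[list[Any], dict[str, int]]:
        # Shared flow used by every runtime: select inputs, process, collect stats.
        inputs = list(self.get_inputs(files))
        results = self.process(inputs)
        return results, self.compute_statistics(inputs, results)

    def get_inputs(self, files: Iterable[str]) -> Iterable[str]:
        # Optional hook: the default passes every input through unchanged.
        return files

    @abstractmethod
    def process(self, inputs: list[str]) -> list[Any]:
        """Required hook: how a specific runtime distributes the work."""

    def compute_statistics(self, inputs: list[str], results: list[Any]) -> dict[str, int]:
        # Optional hook: shared default statistics.
        return {"files_in": len(inputs), "files_out": len(results)}


class SequentialRuntime(BaseRuntime):
    def process(self, inputs: list[str]) -> list[Any]:
        # Runs the transform in-process, one input at a time.
        return [self.transform(f) for f in inputs]


class ThreadPoolRuntime(BaseRuntime):
    # Stand-in for a distributed backend (assumption: a Ray or Spark runtime
    # would override this same process() hook and nothing else).
    def __init__(self, transform: Callable[[str], Any], workers: int = 2) -> None:
        super().__init__(transform)
        self.workers = workers

    def process(self, inputs: list[str]) -> list[Any]:
        # pool.map preserves input order, so results line up with inputs.
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(self.transform, inputs))


class SkipProcessedRuntime(SequentialRuntime):
    # Customization example: override one hook to skip already-processed files.
    def __init__(self, transform: Callable[[str], Any], done: Iterable[str]) -> None:
        super().__init__(transform)
        self.done = set(done)

    def get_inputs(self, files: Iterable[str]) -> Iterable[str]:
        return [f for f in files if f not in self.done]


if __name__ == "__main__":
    files = ["a.parquet", "b.parquet", "c.parquet"]
    for runtime in (SequentialRuntime(str.upper),
                    ThreadPoolRuntime(str.upper, workers=2),
                    SkipProcessedRuntime(str.upper, done=["b.parquet"])):
        print(type(runtime).__name__, runtime.execute(files))
```

With this split, a fix to the shared flow in `execute()` reaches every runtime at once. Customizing execution then means overriding a single method rather than copying the whole driver.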

Are you willing to submit a PR?

  • [x] Yes I am willing to submit a PR!

blublinsky · Mar 21 '25 15:03

This addresses a serious concern of mine: creating new transforms, whether intended for the core repo or for independent use, is way too complicated. It's a serious blocker for third-party adoption of DPK, and a maintenance burden once it is adopted.

deanwampler · Mar 21 '25 15:03

@deanwampler This has been a common observation (and, honestly, one of the reasons I agreed to join the project). This topic is very important to me. It is also important that you take it a step further, so I can better understand whether your vision is aligned with ours. Our first approach was to hide the complexity from the "users" of the transforms and to restructure the code so it is easy to package and deploy. For example, we went from 40+ wheels to 2 wheels. We also reduced the developer's burden by aligning the code with Python best practices and eliminating a number of steps. In the next iteration, we will reduce the developer's burden further by eliminating steps 4 and 5 in the link you provided and simplifying step 10. Which of the remaining 8 steps do you feel could be removed or simplified?

touma-I · Mar 24 '25 14:03

cc: @shahrokhDaijavad

shahrokhDaijavad · Mar 24 '25 23:03

I am sorry, but this issue has nothing to do with individual transforms. It's purely about refactoring the runtime to remove a lot of code duplication and to allow simple runtime configurability.

blublinsky · Mar 25 '25 10:03