data-prep-kit
Improve transforms venv
Search before asking
- [X] I searched the issues and found no similar issues.
Component
Transforms/Other
Feature
At the moment, each transform copies the data-processing-lib libraries and installs them into its own venv. This step can be time-consuming, particularly in a CI/CD setting. To speed it up, the suggested approach is to stop copying the libraries on every build and instead update them only when their content has changed. This can be achieved by using a shared environment (venv) for all transforms and reinstalling the data-processing-lib libraries only when necessary.
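One possible way to detect "content has changed" is to hash the library sources and compare the result with a digest stored at the last install. The sketch below is only an illustration under that assumption: the function names, the stamp file, and the `*.py` pattern are hypothetical and not part of the repo, and the pattern would miss changes to files such as `pyproject.toml` unless it is widened.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path, pattern: str = "*.py") -> str:
    """Return a SHA-256 digest over the relative paths and contents of matching files."""
    h = hashlib.sha256()
    for path in sorted(root.rglob(pattern)):
        if path.is_file():
            # Include the relative path so renames also change the digest.
            h.update(path.relative_to(root).as_posix().encode())
            h.update(b"\0")
            h.update(path.read_bytes())
    return h.hexdigest()


def needs_reinstall(lib_root: Path, stamp: Path) -> bool:
    """True when no digest was recorded yet or the sources differ from it."""
    return not stamp.exists() or stamp.read_text().strip() != tree_digest(lib_root)


def record_install(lib_root: Path, stamp: Path) -> None:
    """Store the current digest (hypothetical stamp file) after a successful install."""
    stamp.parent.mkdir(parents=True, exist_ok=True)
    stamp.write_text(tree_digest(lib_root))
```

A build step could call `needs_reinstall(...)` first, reinstall the libraries into the shared venv only when it returns `True`, and then call `record_install(...)`.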
Are you willing to submit a PR?
- [ ] Yes I am willing to submit a PR!
It seems that the following approach could help:
- Create a shared venv in the main repo where the data-prep-lab libraries are installed. This shared venv is re-created only when the sources of these libraries change.
- Create a venv in each transform directory, as is currently done (under noop/ray, noop/python, ...). The local transform lib is installed there. Use `PYTHONPATH` to point to the shared venv directory.
I tested it locally and it seems to work.
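For illustration, here is a minimal sketch of how the `PYTHONPATH` part could look. It is not the code I tested: the helper names are hypothetical, and it assumes the shared venv and the transform venv use the same Python version and the standard venv layout (`lib/pythonX.Y/site-packages` on POSIX, `Lib/site-packages` on Windows).

```python
import os
import sys
from pathlib import Path


def shared_site_packages(shared_venv: Path) -> Path:
    """Return the site-packages directory of a venv, assuming the standard layout."""
    if os.name == "nt":
        return shared_venv / "Lib" / "site-packages"
    version = f"python{sys.version_info.major}.{sys.version_info.minor}"
    return shared_venv / "lib" / version / "site-packages"


def env_with_shared_libs(shared_venv: Path, base_env=None) -> dict:
    """Copy the environment and prepend the shared site-packages to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    parts = [str(shared_site_packages(shared_venv))]
    if env.get("PYTHONPATH"):
        # Keep any PYTHONPATH entries that were already set.
        parts.append(env["PYTHONPATH"])
    env["PYTHONPATH"] = os.pathsep.join(parts)
    return env
```

The returned mapping can be passed as `env=` to `subprocess.run` when invoking the transform venv's interpreter. Entries from `PYTHONPATH` are placed on `sys.path` ahead of the venv's own site-packages, so the shared libraries would take precedence if a package were installed in both places.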
@daw3rd, if this approach seems OK to you, I can implement it. Thanks.