training-operator
Not getting Kubeflow Training SDK v1.7 when installing `kubeflow-training`
In a new virtual environment, I'm installing only kubeflow-training.
This is the `pip freeze` output I get:
cachetools==5.3.3
certifi==2024.2.2
charset-normalizer==3.3.2
google-auth==2.29.0
idna==3.7
kubeflow-training==1.7.0
kubernetes==29.0.0
oauthlib==3.2.2
pyasn1==0.6.0
pyasn1_modules==0.4.0
python-dateutil==2.9.0.post0
PyYAML==6.0.1
requests==2.31.0
requests-oauthlib==2.0.0
retrying==1.3.4
rsa==4.9
setuptools==69.5.1
six==1.16.0
urllib3==2.2.1
websocket-client==1.8.0
However, when I inspect the code that's been installed at new_venv/lib/python3.12/site-packages/kubeflow/training/api/training_client.py, it isn't up to date with the 1.7 SDK release that I can see here on GitHub.
Specifically, I see that the function get_job_logs is different. I need the most up-to-date one.
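A quick way to confirm which SDK version is installed and what a given function looks like, without opening site-packages by hand, is to ask the interpreter. This is only a sketch: the helper names `installed_version` and `describe_callable` are made up for illustration, and it assumes `TrainingClient` is importable from `kubeflow.training` and exposes `get_job_logs`.

```python
import importlib
import importlib.metadata
import inspect


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None


def describe_callable(module_name, qualified_name):
    """Import a module, walk a dotted attribute path, and report file and signature."""
    obj = importlib.import_module(module_name)
    for part in qualified_name.split("."):
        obj = getattr(obj, part)
    return {
        "file": inspect.getsourcefile(obj),
        "signature": str(inspect.signature(obj)),
    }


if __name__ == "__main__":
    print("kubeflow-training:", installed_version("kubeflow-training"))
    try:
        # Assumption: TrainingClient lives in kubeflow.training and has get_job_logs.
        info = describe_callable("kubeflow.training", "TrainingClient.get_job_logs")
    except (ImportError, AttributeError) as exc:
        print("could not inspect get_job_logs:", exc)
    else:
        print(info["file"])
        print(info["signature"])
```

Comparing the printed signature against the same function on the GitHub branch of interest should show quickly whether the installed copy matches it.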
Thank you for creating this @JamesKunstle.
We publish the SDK with each Training Operator release: https://pypi.org/project/kubeflow-training/.
For example, the latest version is 1.7, so to see the changes in that SDK, you need to check the 1.7 release branch (v1.7-branch):
https://github.com/kubeflow/training-operator/blob/v1.7-branch/sdk/python/kubeflow/training/api/training_client.py
What would be the supported path to get the most up-to-date SDK code? The main-branch code does what I want, but the code that gets pulled when I install the kubeflow-training library does not.
@andreyvelich how do you publish releases to PyPI? I took a look at the code and didn't see any actions doing a release automatically. I reached out to @tenzen-y about this as well.
FWIW @andreyvelich, for Feast we have the release process fully automated and deployed to PyPI with this action: https://github.com/feast-dev/feast/actions/workflows/release.yml
Happy to help out and replicate the same here if that would be desirable.
Could you try something like this?
pip install git+https://github.com/kubeflow/training-operator.git@master#subdirectory=sdk/python
I've never installed from a subdirectory before, but I think this should work.
@JamesKunstle If you want to get the latest changes for the SDK, I added the scripts in this PR: https://github.com/kubeflow/website/pull/3719. Similar to @anishasthana's comment, you can do this:
pip install git+https://github.com/kubeflow/training-operator.git@7345e33b333ba5084127efe027774dd7bed8f6e6#subdirectory=sdk/python
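The two commands above share one shape: `git+<repo>@<ref>#subdirectory=<dir>`, where the ref can be a branch such as master or a pinned commit. A minimal sketch of building that requirement string in Python follows; the helper names `pip_git_requirement` and `pip_install` are hypothetical, and the actual install call is left commented out.

```python
import subprocess
import sys


def pip_git_requirement(repo_url, ref, subdirectory=None):
    """Build a pip VCS requirement: git+<repo>@<ref>[#subdirectory=<dir>]."""
    requirement = f"git+{repo_url}@{ref}"
    if subdirectory:
        requirement += f"#subdirectory={subdirectory}"
    return requirement


def pip_install(requirement):
    """Install into the environment of the interpreter running this script."""
    subprocess.run([sys.executable, "-m", "pip", "install", requirement], check=True)


REPO = "https://github.com/kubeflow/training-operator.git"

# Pinned to the commit quoted in this thread, so the install is reproducible.
SDK_AT_COMMIT = pip_git_requirement(
    REPO, "7345e33b333ba5084127efe027774dd7bed8f6e6", subdirectory="sdk/python"
)

# Tracks the branch tip, so the result changes as new commits land.
SDK_AT_MASTER = pip_git_requirement(REPO, "master", subdirectory="sdk/python")

if __name__ == "__main__":
    print(SDK_AT_COMMIT)
    print(SDK_AT_MASTER)
    # pip_install(SDK_AT_COMMIT)  # uncomment to actually install
```

Pinning a commit trades freshness for repeatability; the same string can also go on its own line in a requirements file.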
Currently, for Training Operator we don't have a script to automate the release process, so @johnugeorge publishes the SDK manually after we cut the release. However, for Katib SDK we have this script that we run to publish Images + SDK after the release: https://github.com/kubeflow/katib/blob/master/scripts/v1beta1/release.sh#L85-L97.
That would be awesome if you could help us to automate releases for Training Operator/Katib. We have this issue that we created a while ago: https://github.com/kubeflow/katib/issues/2049.
On a similar note: we have a ton of GitHub Actions we built to automate releases for CodeFlare. Some links:
- https://github.com/project-codeflare/codeflare-sdk/blob/main/.github/workflows/release.yaml
- https://github.com/project-codeflare/codeflare-operator/blob/main/.github/workflows/project-codeflare-release.yml
However, for Katib SDK we have this script that we run to publish Images + SDK after the release: https://github.com/kubeflow/katib/blob/master/scripts/v1beta1/release.sh#L85-L97.
So is publishing the image also manual?
We usually publish the operator image via this workflow: https://github.com/kubeflow/training-operator/blob/86e0df17db715543b366e885c9ae659aa1342c8e/.github/workflows/publish-core-images.yaml#L24-L26.
@andreyvelich @anishasthana Okay, that works now, and I can see the most recent changes. I would really appreciate a more "PyPI"-y way of installing the latest release; I think I was getting a fairly old package when installing by name from PyPI.
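To see what installing by name would resolve to, one option is to compare the installed version with the latest one PyPI reports. This sketch assumes the PyPI JSON endpoint `https://pypi.org/pypi/<project>/json`, whose payload carries the latest version under `info.version`; the function names are made up for illustration, and nothing runs against the network unless `report` is called.

```python
import importlib.metadata
import json
import urllib.request


def fetch_pypi_metadata(project):
    """Download the project's JSON metadata from PyPI (needs network access)."""
    url = f"https://pypi.org/pypi/{project}/json"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def latest_on_pypi(payload):
    """Extract the latest published version from a PyPI JSON payload."""
    return payload["info"]["version"]


def installed_version(project):
    """Return the locally installed version, or None if the project is absent."""
    try:
        return importlib.metadata.version(project)
    except importlib.metadata.PackageNotFoundError:
        return None


def report(project):
    """Summarize the installed version against the latest one on PyPI."""
    published = latest_on_pypi(fetch_pypi_metadata(project))
    installed = installed_version(project)
    return f"{project}: installed={installed}, latest on PyPI={published}"
```

Calling `report("kubeflow-training")` in the virtual environment shows whether the environment already has the newest published release. It says nothing about unreleased commits on the repository, which only a git-based install picks up.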
Basically, we release the SDK whenever we make a new release of Training Operator, so that the versions of all components (Controller + SDK) stay consistent. That helps us keep versions stable. Any thoughts, @JamesKunstle?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.