pytorch-operator
Is python sdk still being maintained?
I started using the python sdk with the intent of making it into a kubeflow pipelines launcher, but noticed some mismatch between the pytorchjob sdk and kubernetes. Little stuff like:
- pytorchjob's objects are built with swagger whereas kubernetes is now built with openapi, leading to small breaks
- how pytorchjob_client.delete() calls delete_namespaced_custom_object() with too many arguments
Am I doing something wrong? I didn't see anyone reporting these issues. And if the issues are real, is the sdk intentionally deprecated or maybe it just hasn't been brought in line with recent k8s changes?
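For the delete() mismatch, one possible workaround is a thin wrapper that inspects the installed client's signature before calling it. This is only a sketch under an assumption: that the break comes from `body` being a required positional parameter in older kubernetes clients and an optional keyword in newer ones. `delete_custom_object` is a hypothetical helper, not part of either SDK, and `old_style` / `new_style` are stand-ins for the real client method.

```python
import inspect


def delete_custom_object(delete_fn, group, version, namespace, plural, name, body=None):
    """Call delete_fn, passing `body` positionally only if its signature requires it."""
    params = inspect.signature(delete_fn).parameters
    body_param = params.get("body")
    body_is_required = (
        body_param is not None
        and body_param.default is inspect.Parameter.empty
        and body_param.kind
        in (inspect.Parameter.POSITIONAL_ONLY, inspect.Parameter.POSITIONAL_OR_KEYWORD)
    )
    if body_is_required:
        # Older-style signature: body is the sixth positional argument.
        return delete_fn(group, version, namespace, plural, name, body)
    if body is None:
        # Newer-style signature and nothing to send: omit body entirely.
        return delete_fn(group, version, namespace, plural, name)
    # Newer-style signature: body is accepted only as a keyword argument.
    return delete_fn(group, version, namespace, plural, name, body=body)


# Stand-ins for the two client signatures (assumed, for illustration only).
def old_style(group, version, namespace, plural, name, body, **kwargs):
    return ("old", name, body)


def new_style(group, version, namespace, plural, name, **kwargs):
    return ("new", name, kwargs.get("body"))
```

With a real cluster, `delete_fn` would presumably be `kubernetes.client.CustomObjectsApi().delete_namespaced_custom_object`, called with group `kubeflow.org` and plural `pytorchjobs`; the version string depends on the installed CRD.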
We can use corresponding launchers in pipelines to launch the PyTorchJob.
And, the SDK is built for 1.16, I think.
cc @johnugeorge
Yeah, I liked the extra features in this one; it felt more like something I could see using from a notebook. But using the common launch_crd from the TFJob would be nice. Maybe I can refresh the TFJob one with the extra features this had while I'm at it.
I started looking at refreshing this, but I can't reproduce the existing API. I'm following the /hack/python-sdk scripts, but when I Generate the Python SDK (java -jar ${SWAGGER_CODEGEN_JAR} ...) I get an incomplete API. See the image below (left is what the repo has, right is what I get when rebuilding):
I've tried doing this both from current master and from 61fefa88f75b126fd7672f44b87351db511299cb, but neither generates the entire pytorchjob SDK. Anyone have suggestions on what I'm missing?
/cc @jinchihe
ty!
fyi, to use the PyTorchJob API (built using swagger) with kubernetes atm (built using the openapi fork of swagger), I've been subclassing like:
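# Patch PyTorchJob APIs to align with k8s usage
# (import path assumed from the kubeflow-pytorchjob SDK package)
from kubeflow.pytorchjob import V1PyTorchJob as V1PyTorchJob_original

class V1PyTorchJob(V1PyTorchJob_original):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # The k8s client looks up openapi_types; alias the swagger-generated mapping
        self.openapi_types = self.swagger_types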
And for a pytorch_launcher where I need a dict of the PyTorchJob to pass to the k8s API, the .to_dict() generated by swagger has a bug where the attribute map (which remaps python-names to k8s-names) isn't used. I've been getting around this by serializing using k8s_client.ApiClient().sanitize_for_serialization(job) rather than job.to_dict(), but maybe the new openapi tooling fixes this. Worst case, the to_dict() is easy to patch and I can provide code.
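A patched to_dict() along those lines could look roughly like the following. It is a sketch, not the SDK's code: `to_k8s_dict` is an illustrative helper name, and `FakeJobSpec` is a hypothetical stand-in that only mimics the swagger-codegen model layout (`swagger_types` and `attribute_map` class attributes), so that the example runs without the SDK installed.

```python
def to_k8s_dict(obj):
    """Recursively convert a swagger-style model to a dict keyed by attribute_map names."""
    if isinstance(obj, list):
        return [to_k8s_dict(item) for item in obj]
    if isinstance(obj, dict):
        return {key: to_k8s_dict(value) for key, value in obj.items()}
    if hasattr(obj, "swagger_types") and hasattr(obj, "attribute_map"):
        result = {}
        for attr in obj.swagger_types:
            value = getattr(obj, attr)
            # Unset fields are skipped here (a choice made for this sketch).
            if value is not None:
                result[obj.attribute_map[attr]] = to_k8s_dict(value)
        return result
    return obj  # plain values (str, int, ...) pass through unchanged


class FakeJobSpec:
    """Hypothetical stand-in for a swagger-generated model."""

    swagger_types = {
        "pytorch_replica_specs": "dict(str, object)",
        "ttl_seconds_after_finished": "int",
    }
    attribute_map = {
        "pytorch_replica_specs": "pytorchReplicaSpecs",
        "ttl_seconds_after_finished": "ttlSecondsAfterFinished",
    }

    def __init__(self, pytorch_replica_specs=None, ttl_seconds_after_finished=None):
        self.pytorch_replica_specs = pytorch_replica_specs
        self.ttl_seconds_after_finished = ttl_seconds_after_finished
```

The point of the sketch is only that keys come from attribute_map rather than from the Python attribute names, which is the remapping the generated to_dict() leaves out.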
Also happy to help update these if there's anything I can take off your plate!
Hi @jinchihe Is there any progress on this?