pytorch-operator

Is python sdk still being maintained?

Open ca-scribner opened this issue 4 years ago • 7 comments

I started using the python sdk with the intent of making it into a kubeflow pipelines launcher, but noticed some mismatches between the pytorchjob sdk and kubernetes. Little stuff like:

  • pytorchjob's objects are built with swagger whereas kubernetes is now built with openapi, leading to small breaks
  • pytorchjob_client.delete() calls delete_namespaced_custom_object() with too many arguments

Am I doing something wrong? I didn't see anyone reporting these issues. And if the issues are real, is the sdk intentionally deprecated or maybe it just hasn't been brought in line with recent k8s changes?

ca-scribner, Feb 03 '21 20:02

We can use corresponding launchers in pipelines to launch the PyTorchJob.

And, the SDK is built for 1.16, I think.

gaocegege, Feb 04 '21 01:02

cc @johnugeorge

gaocegege, Feb 04 '21 01:02

Yeah, I liked the extra features in this one; it felt more like something I could see using from a notebook. But using the common launch_crd from the TFJob would be nice. Maybe I can refresh the TFJob one with the extra features this had while I'm at it.

ca-scribner, Feb 04 '21 03:02

I started looking at refreshing this, but I can't reproduce the existing API. I'm following the /hack/python-sdk scripts, but when I generate the Python SDK (java -jar ${SWAGGER_CODEGEN_JAR} ...) I get an incomplete API. See the image below (left is what the repo has, right is what I get when rebuilding):

[image: pytorch-operator-compare]

I've tried doing this both from current master and 61fefa88f75b126fd7672f44b87351db511299cb but neither generates the entire pytorchjob SDK. Anyone have suggestions on what I'm missing?

ca-scribner, Feb 08 '21 17:02

/cc @jinchihe

gaocegege, Feb 09 '21 01:02

ty!

fyi to use the PyTorchJob API (built using swagger) with kubernetes atm (built using the openapi fork of swagger), I've been subclassing like:

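# Import path assumes the SDK is installed as the kubeflow.pytorchjob package
from kubeflow.pytorchjob import V1PyTorchJob as V1PyTorchJob_original

# Patch PyTorchJob APIs to align with k8s usage
class V1PyTorchJob(V1PyTorchJob_original):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # The k8s client looks up openapi_types; swagger-generated models only define swagger_types
        self.openapi_types = self.swagger_types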

And for a pytorch_launcher where I need a dict of the PyTorchJob to pass to k8s API, .to_dict() generated by swagger has a bug where the attribute map (which remaps python-names to k8s-names) wasn't used. I've been getting around this by serializing using k8s_client.ApiClient().sanitize_for_serialization(job) rather than job.to_dict(), but maybe the new openapi tooling fixes this. Worst case, the to_dict() is easy to patch and I can provide code.
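
For illustration, a minimal sketch of what such a to_dict() patch could look like. It assumes the generated models expose swagger_types and attribute_map as class attributes. Metadata, Job, and to_k8s_dict are hypothetical stand-ins written for this example, not the SDK's actual classes or code.

```python
# Hypothetical stand-ins for swagger-generated models; not the real SDK classes.
class Metadata:
    swagger_types = {"name": "str", "generate_name": "str"}
    attribute_map = {"name": "name", "generate_name": "generateName"}

    def __init__(self, name=None, generate_name=None):
        self.name = name
        self.generate_name = generate_name


class Job:
    swagger_types = {"api_version": "str", "metadata": "Metadata"}
    attribute_map = {"api_version": "apiVersion", "metadata": "metadata"}

    def __init__(self, api_version=None, metadata=None):
        self.api_version = api_version
        self.metadata = metadata


def to_k8s_dict(obj):
    """Recursively convert a model to a dict keyed by its attribute_map names."""
    if isinstance(obj, (list, tuple)):
        return [to_k8s_dict(item) for item in obj]
    if isinstance(obj, dict):
        return {key: to_k8s_dict(value) for key, value in obj.items()}
    if hasattr(obj, "swagger_types") and hasattr(obj, "attribute_map"):
        result = {}
        for attr in obj.swagger_types:
            value = getattr(obj, attr)
            if value is not None:  # drop unset fields (a choice made in this sketch)
                result[obj.attribute_map[attr]] = to_k8s_dict(value)
        return result
    return obj  # primitives pass through unchanged


job = Job(api_version="kubeflow.org/v1", metadata=Metadata(generate_name="train-"))
print(to_k8s_dict(job))
# {'apiVersion': 'kubeflow.org/v1', 'metadata': {'generateName': 'train-'}}
```

The keys come out as apiVersion and generateName rather than the python-side api_version and generate_name, which is the remapping the generated to_dict() skips.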

Also happy to help update these if there's anything I can take off your plate!

ca-scribner, Feb 13 '21 17:02

Hi @jinchihe, is there any progress on this?

umka1332, Sep 14 '21 17:09