no Http Proxy for Connector Download and no authentication method

Open shalberd opened this issue 2 years ago • 4 comments

https://github.com/elyra-ai/elyra/blob/66e4009ed405f45326b2ba8588c2eca9accd0f16/elyra/pipeline/airflow/package_catalog_connector/airflow_package_catalog_connector.py#L86

https://medium.com/ibm-data-ai/getting-started-with-apache-airflow-operators-in-elyra-aae882f80c4a

On the face of it, the component catalog feature is great, though I do not understand why common Airflow and Kubeflow Pipelines components are not included by default in, for example, the Red Hat OperatorHub Elyra image.

https://github.com/opendatahub-io/odh-manifests/pull/546

In most enterprise environments, including OpenShift clusters, cluster-level HTTP and HTTPS proxies are involved.

https://docs.openshift.com/container-platform/4.8/networking/enable-cluster-wide-proxy.html

I find no way to integrate Apache Airflow operator package catalog wheel files via a download URL when a proxy is involved. For the GitLab plugin, I was able to do it via the command line.

shalberd avatar Jun 21 '22 06:06 shalberd

Sounds like a short-term solution would be to include the most common AF and KF components in the image, OR to pre-download the packages and upload them to a location on the whitelist. The long-term solution would be to build in the functionality required to work with the OpenShift cluster-level proxies, e.g. username and password.

I haven't much experience yet with the cluster-level proxy functionality in OpenShift. Looking at the link provided, would another short-term solution be to add a rule to the proxy that allows bypass to the required URLs (e.g. files.pythonhosted.org)? Perhaps some bigger companies have an internal mirror for packages and can use that?

akchinSTC avatar Jun 21 '22 21:06 akchinSTC

  • " like a short term solution would be to include the most common AF and KF components in with the image"

Yes, that would be great in my opinion. I am not sure how the people maintaining the ODH project images coordinate with you, as in this issue https://github.com/opendatahub-io/odh-manifests/pull/546, where a new Docker image was integrated into ODH.

  • "long term, the functionality required to work with the openshift cluster level proxies e.g. username and password"

Yes, something like environment variables for http_proxy and https_proxy that one can pass into the container. For example, in the Jupyter GitHub / GitLab plugin, I can make it work with an enterprise proxy by setting git config --global http.proxy http://myproxy:port, and it then asks for an API key when cloning a repo.

  • " to tell the proxy to add a rule to allow bypass to the required urls (e.g. files.pythonhosted.org)?"

Not an option in our case. We work only with Docker images in enterprise-internal repositories or with enterprise-internal package repositories like Artifactory. Those internal domains are then included in the noProxy section of the OpenShift cluster config.

  • " perhaps some bigger companies may have an internal mirror for packages and can use that?"

That is what I have done now: I uploaded the wheel file in question to a repository in our internal Artifactory. However, I believe I am not the only one facing this issue. Chief Information Security Officers insist on some sort of authentication and non-anonymous access, either via Bearer Token or Basic Auth: https://www.jfrog.com/confluence/display/JFROG/Artifactory+REST+API

@kiersten-stokes Is it possible to add Bearer Token and/or Basic Auth functionality to the airflow package catalog connector at https://github.com/elyra-ai/elyra/blob/66e4009ed405f45326b2ba8588c2eca9accd0f16/elyra/pipeline/airflow/package_catalog_connector/airflow_package_catalog_connector.py#L86 ?

In your article https://medium.com/ibm-data-ai/getting-started-with-apache-airflow-operators-in-elyra-aae882f80c4a

you mention in the section "Airflow Package Catalog Connector"

"Lastly, you’ll need to configure the Airflow package download URL. The URL must meet a few constraints:

  • it must point to a built distribution (wheel) file
  • it must reference a location that Elyra can access using an HTTP GET request without the need to authenticate"

In some sensitive enterprise environments, the second constraint (access without the need to authenticate) is not feasible, even when using an internal package repository like Artifactory.
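To make the request concrete, here is a minimal sketch of how optional credentials could be attached to the HTTP GET request for the wheel file. It is not the connector's actual code: the function names, the example URL, and the credentials are all hypothetical, and it uses only the Python standard library.

```python
import base64
import urllib.request
from typing import Optional


def build_auth_headers(
    username: Optional[str] = None,
    password: Optional[str] = None,
    token: Optional[str] = None,
) -> dict:
    """Return an Authorization header, or an empty dict for anonymous access.

    Hypothetical helper: a token takes precedence over username/password.
    """
    if token:
        return {"Authorization": f"Bearer {token}"}
    if username and password:
        raw = f"{username}:{password}".encode("utf-8")
        encoded = base64.b64encode(raw).decode("ascii")
        return {"Authorization": f"Basic {encoded}"}
    return {}


def build_download_request(url: str, **credentials) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for the package download URL."""
    headers = build_auth_headers(**credentials)
    return urllib.request.Request(url, headers=headers, method="GET")


# Hypothetical internal repository URL and service account.
request = build_download_request(
    "https://artifactory.example.com/pypi/example_package-1.0-py3-none-any.whl",
    username="svc-elyra",
    password="s3cret",
)
```

Passing the prepared request to urllib.request.urlopen would perform the download. Without credentials, the request carries no Authorization header, so anonymous access keeps working as before.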

shalberd avatar Jun 22 '22 10:06 shalberd

  • " like a short term solution would be to include the most common AF and KF components in with the image"

I don't believe we should do this for the following reasons:

  • We can enable connectors that download resources from the web to optionally accept credentials - in theory these should be minor changes.
  • If we pre-package resources (such as components) in container images, it's not trivial to remove them in subsequent releases, as users might rely on them being present by default. We've encountered this with the system-owned runtime images, which we have deprecated and can't remove until the next major release (4.0) because it is considered a breaking change.
  • In general we are trying to include less in our base images to keep them more flexible. Users can always customize images to meet their specific requirements.

  • "@kiersten-stokes Is it possible to include Bearer Token and/or Basic Auth functionality to the airflow package catalog connector at ... in some sensitive enterprise environments, that is not feasible (no need to authenticate), even if using an internal package repository like Artifactory."

Understood. Adding optional basic authentication should not be a problem.

ptitzler avatar Jun 22 '22 11:06 ptitzler

@ptitzler Thank you for the authentication enhancement.

@akchinSTC Regarding optional proxy handling: I found out the following for openshift: "The cluster-wide proxy configuration cascades to OpenShift-managed resources only. Proxy configuration for user workloads is handled as part of application management."

source: https://access.redhat.com/solutions/5251461

So, in any case, best practice would be to have optional environment variables HTTP_PROXY and HTTPS_PROXY that can be passed to the env section of the deployment, e.g. via ConfigMaps, which is how we would do it on OpenShift.
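As a rough illustration of the consuming side, a download routine could read those variables from the container environment and route requests through the proxy. The sketch below is an assumption about how this might look, not Elyra's implementation. The function names and the proxy address are hypothetical, and NO_PROXY handling is left out for brevity.

```python
import os
import urllib.request
from typing import Mapping, Optional


def proxies_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect proxy URLs from HTTP_PROXY / HTTPS_PROXY (upper or lower case)."""
    env = os.environ if env is None else env
    proxies = {}
    for scheme in ("http", "https"):
        value = env.get(f"{scheme.upper()}_PROXY") or env.get(f"{scheme}_proxy")
        if value:
            proxies[scheme] = value
    return proxies


def opener_with_proxies(
    env: Optional[Mapping[str, str]] = None,
) -> urllib.request.OpenerDirector:
    """Build an opener that sends requests through the configured proxies.

    An empty mapping means that no proxy is used at all.
    """
    handler = urllib.request.ProxyHandler(proxies_from_env(env))
    return urllib.request.build_opener(handler)


# Hypothetical values, as they might be injected from a ConfigMap.
example_env = {
    "HTTP_PROXY": "http://myproxy:3128",
    "HTTPS_PROXY": "http://myproxy:3128",
}
opener = opener_with_proxies(example_env)
```

Calling opener.open(url) would then fetch the wheel file through the proxy. Taking the environment as a parameter keeps the helper easy to test without touching the real process environment.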

On your side, I believe the relevant section is here

https://github.com/akchinSTC/odh-manifests/blob/v3.6.0/jupyterhub/notebook-images/overlays/build/elyra-notebook-buildconfig.yaml

or, for ODH in the past, here

https://github.com/opendatahub-io/odh-manifests/pull/546

shalberd avatar Jun 22 '22 13:06 shalberd

Basic auth support is now in place and working, so I am closing this issue. Private PKI CA-bundle trust is handled in https://github.com/elyra-ai/elyra/issues/2787. HTTP proxy support will be handled later in order not to overload this issue.

shalberd avatar Sep 02 '22 18:09 shalberd