Kubeflow Pipelines is setting `sidecar.istio.io/inject: false` annotation on workflow pods

Open zachomedia opened this issue 3 years ago • 14 comments

Kubeflow Pipelines is setting the `sidecar.istio.io/inject: false` annotation on workflow pods. This will be blocked as part of the Protected B security configuration, because we require all user pods to be on the service mesh.

zachomedia avatar Jun 23 '21 14:06 zachomedia

I'm removing the blocker on this. I incorrectly applied the Istio configuration policy to all pods instead of just Protected B pods.

zachomedia avatar Jun 23 '21 18:06 zachomedia

Would also close https://github.com/StatCan/daaas/issues/411

blairdrummond avatar Jun 23 '21 18:06 blairdrummond

@zachomedia I am noticing that Pipelines, despite my attempts to fix the network policies, are still not able to connect to Vault. Any chance that this would be the cause?

# Take a look at the vault-agent logs
kubectl logs -f -n blair-drummond estimate-pi-5v8sx-2757349930 vault-agent
# Take a look at the network policy
kubectl get networkpolicy -n blair-drummond notebooks-vault-egress -o yaml

As best I can tell, the only obvious difference between the notebooks and the workflows is that the workflows are not on the mesh.

CC @sylus

blairdrummond avatar Oct 06 '21 01:10 blairdrummond

I think this is now resolved!

sylus avatar Nov 22 '21 15:11 sylus

This appears to still be an issue in Prod

blairdrummond avatar Dec 03 '21 20:12 blairdrummond

Creating Argo Workflows manually does not have this problem. This inject=false is added by kubeflow pipelines.
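To compare manually created workflows against KFP-created ones, it can help to list which pods actually carry the annotation. The following is a minimal sketch, assuming the input is the parsed output of `kubectl get pods -n <namespace> -o json`; the function name `pods_opted_out_of_mesh` and the sample pod names are hypothetical and only for illustration.

```python
import json

# Annotation key from this issue's title.
INJECT_KEY = "sidecar.istio.io/inject"


def pods_opted_out_of_mesh(pod_list):
    """Return names of pods whose annotations set sidecar.istio.io/inject to "false".

    `pod_list` is a dict shaped like the JSON from `kubectl get pods -o json`
    (a List object with an "items" array).
    """
    names = []
    for pod in pod_list.get("items", []):
        metadata = pod.get("metadata", {})
        # "annotations" may be absent or null on pods without annotations.
        annotations = metadata.get("annotations") or {}
        if str(annotations.get(INJECT_KEY, "")).lower() == "false":
            names.append(metadata.get("name"))
    return names


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real cluster.
    sample = json.loads("""
    {"items": [
      {"metadata": {"name": "kfp-step", "annotations": {"sidecar.istio.io/inject": "false"}}},
      {"metadata": {"name": "manual-workflow-step"}}
    ]}
    """)
    print(pods_opted_out_of_mesh(sample))
```

On a real cluster, the same function could be fed with `json.load(sys.stdin)` after piping the `kubectl` output into the script.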

blairdrummond avatar Dec 07 '21 14:12 blairdrummond

Here it is. Looks like it's a KFP setting now:

https://github.com/kubeflow/pipelines/blob/fef8c03e401a15a9f92c1839fe0f9a5c22f709e1/manifests/kustomize/base/installs/multi-user/pipelines-profile-controller/sync.py#L73-L75

blairdrummond avatar Dec 07 '21 14:12 blairdrummond

We're seeing this in our pipelines, so I captured the latest error message to highlight the problem:

time="2021-12-10T12:51:50.542Z" level=info msg="capturing logs" argo=true
Traceback (most recent call last):
  File "/pipelines/preprocessing.py", line 24, in <module>
    import config as acm
  File "/pipelines/config.py", line 51, in <module>
    settings = Settings()
  File "pydantic/env_settings.py", line 37, in pydantic.env_settings.BaseSettings.__init__
  File "pydantic/env_settings.py", line 63, in pydantic.env_settings.BaseSettings._build_values
  File "/pipelines/config.py", line 24, in json_config_settings_source
    return json.loads(settings.__config__.json_settings_path.read_text())
  File "/usr/lib/python3.8/pathlib.py", line 1236, in read_text
    with self.open(mode='r', encoding=encoding, errors=errors) as f:
  File "/usr/lib/python3.8/pathlib.py", line 1222, in open
    return io.open(self, mode, buffering, encoding, errors, newline,
  File "/usr/lib/python3.8/pathlib.py", line 1078, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/vault/secrets/minio-standard-tenant-1.json'

Same code works fine from the terminal in jupyter.

goatsweater avatar Dec 10 '21 12:12 goatsweater

@goatsweater Can we get the following?

- Pod name
- Namespace
- Maybe the pod YAML spec

Also, the Vault file is loaded asynchronously, so you might need a wait/retry.
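A wait/retry could look roughly like the sketch below. It is only an illustration, not the pipeline's actual code: the helper name `read_json_when_ready` and the default timeout and interval are assumptions, and it polls until the secrets file exists and parses as JSON.

```python
import json
import time
from pathlib import Path


def read_json_when_ready(path, timeout=60.0, interval=2.0):
    """Poll until `path` exists and parses as JSON, then return the parsed value.

    Raises TimeoutError if the file is still missing or invalid after
    `timeout` seconds. The defaults are arbitrary illustrative values.
    """
    path = Path(path)
    deadline = time.monotonic() + timeout
    while True:
        try:
            return json.loads(path.read_text())
        except (FileNotFoundError, json.JSONDecodeError):
            # Missing or half-written file: retry until the deadline passes.
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{path} was not ready after {timeout} seconds")
            time.sleep(interval)
```

In the traceback above, this would stand in for the direct `read_text()` call on `/vault/secrets/minio-standard-tenant-1.json`. A retry only helps if the vault-agent eventually writes the file; if the agent cannot reach Vault at all, the call will simply time out.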

CC @jumana-s

blairdrummond avatar Dec 10 '21 13:12 blairdrummond

The failing pod is in the nrcan-btap namespace, and it's pod btap-pipeline-lxb5x-3464483345 that I'm seeing the error on.

I did retry, the result of which is the error pasted above. Retrying again right now tells me no nodes are available, so will have to wait for something to free up to see the error again.

goatsweater avatar Dec 10 '21 13:12 goatsweater

@zachomedia what is left for this issue?

sylus avatar Jan 24 '22 16:01 sylus

Reassess this once KF 1.3 is deployed. @blairdrummond says this should be configurable via environment in newer versions.

brendangadd avatar Mar 21 '22 15:03 brendangadd

Please confirm that this is fixed and will not be re-introduced in KF 1.6

chuckbelisle avatar Sep 21 '22 14:09 chuckbelisle

Maybe we can call this fixed?

The BTAP project moved on and isn't running the same pipeline anymore, so the problem isn't affecting it anymore. I don't have any other use cases that are impacted either.

goatsweater avatar Sep 26 '22 20:09 goatsweater

Since Kubeflow Pipelines is being removed from AAW, I think this issue can be closed.

StanHatko avatar Feb 27 '23 18:02 StanHatko