aaw
Kubeflow Pipelines is setting `sidecar.istio.io/inject: false` annotation on workflow pods
Kubeflow Pipelines is setting the `sidecar.istio.io/inject: false` annotation on workflow pods. This will be blocked as part of the Protected B security configuration, as we enforce that all user pods be on the service mesh.
I'm removing the blocker on this. I incorrectly applied the Istio configuration policy to all pods instead of just Protected B pods.
Would also close https://github.com/StatCan/daaas/issues/411
@zachomedia I am noticing that Pipelines, despite my attempts to fix the network policies, are still not able to connect to Vault. Any chance that this would be the cause?
# Take a look at the vault-agent logs
kubectl logs -f -n blair-drummond estimate-pi-5v8sx-2757349930 vault-agent
# Take a look at the network policy
kubectl get networkpolicy -n blair-drummond notebooks-vault-egress -o yaml
Best I can tell, the only obvious difference between the notebooks and the workflows is that the workflows are not on the mesh.
CC @sylus
I think this is now resolved!
This appears to still be an issue in Prod
Creating Argo Workflows manually does not have this problem. This inject=false is added by kubeflow pipelines.
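To check whether a given workflow pod was opted out of the mesh, something like the following could be run against the pod's JSON manifest. This is only a sketch: the function names and the trimmed example manifest are made up for illustration, and it assumes the sidecar container uses Istio's default name, `istio-proxy`.

```python
import json

INJECT_ANNOTATION = "sidecar.istio.io/inject"


def sidecar_injection_disabled(pod: dict) -> bool:
    """Return True if the pod manifest explicitly opts out of Istio sidecar injection."""
    annotations = pod.get("metadata", {}).get("annotations") or {}
    return str(annotations.get(INJECT_ANNOTATION, "")).lower() == "false"


def has_container(pod: dict, name: str) -> bool:
    """Return True if the pod spec lists a container with the given name."""
    containers = pod.get("spec", {}).get("containers") or []
    return any(c.get("name") == name for c in containers)


# Hypothetical, heavily trimmed manifest; a real one has many more fields.
example_pod = json.loads("""
{
  "metadata": {
    "name": "example-workflow-pod",
    "annotations": {"sidecar.istio.io/inject": "false"}
  },
  "spec": {
    "containers": [{"name": "main"}, {"name": "vault-agent"}]
  }
}
""")

if __name__ == "__main__":
    print("injection disabled:", sidecar_injection_disabled(example_pod))
    print("has istio-proxy:", has_container(example_pod, "istio-proxy"))
```

The manifest can be obtained with `kubectl get pod <pod> -n <namespace> -o json` and loaded with `json.load`. A pod created by Kubeflow Pipelines would be expected to report the annotation as disabled, while a manually created Argo Workflow pod would not.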
Here it is. Looks like it's a KFP setting now:
https://github.com/kubeflow/pipelines/blob/fef8c03e401a15a9f92c1839fe0f9a5c22f709e1/manifests/kustomize/base/installs/multi-user/pipelines-profile-controller/sync.py#L73-L75
We're seeing this in our pipelines, so I captured the latest error message to highlight the problem:
time="2021-12-10T12:51:50.542Z" level=info msg="capturing logs" argo=true
Traceback (most recent call last):
  File "/pipelines/preprocessing.py", line 24, in <module>
    import config as acm
  File "/pipelines/config.py", line 51, in <module>
    settings = Settings()
  File "pydantic/env_settings.py", line 37, in pydantic.env_settings.BaseSettings.__init__
  File "pydantic/env_settings.py", line 63, in pydantic.env_settings.BaseSettings._build_values
  File "/pipelines/config.py", line 24, in json_config_settings_source
    return json.loads(settings.__config__.json_settings_path.read_text())
  File "/usr/lib/python3.8/pathlib.py", line 1236, in read_text
    with self.open(mode='r', encoding=encoding, errors=errors) as f:
  File "/usr/lib/python3.8/pathlib.py", line 1222, in open
    return io.open(self, mode, buffering, encoding, errors, newline,
  File "/usr/lib/python3.8/pathlib.py", line 1078, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/vault/secrets/minio-standard-tenant-1.json'
The same code works fine from the terminal in Jupyter.
@goatsweater Can we get the pod name, the namespace, and maybe the pod YAML spec?
Also, the Vault file is loaded asynchronously, so you might need a wait/retry.
CC @jumana-s
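A wait/retry along the lines suggested above might look like the sketch below. The function name `wait_for_json` and the timeout and interval defaults are assumptions for illustration, not something taken from the pipeline code. It also retries on a JSON decode error, on the assumption that the file could be read while it is still being written.

```python
import json
import time
from pathlib import Path


def wait_for_json(path, timeout=60.0, interval=1.0):
    """Poll until `path` exists and parses as JSON, then return the parsed value.

    Re-raises the last error once `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return json.loads(Path(path).read_text())
        except (FileNotFoundError, json.JSONDecodeError):
            # The secret may not be rendered yet, or may be only partly written.
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval)
```

For the path in the traceback, the call would be `wait_for_json("/vault/secrets/minio-standard-tenant-1.json")`. If the file never appears, as it would if the vault-agent cannot reach Vault at all, this still fails with `FileNotFoundError` after the timeout, so it only helps with the startup race.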
The failing pod is in the `nrcan-btap` namespace, and it's pod `btap-pipeline-lxb5x-3464483345` that I'm seeing the error on.
I did retry, and the result is the error pasted above. Retrying again right now tells me no nodes are available, so I will have to wait for something to free up to see the error again.
@zachomedia what is left for this issue?
Reassess this once KF 1.3 is deployed. @blairdrummond says this should be configurable via environment in newer versions.
Please confirm that this is fixed and will not be re-introduced in KF 1.6
Maybe we can call this fixed?
The BTAP project moved on and isn't running the same pipeline anymore, so the problem isn't affecting it anymore. I don't have any other use cases that are impacted either.
Since Kubeflow Pipelines is being removed from AAW, I think this issue can be closed.