[BUG]: Leaking secret experiment tracker URI

Open mbspng opened this issue 1 year ago • 2 comments

Contact Details [Optional]

No response

System Information

ZENML_LOCAL_VERSION: 0.55.1
ZENML_SERVER_VERSION: 0.55.1
ZENML_SERVER_DATABASE: sqlite
ZENML_SERVER_DEPLOYMENT_TYPE: local
ZENML_CONFIG_DIR: /home/mbs/.config/zenml
ZENML_LOCAL_STORE_DIR: /home/mbs/.config/zenml/local_stores
ZENML_SERVER_URL: http://127.0.0.1:8237
ZENML_ACTIVE_REPOSITORY_ROOT: None
PYTHON_VERSION: 3.10.13
ENVIRONMENT: native
SYSTEM_INFO: {'os': 'linux', 'linux_distro': 'ubuntu', 'linux_distro_like': 'debian', 'linux_distro_version': '22.04'}
ACTIVE_WORKSPACE: default
ACTIVE_STACK: secured-stack
ACTIVE_USER: default
TELEMETRY_STATUS: enabled
ANALYTICS_CLIENT_ID: 8311f72e-e306-4e9f-b8ee-f023a6f3834f
ANALYTICS_USER_ID: eb756b12-3e37-4c9e-983b-01f559c3366e
ANALYTICS_SERVER_ID: 8311f72e-e306-4e9f-b8ee-f023a6f3834f
INTEGRATIONS: ['kaniko', 'mlflow', 'pillow', 'scipy', 'sklearn']
PACKAGES: {'jinja2': '3.1.3', 'markupsafe': '2.1.4', 'pyyaml': '6.0.1', 'arrow': '1.3.0', 'binaryornot': '0.4.4', 'certifi': '2024.2.2', 'chardet': '5.2.0', 'charset-normalizer': '3.3.2', 
'cookiecutter': '2.5.0', 'idna': '3.6', 'markdown-it-py': '3.0.0', 'mdurl': '0.1.2', 'pygments': '2.17.2', 'python-dateutil': '2.8.2', 'python-slugify': '8.0.3', 'requests': '2.31.0', 'rich': '13.7.0',
'six': '1.16.0', 'text-unidecode': '1.3', 'types-python-dateutil': '2.8.19.20240106', 'urllib3': '2.2.0', 'brotli': '1.1.0', 'gitpython': '3.1.41', 'mako': '1.3.2', 'markdown': '3.5.2', 'pyjwt': 
'2.7.0', 'pymysql': '1.0.3', 'sqlalchemy': '1.4.41', 'sqlalchemy-utils': '0.38.3', 'adlfs': '2024.1.0', 'aiofiles': '23.2.1', 'aiohttp': '3.9.3', 'aiohttp-retry': '2.8.3', 'aiokafka': '0.10.0', 
'aiosignal': '1.3.1', 'alembic': '1.8.1', 'amqp': '5.2.0', 'antlr4-python3-runtime': '4.9.3', 'anyio': '4.2.0', 'appdirs': '1.4.4', 'argcomplete': '3.2.2', 'astroid': '3.0.3', 'asttokens': '2.4.1', 
'async-timeout': '4.0.3', 'asyncssh': '2.14.2', 'atpublic': '4.0', 'attrs': '23.2.0', 'azure-common': '1.1.28', 'azure-core': '1.30.0', 'azure-datalake-store': '0.0.53', 'azure-identity': '1.15.0', 
'azure-mgmt-core': '1.4.0', 'azure-mgmt-resource': '23.0.1', 'azure-storage-blob': '12.19.0', 'bcrypt': '4.0.1', 'billiard': '4.2.0', 'black': '24.1.1', 'blinker': '1.7.0', 'cachetools': '5.3.2', 
'celery': '5.3.6', 'cffi': '1.16.0', 'click': '8.1.3', 'click-didyoumean': '0.3.0', 'click-params': '0.3.0', 'click-plugins': '1.1.1', 'click-repl': '0.3.0', 'cloudpickle': '2.2.1', 'colorama': 
'0.4.6', 'comm': '0.2.1', 'configobj': '5.0.8', 'contourpy': '1.2.0', 'cryptography': '42.0.2', 'cycler': '0.12.1', 'databricks-cli': '0.18.0', 'decorator': '5.1.1', 'dictdiffer': '0.9.0', 'dill': 
'0.3.8', 'diskcache': '5.6.3', 'distro': '1.9.0', 'docker': '6.1.3', 'dpath': '2.1.6', 'dulwich': '0.21.7', 'dvc': '3.43.1', 'dvc-azure': '3.0.1', 'dvc-data': '3.9.0', 'dvc-http': '2.32.0', 
'dvc-objects': '3.0.6', 'dvc-render': '1.0.1', 'dvc-studio-client': '0.18.0', 'dvc-task': '0.3.0', 'dynaconf': '3.2.4', 'entrypoints': '0.4', 'exceptiongroup': '1.2.0', 'executing': '2.0.1', 'fastapi':
'0.89.1', 'fastapi-utils': '0.2.1', 'filelock': '3.13.1', 'fire': '0.5.0', 'flake8': '7.0.0', 'flake8-pyproject': '1.2.3', 'flask': '3.0.2', 'flatten-dict': '0.4.2', 'flufl.lock': '7.1.1', 'fonttools':
'4.48.1', 'frozenlist': '1.4.1', 'fsspec': '2023.12.2', 'funcy': '2.0', 'gevent': '23.9.1', 'geventhttpclient': '2.0.2', 'gitdb': '4.0.11', 'grandalf': '0.8', 'greenlet': '3.0.3', 'grpcio': '1.60.1', 
'gto': '1.6.2', 'gunicorn': '21.2.0', 'h11': '0.14.0', 'httplib2': '0.19.1', 'httptools': '0.6.1', 'hydra-core': '1.3.2', 'importlib-metadata': '7.0.1', 'importlib-resources': '6.1.1', 'iniconfig': 
'2.0.0', 'ipinfo': '5.0.1', 'ipython': '8.21.0', 'ipywidgets': '8.1.1', 'isodate': '0.6.1', 'isort': '5.13.2', 'iterative-telemetry': '0.0.8', 'itsdangerous': '2.1.2', 'jedi': '0.19.1', 'jmespath': 
'1.0.1', 'joblib': '1.3.2', 'jupyterlab-widgets': '3.0.9', 'kiwisolver': '1.4.5', 'knack': '0.11.0', 'kombu': '5.3.5', 'llvmlite': '0.42.0', 'loguru': '0.7.2', 'matplotlib': '3.8.2', 
'matplotlib-inline': '0.1.6', 'mccabe': '0.7.0', 'mlflow': '2.10.1', 'mlserver': '1.3.5', 'mlserver-mlflow': '1.3.5', 'msal': '1.26.0', 'msal-extensions': '1.1.0', 'multidict': '6.0.5', 
'mypy-extensions': '1.0.0', 'networkx': '3.2.1', 'numba': '0.59.0', 'numpy': '1.26.3', 'oauthlib': '3.2.2', 'omegaconf': '2.3.0', 'orjson': '3.8.14', 'packaging': '23.2', 'pandas': '2.2.0', 'parso': 
'0.8.3', 'passlib': '1.7.4', 'pathspec': '0.12.1', 'pexpect': '4.9.0', 'pillow': '10.2.0', 'pip': '23.3.1', 'platformdirs': '3.11.0', 'pluggy': '1.4.0', 'portalocker': '2.8.2', 'prometheus-client': 
'0.19.0', 'prompt-toolkit': '3.0.43', 'protobuf': '4.25.2', 'psutil': '5.9.8', 'ptyprocess': '0.7.0', 'pure-eval': '0.2.2', 'py-grpc-prometheus': '0.7.0', 'pyarrow': '15.0.0', 'pycodestyle': '2.11.1', 
'pycparser': '2.21', 'pydantic': '1.10.14', 'pydot': '1.4.2', 'pyflakes': '3.2.0', 'pygit2': '1.14.0', 'pygtrie': '2.5.0', 'pylint': '3.0.3', 'pymssql': '2.2.11', 'pyparsing': '2.4.7', 'pytest': 
'8.0.0', 'python-dotenv': '1.0.1', 'python-multipart': '0.0.7', 'python-rapidjson': '1.14', 'pytz': '2023.4', 'querystring-parser': '1.2.4', 'randomname': '0.2.1', 'ruamel.yaml': '0.18.5', 
'ruamel.yaml.clib': '0.2.8', 'ruff': '0.2.0', 'scikit-learn': '1.4.0', 'scipy': '1.12.0', 'scmrepo': '2.1.1', 'semver': '3.0.2', 'setuptools': '69.0.3', 'shap': '0.44.1', 'shortuuid': '1.0.11', 
'shtab': '1.6.5', 'slicer': '0.0.7', 'smmap': '5.0.1', 'sniffio': '1.3.0', 'sqlalchemy2-stubs': '0.0.2a38', 'sqlmodel': '0.0.8', 'sqlparse': '0.4.4', 'sqltrie': '0.11.0', 'stack-data': '0.6.3', 
'starlette': '0.22.0', 'starlette-exporter': '0.17.1', 'tabulate': '0.9.0', 'termcolor': '2.4.0', 'threadpoolctl': '3.2.0', 'tomli': '2.0.1', 'tomlkit': '0.12.3', 'tqdm': '4.66.1', 'traitlets': 
'5.14.1', 'tritonclient': '2.42.0', 'typer': '0.9.0', 'typing-extensions': '4.9.0', 'tzdata': '2023.4', 'uvicorn': '0.27.0.post1', 'uvloop': '0.19.0', 'validators': '0.18.2', 'vine': '5.1.0', 
'voluptuous': '0.14.1', 'watchfiles': '0.21.0', 'wcwidth': '0.2.13', 'websocket-client': '1.7.0', 'websockets': '12.0', 'werkzeug': '3.0.1', 'wheel': '0.41.2', 'widgetsnbextension': '4.0.9', 'yarl': 
'1.9.4', 'zc.lockfile': '3.0.post1', 'zemml-test': '0.1.0', 'zenml': '0.55.1', 'zenml-eval': '0.1.0', 'zenmlhub-mlflow-steps': '0.1', 'zipp': '3.17.0', 'zope.event': '5.0', 'zope.interface': '6.1'}

CURRENT STACK

Name: secured-stack
ID: b2b5796f-3271-4f7b-9193-4728ad82ed74
User: default / eb756b12-3e37-4c9e-983b-01f559c3366e
Workspace: default / f146146e-1e88-401b-9ae0-a10fb7dd4975

ORCHESTRATOR: default

Name: default
ID: b35bc525-05ce-4f67-92b2-e541e2c442de
Type: orchestrator
Flavor: local
Configuration: {}
Workspace: default / f146146e-1e88-401b-9ae0-a10fb7dd4975

ARTIFACT_STORE: default

Name: default
ID: 44ecb1aa-9c2b-49b7-98b7-62b451ab7dce
Type: artifact_store
Flavor: local
Configuration: {'path': ''}
Workspace: default / f146146e-1e88-401b-9ae0-a10fb7dd4975

EXPERIMENT_TRACKER: secured-mlflow

Name: secured-mlflow
ID: 9e063771-8b04-49b3-84b1-00df10c4bb8b
Type: experiment_tracker
Flavor: mlflow
Configuration: {'experiment_name': None, 'nested': False, 'tags': {}, 'tracking_uri': None, 'tracking_username': '********', 'tracking_password': '********', 'tracking_token': '********', 
'tracking_insecure_tls': False, 'databricks_host': None}
User: default / eb756b12-3e37-4c9e-983b-01f559c3366e
Workspace: default / f146146e-1e88-401b-9ae0-a10fb7dd4975

What happened?

The MLflow experiment tracker leaks secret tracking URIs in the step metadata. When the URI used to connect to the backend store (e.g. a SQL database) contains a username and password, those credentials are exposed.

The relevant ZenML source code responsible is this:

def get_step_run_metadata(self, info: "StepRunInfo") -> Dict[str, "MetadataType"]:
    return {
        METADATA_EXPERIMENT_TRACKER_URL: Uri(self.get_tracking_uri()),
        "mlflow_run_id": mlflow.active_run().info.run_id,
        "mlflow_experiment_id": mlflow.active_run().info.experiment_id,
    }
        
def get_tracking_uri(self) -> str:
    return self.config.tracking_uri or self._local_mlflow_backend()
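
One possible mitigation, sketched here as an assumption and not necessarily the fix that was eventually merged, is to strip the `user:password@` portion from the URI before it is stored as step metadata. The helper name `redact_uri_credentials` is hypothetical; it uses only the Python standard library and is not part of ZenML or MLflow.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_uri_credentials(uri: str) -> str:
    """Return `uri` with any `user:password@` portion removed.

    Hypothetical helper for illustration only; not a ZenML API.
    """
    parts = urlsplit(uri)
    if parts.username is None and parts.password is None:
        # Nothing to redact (e.g. a local file path or a plain host URL).
        return uri
    # Rebuild the network location from host and port only.
    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literals must be wrapped in brackets again.
        host = f"[{host}]"
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


# Example with a made-up database URI:
print(redact_uri_credentials("postgresql://user:secret@db.example.com:5432/mlflow"))
# prints "postgresql://db.example.com:5432/mlflow"
```

Under this assumption, `get_step_run_metadata` could wrap the value returned by `get_tracking_uri()` in such a helper, so the metadata still identifies the tracking server without revealing the login data.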

Reproduction steps

  1. Define an experiment tracker using secrets:
zenml stack set default
zenml secret create mlflow_secret \
    --username=$DYNACONF_TRACKING_USERNAME \
    --password=$DYNACONF_TRACKING_PASSWORD \
    --experiment_tracker_url=$DYNACONF_TRACKING_URI

zenml experiment-tracker register secured-mlflow \
    --flavor=mlflow \
    --tracking_username={{mlflow_secret.username}} \
    --tracking_password={{mlflow_secret.password}} \
    --tracking_uri={{mlflow_secret.experiment_tracker_url}}
zenml stack register -e secured-mlflow -a default -o default secured-stack
zenml stack set secured-stack
  2. Run a pipeline using that tracker.
  3. Go to the web UI.
  4. Go to the last run.
  5. Click on the step in the DAG that uses the experiment tracker.
  6. Observe that the field experiment_tracker_url under Metadata for the step shows the value of the secret URI in plain text.

Relevant log output

No response

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

mbspng avatar Feb 16 '24 12:02 mbspng

Hey @mbspng, thanks for reporting this! I'll take it from here.

avishniakov avatar Feb 21 '24 14:02 avishniakov

Hi @mbspng, I prepared a fix for this. Once it is merged and released, the issue should be gone.

It's worth mentioning that if you need more granular roles and permissions, consider exploring the ZenML Cloud offering, since it has an RBAC concept in place.

avishniakov avatar Feb 21 '24 15:02 avishniakov

@mbspng this has been released now, so closing the issue.

strickvl avatar Mar 15 '24 10:03 strickvl