mltrace
mltrace copied to clipboard
Coarse-grained lineage and tracing for machine learning pipelines.
Right now, if users log input and output vars that resolve as non string types, the behavior of adding to the db is undefined.
Currently, we can only auto_log dataframes and variables that have "data" or "model" in the name. We also want to support: - [ ] jsons / dictionaries - [ ]...
We don't want the user to have to manually log metrics or params, especially if they are already printing them out. We will also have to visualize these in the...
For the components steps, MlTrace can follow/implement the OpenLineage Spec https://github.com/OpenLineage/OpenLineage/blob/main/spec/OpenLineage.md. - Offer an endpoint to allow Metadata Engines (DataHub or Marquez or Amunsden) to collect Traces. - Record versionned...
For example, Tracing API spec allow any third party to ingest the produced Tracing Entities in a standard manner: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md - Have a schema to produce and ingest Entities -...
Can call mltrace.save and mltrace.load on these vars and log the resulting pathnames
Context: the current `IOPointer` abstraction only stores a string "pointer" to the data, or a key. An example might be `features.csv` or `model.joblib`. This means we currently can't do anything...