Data Engine: Autolog to MLflow on datasource queries
Implemented in this PR:
- Added a link to open the query in the gallery view
- Added autologging of queries to MLflow, triggered only on ds.all()
- The logging happens in a parallel thread so it doesn't interfere with the querying
- Querying works with multiple datasources, correctly saving the query for each ds.
- A new run is created in the last modified experiment for each datasource being queried; all the queries are then logged into that run.
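A minimal sketch of the parallel-thread logging described above. `log_query_to_mlflow` is a hypothetical stand-in for the real MLflow call (e.g. logging an artifact to the run); here it just records the payload so the fire-and-forget pattern is visible.

```python
# Sketch only: background logging so the query itself is never blocked.
import threading

logged_queries = []

def log_query_to_mlflow(datasource_name, query):
    # Placeholder for the actual MLflow artifact/dict logging call.
    logged_queries.append((datasource_name, query))

def autolog(datasource_name, query):
    """Fire-and-forget: start logging in a daemon thread and return."""
    t = threading.Thread(
        target=log_query_to_mlflow,
        args=(datasource_name, query),
        daemon=True,  # logging must not keep the process alive
    )
    t.start()
    return t

# Usage: the query result can be returned immediately while this runs.
t = autolog("my_datasource", {"filter": "size > 100"})
t.join()  # joined here only to make the example deterministic
```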
Left to implement:
- Right now all artifacts get logged with the same name, so they are overwritten every time. I need to add a counter for how many logs have been saved.
- It would probably be a good idea to make the autologging togglable in some way: the created runs are not being closed and will pollute the environment, so advanced users should be able to turn this off.
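One way the opt-out could look, sketched below. The environment variable name (`DAGSHUB_DISABLE_AUTOLOG`) is an assumption, not part of the current implementation.

```python
# Hypothetical opt-out toggle; the variable name is illustrative only.
import os

def autolog_enabled() -> bool:
    # Advanced users could set DAGSHUB_DISABLE_AUTOLOG=1 to opt out.
    return os.environ.get("DAGSHUB_DISABLE_AUTOLOG", "0") != "1"

os.environ["DAGSHUB_DISABLE_AUTOLOG"] = "1"
print(autolog_enabled())  # autologging is skipped when disabled
```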
Caveats/bugs:
- Closing a run doesn't work if there's a logging session in progress against another repo's MLflow. Possible solution: if the current datasource is not in the active run's repo, log in the foreground instead of the background.
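The proposed fix could be sketched as a simple dispatch: log synchronously when the queried datasource belongs to a different repo than the active run, so that closing the run can't race with a background logging thread. All names here (`log_sync`, `log_async`, the repo arguments) are hypothetical.

```python
# Sketch of the foreground-vs-background decision; names are illustrative.
def log_query(datasource_repo, active_run_repo, payload, log_sync, log_async):
    if datasource_repo != active_run_repo:
        # Different repo: block until logged, so closing the run stays safe.
        log_sync(payload)
    else:
        # Same repo as the active run: safe to log in the background.
        log_async(payload)

calls = []
log_query("repo_a", "repo_b", "q1",
          log_sync=lambda p: calls.append(("sync", p)),
          log_async=lambda p: calls.append(("async", p)))
```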
Changes to be done:
- Log ONLY to the active run; as it turns out, there's no point in creating a run in the datasource's repo
- Come up with better naming in these cases (probably log the name of the repo + datasource + index); it would be good if the index could also be obtained from MLflow.
Changed it so it logs to the active run. The format is autolog_<datasource_name>_<#>.dagshub.json. Decided that adding the repo is a bit too verbose.
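The naming scheme with a per-datasource counter could look like the sketch below. The in-memory counter dict is an assumption for illustration; as noted above, the index might instead be obtained from MLflow.

```python
# Sketch of autolog_<datasource_name>_<#>.dagshub.json naming.
# The per-datasource counter here is hypothetical and in-memory only.
from collections import defaultdict

_log_counters = defaultdict(int)

def next_artifact_name(datasource_name: str) -> str:
    # Increment first so numbering starts at 1 for each datasource.
    _log_counters[datasource_name] += 1
    count = _log_counters[datasource_name]
    return f"autolog_{datasource_name}_{count}.dagshub.json"

print(next_artifact_name("my_ds"))  # autolog_my_ds_1.dagshub.json
print(next_artifact_name("my_ds"))  # autolog_my_ds_2.dagshub.json
```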
Example: