
Hamilton UI - Configuration menu for cleanup and tracked execution info

Open • legout opened this issue 1 year ago • 7 comments

Updated issue:

  1. Users cannot easily delete data from the Hamilton UI; they have to go into the database manually.

Proposed solution:

  • Expose a server endpoint and a UI view to delete projects and runs, and expose SDK functionality to do the same programmatically.

Alternatives:

  • #1232 helps mitigate the problem by adding more configuration-driven options that determine what is or is not stored.

---- Original Issue ----

Is your feature request related to a problem? Please describe.

Currently, the Hamilton Tracker/UI saves a lot of information for every dataflow run, such as success info, node parameters (inputs), node results (outputs), execution times, and so on.

For ETL dataflows that run often and process a lot of data on every run, this results in a large amount of data stored in the Postgres/SQLite database (which might be a problem in itself), and the RUNS section of the UI becomes unresponsive.

Personally, I am most interested in the success info, execution times and logs.

Describe the solution you'd like

  1. Configuration of regular cleanups of the tracker data for each run. Ideally, this can be configured individually for every data type (success info, inputs, outputs, logs, ...).
  2. Configuration of which kinds of data the tracker should store.
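Both requests could be captured by a single per-data-type policy: one setting for what to record, one for how long to keep it. A minimal sketch, assuming hypothetical names (`RetentionPolicy`, `DATA_TYPES` and its keys); this is not the Hamilton tracker's actual configuration API:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Optional, Set

# Hypothetical keys for the data types listed in the issue; not Hamilton's real names.
DATA_TYPES = {"success_info", "inputs", "outputs", "execution_times", "logs"}


@dataclass
class RetentionPolicy:
    # Request 2: which data types the tracker records at all.
    stored: Set[str] = field(default_factory=lambda: set(DATA_TYPES))
    # Request 1: how long each data type is kept; a missing key means "keep forever".
    retention: Dict[str, timedelta] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject typos early instead of silently ignoring them.
        unknown = (self.stored | set(self.retention)) - DATA_TYPES
        if unknown:
            raise ValueError(f"unknown data types: {sorted(unknown)}")

    def should_store(self, data_type: str) -> bool:
        """True if the tracker should record this data type for new runs."""
        return data_type in self.stored

    def is_expired(self, data_type: str, created_at: datetime, now: datetime) -> bool:
        """True if a record of this type is older than its retention window."""
        keep_for: Optional[timedelta] = self.retention.get(data_type)
        if keep_for is None:
            return False
        return now - created_at > keep_for


# Example matching the reporter's stated interests: keep success info, execution
# times, and logs, skip inputs/outputs, and (an arbitrary choice) purge logs after 30 days.
policy = RetentionPolicy(
    stored={"success_info", "execution_times", "logs"},
    retention={"logs": timedelta(days=30)},
)
```

A periodic cleanup job would then delete every record for which `is_expired(...)` is true, while the tracker would consult `should_store(...)` before persisting anything.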

legout • Nov 13 '24 07:11

Thank you for opening the issue. Following the discussion on Slack, I know @skrawcz reproduced the issue yesterday. We're working on a fix!

Tagging this related issue: #921

zilto • Nov 13 '24 16:11

CC @legout: https://github.com/DAGWorks-Inc/hamilton/pull/1232 is a start. Let me know if you have comments. We should be able to build on this and add more over time.

skrawcz • Nov 18 '24 17:11

(closing for now; feel free to reopen!)

zilto • Nov 26 '24 01:11

I'm going to reopen and modify the issue to focus on exposing endpoints in the server and UI to delete data.

skrawcz • Nov 26 '24 04:11

@skrawcz This Slack thread is related:

https://hamilton-opensource.slack.com/archives/C03M33QB4M8/p1739280812916689

legout • Feb 13 '25 09:02

Yeah, it should be easy enough to delete all DB items older than a certain date. Extra requirements:

  1. Make it idempotent: if it fails, we should be able to run it again.
  2. (Ideally) Make it task-based: it will likely be blocking for now, since we don't have task-based capabilities and that makes it a bit more complex, but we'll see if Django offers something out of the box.
  3. (Ideally) Track prior calls/cleanups: again, not for the first release, but we could easily add a UI page and an endpoint for prior cleanup jobs.
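The first requirement (idempotency) can be met by deleting in small, separately committed batches with a date filter: a crash loses at most one uncommitted batch, and re-running simply deletes whatever still matches. A minimal sketch using the standard-library `sqlite3` module so it runs on its own; the `delete_older_than` helper and the `id`/`created_at` columns are hypothetical, and the real server would presumably go through the Django ORM instead:

```python
import sqlite3
from datetime import datetime


def delete_older_than(conn: sqlite3.Connection, table: str, cutoff: datetime,
                      batch_size: int = 1000) -> int:
    """Delete rows of `table` whose `created_at` is before `cutoff`; return the count.

    Idempotent: the filter only matches rows that still exist, so re-running after
    a partial failure finishes the job, and running it again afterwards deletes 0 rows.
    Assumes `created_at` is stored as ISO-8601 text, so string comparison orders by time.
    `table` is interpolated into SQL, so it must come from trusted code, never user input.
    """
    total = 0
    cutoff_iso = cutoff.isoformat()
    while True:
        with conn:  # commits this batch on success, rolls back only this batch on error
            cur = conn.execute(
                f"DELETE FROM {table} WHERE id IN ("
                f"SELECT id FROM {table} WHERE created_at < ? LIMIT ?)",
                (cutoff_iso, batch_size),
            )
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
```

Keeping batches small also keeps each transaction short, which matters if the cleanup stays blocking (requirement 2) while the UI is in use.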

elijahbenizzy • Feb 13 '25 15:02

To add: the NodeRunAttribute object stores most of the data, but you'd want to delete DAG runs, node runs, and attributes: https://github.com/DAGWorks-Inc/hamilton/blob/1057a40db370d235fce647d65872221365da6bfa/ui/backend/server/trackingserver_run_tracking/models.py

Maybe even templates...

The easiest option is probably an endpoint; we could also provide a script that you run.
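A standalone script would need to delete the three levels in child-to-parent order (attributes, then node runs, then DAG runs) so no row is left pointing at a deleted parent. A minimal sketch, assuming a hypothetical SQLite layout `dag_runs(id, created_at)`, `node_runs(id, dag_run_id)`, and `node_run_attributes(id, node_run_id)`; the actual tables behind the Django models linked above may be named and related differently:

```python
import argparse
import sqlite3
from datetime import datetime
from typing import Dict, List, Optional

# Hypothetical schema (not taken from models.py):
#   dag_runs(id, created_at TEXT ISO-8601)
#   node_runs(id, dag_run_id -> dag_runs.id)
#   node_run_attributes(id, node_run_id -> node_runs.id)
OLD_RUNS = "SELECT id FROM dag_runs WHERE created_at < :cutoff"


def cleanup(conn: sqlite3.Connection, cutoff: datetime) -> Dict[str, int]:
    """Delete DAG runs older than `cutoff` together with their node runs and attributes.

    Children go first so foreign keys never dangle. Everything runs in one
    transaction, and a second run after success matches nothing (idempotent).
    """
    params = {"cutoff": cutoff.isoformat()}
    with conn:
        attrs = conn.execute(
            "DELETE FROM node_run_attributes WHERE node_run_id IN ("
            f"SELECT id FROM node_runs WHERE dag_run_id IN ({OLD_RUNS}))",
            params,
        ).rowcount
        nodes = conn.execute(
            f"DELETE FROM node_runs WHERE dag_run_id IN ({OLD_RUNS})", params
        ).rowcount
        runs = conn.execute(
            "DELETE FROM dag_runs WHERE created_at < :cutoff", params
        ).rowcount
    return {"node_run_attributes": attrs, "node_runs": nodes, "dag_runs": runs}


def main(argv: Optional[List[str]] = None) -> None:
    """Command-line entry point: cleanup.py DB_PATH CUTOFF (e.g. 2025-01-01)."""
    parser = argparse.ArgumentParser(description="Delete tracking data older than a date.")
    parser.add_argument("db_path", help="path to the SQLite database file")
    parser.add_argument("cutoff", type=datetime.fromisoformat, help="ISO date, e.g. 2025-01-01")
    args = parser.parse_args(argv)
    conn = sqlite3.connect(args.db_path)
    try:
        print(cleanup(conn, args.cutoff))
    finally:
        conn.close()
```

Calling `main()` under an `if __name__ == "__main__":` guard turns this into the runnable script; an endpoint could call `cleanup` directly and return the same counts.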

elijahbenizzy • Feb 14 '25 00:02