RayGraphAdapter incorrect telemetry output to Hamilton UI
HamiltonTracker logs incorrect execution telemetry when run with the RayGraphAdapter.
Current behaviour
- Individual node executions are tracked as finishing immediately.
- When a node raises an error, the run does not show a failure; instead, all nodes are shown as having executed successfully.
Steps to reproduce the behavior
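import pandas as pd
import time


def node_5s() -> float:
    # Sleeps ~5s; the UI should show ~5s of execution time for this node.
    start = time.time()
    time.sleep(5)
    return time.time() - start


def node_5s_error() -> float:
    # Sleeps ~5s and then fails; the UI should mark this node (and the run) as failed.
    start = time.time()
    time.sleep(5)
    raise ValueError("Does not break telemetry if executed through ray")
    return time.time() - start  # unreachable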
if __name__ == "__main__":
    import __main__
    from hamilton import base, driver
    from hamilton.plugins.h_ray import RayGraphAdapter
    from hamilton_sdk import adapters
    import ray

    username = 'jernejfrank'

    tracker_ray = adapters.HamiltonTracker(
        project_id=1,  # modify this as needed
        username=username,
        dag_name="ray_telemetry_bug",
    )
    try:
        # Run through the RayGraphAdapter: the UI shows near-instant node runs and no failure.
        ray.init()
        rga = RayGraphAdapter(result_builder=base.PandasDataFrameResult())
        dr_ray = (
            driver.Builder()
            .with_modules(__main__)
            .with_adapters(rga, tracker_ray)
            .build()
        )
        result_ray = dr_ray.execute(final_vars=['node_5s', 'node_5s_error'])
        print(result_ray)
        ray.shutdown()
    except ValueError:
        print("UI displays no problem")
    finally:
        # Same DAG without Ray: the UI shows correct durations and the failure.
        tracker = adapters.HamiltonTracker(
            project_id=1,  # modify this as needed
            username=username,
            dag_name="telemetry_okay",
        )
        dr_without_ray = (
            driver.Builder()
            .with_modules(__main__)
            .with_adapters(tracker)
            .build()
        )
        result_without_ray = dr_without_ray.execute(final_vars=['node_5s', 'node_5s_error'])
Library & System Information
- sf-hamilton:main
- ray 2.34.0
- Python 3.9 and 3.10
- Linux Ubuntu 22.04
- macOS Ventura 13.6.7
Expected behavior
Telemetry should be the same as when running without the RayGraphAdapter.
Additional context
Happy to work on this, but I will need some support.
@jernejfrank thanks for the issue. Currently this is expected behavior for use with the RayGraphAdapter (the HamiltonTracker works when using Parallel / Collect + RayTaskExecutor).
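Both symptoms in the report are consistent with the tracker's node hooks wrapping only the submission of each node to Ray rather than its actual execution. This is an assumption, not something confirmed in this thread. The stdlib-only sketch below illustrates the effect: ThreadPoolExecutor stands in for Ray, and the tracked helper is a hypothetical stand-in for a tracker's pre/post node hooks, not the HamiltonTracker API.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict

# Hypothetical stand-in for a tracker's records: node name -> status and duration.
records: Dict[str, Dict[str, Any]] = {}


def tracked(name: str, call: Callable[[], Any]) -> Any:
    """Time call() and record success/failure, like a pre/post node hook pair."""
    start = time.perf_counter()
    try:
        result = call()
    except Exception:
        records[name] = {"status": "failure", "duration": time.perf_counter() - start}
        raise
    records[name] = {"status": "success", "duration": time.perf_counter() - start}
    return result


def slow_fail() -> float:
    time.sleep(0.2)
    raise ValueError("boom")


with ThreadPoolExecutor(max_workers=1) as pool:
    # Async submission: the hooks only wrap pool.submit(), which returns a Future at once.
    fut: Future = tracked("submitted", lambda: pool.submit(slow_fail))
    # The ValueError only surfaces here, outside the hooks, when the result is resolved.
    try:
        fut.result()
    except ValueError:
        pass

# Synchronous execution: the hooks wrap the actual work, so time and failure are captured.
try:
    tracked("executed", slow_fail)
except ValueError:
    pass

print(records)
```

The "submitted" entry ends up with a near-zero duration and a success status even though the work took time and failed, which matches what the UI shows for the Ray run.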
So we have all the tools; it's just the wiring that needs to be set up. There are a few approaches we could take. Let @elijahbenizzy and me sketch some options out and get back to you. I assume this is a bit of a blocker for you to use the UI, then?
@skrawcz ah, okay, I didn't know about RayTaskExecutor and haven't had the time or need to touch the parallel module.
Our pipelines are pretty split up and only one of them uses Ray so far. It's not a complete blocker, since the UI will be handy on the other pipelines, but we're definitely looking to have it up and running with Ray as well (since we are planning to upgrade other pipelines to Ray).
Cool, let me know when you have something in mind. Like I said, happy to contribute.
@jernejfrank yeah so we think we have a path forward:
- We need to add a new lifecycle API method, something like do_remote_execute.
- This would create a wrapper function that passes the adapters through so they run around the function being executed, e.g. enabling the HamiltonTracker to execute along with the function that is being executed remotely.
- We'd then need to modify a few of the internals to make this work.
So the plan is to sketch this out in a PR; depending on the details, we'll tag you on parts where it makes sense for you to contribute.
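A minimal, stdlib-only sketch of the wrapping idea: the adapters' hooks are bundled with the function so they run wherever the function runs. All names here (wrap_with_hooks, TimingHook, before/after) are hypothetical and are not the signature of do_remote_execute or any Hamilton API; ThreadPoolExecutor stands in for Ray. With threads the tracker state is shared in memory, whereas on Ray workers the hooks would run in another process, so getting that telemetry back to the UI would still need its own wiring, which this sketch does not show.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Optional


class TimingHook:
    """Hypothetical tracker stand-in: records status and duration per node."""

    def __init__(self) -> None:
        self.starts: Dict[str, float] = {}
        self.records: Dict[str, Dict[str, Any]] = {}

    def before(self, node_name: str) -> None:
        self.starts[node_name] = time.perf_counter()

    def after(self, node_name: str, error: Optional[BaseException]) -> None:
        self.records[node_name] = {
            "status": "failure" if error is not None else "success",
            "duration": time.perf_counter() - self.starts[node_name],
        }


def wrap_with_hooks(
    node_name: str, fn: Callable[..., Any], hooks: List[TimingHook]
) -> Callable[..., Any]:
    """Bundle the hooks with fn so they run wherever fn runs (e.g. on a worker)."""

    def wrapped(**kwargs: Any) -> Any:
        for hook in hooks:
            hook.before(node_name)
        try:
            result = fn(**kwargs)
        except Exception as e:
            for hook in hooks:
                hook.after(node_name, e)
            raise
        for hook in hooks:
            hook.after(node_name, None)
        return result

    return wrapped


def node_slow() -> float:
    time.sleep(0.2)
    return 0.2


def node_slow_error() -> float:
    time.sleep(0.2)
    raise ValueError("should show up as a failure")


tracker = TimingHook()
nodes = {"node_slow": node_slow, "node_slow_error": node_slow_error}
# ThreadPoolExecutor stands in for the remote executor; the wrapped function is submitted.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = {name: pool.submit(wrap_with_hooks(name, fn, [tracker])) for name, fn in nodes.items()}
    for name, fut in futures.items():
        try:
            fut.result()
        except ValueError:
            pass

print(tracker.records)
```

Because timing and error handling now happen inside the submitted callable, both nodes report roughly 0.2 s and the failing node is marked as failed, which is the behavior the report expects to see in the UI.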
Hey @jernejfrank, we outlined in detail what would be involved. It's nothing too complicated, but you get a bit of a tour of Hamilton's inner workings. Feel free to reach out if you want to pair to get started.
https://github.com/DAGWorks-Inc/hamilton/pull/1097/files
Hi @elijahbenizzy , awesome! Super excited to look under the hood.
Let me poke around a bit over the weekend and maybe we can meet Monday or Tuesday to make it more productive? I'm based in the UK and available after 2pm GMT, let me know if it fits in your schedule.
Sounds good! Will reach out on slack to find a time.