`UX` Hamilton Project

Open zilto opened this issue 9 months ago • 0 comments

Current Limitations

When browsing a Python project that uses Hamilton, there is no way to tell which files are "Hamilton modules".

This has several implications:

  • Users don't know which modules can be imported and passed to a Driver
  • Users might unknowingly add functions to a module, rendering it invalid for Hamilton
  • Project and IDE tooling for Hamilton has no standardized, centralized way to identify Hamilton modules
  • Users and tools can't know which combinations of modules can be passed together to a Driver

I touched on a similar topic in Issue #747 in the context of the CLI.

Benefits

I propose the notion of a Project (mapping to the Hamilton UI "project"; "workspace" may be a better name) to let users declare which files are "Hamilton modules".

Features it could unlock:

LSP: multi-module features

  • code navigation. While you edit hello.py, the LSP builds the dataflow from both hello.py and world.py and knows about the nodes of each.
  • visualization. Allow viewing multiple modules in the VSCode extension instead of only the current file

CLI / pre-commit / CI: apply to all

  • validate all modules. The pre-commit hook can attempt to build all "single" and "composed" dataflows
  • generate all visualizations. Use the CLI to generate visualizations of all modules on demand or on commit
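
A minimal sketch of what "validate all modules" could look like in a pre-commit hook or CI step. Everything here is hypothetical: `load_module`, `validate_all`, and the `build` callback are illustrative names, not existing Hamilton APIs. The dataflow entries are assumed to be dicts with `modules`, and optionally `name` and `config`, as in the TOML proposal. The step that actually builds a dataflow is injected as a callable, so the sketch itself only depends on the standard library.

```python
import importlib.util
from pathlib import Path
from types import ModuleType
from typing import Any, Callable, Dict, List, Optional


def load_module(path: Path) -> ModuleType:
    """Import a Python file from its path (hypothetical helper)."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # raises if the file is missing or invalid
    return module


def validate_all(
    dataflows: List[Dict[str, Any]],
    build: Callable[[List[ModuleType], Dict[str, Any]], Any],
    root: Path = Path("."),
) -> Dict[str, Optional[str]]:
    """Try to build every declared dataflow.

    Returns a mapping from dataflow name to None (success) or an error message.
    """
    results: Dict[str, Optional[str]] = {}
    for flow in dataflows:
        # Fallback label for this sketch only: stem of the first module.
        name = flow.get("name", Path(flow["modules"][0]).stem)
        try:
            modules = [load_module(root / m) for m in flow["modules"]]
            build(modules, flow.get("config", {}))
            results[name] = None
        except Exception as exc:  # report the failure and keep checking the rest
            results[name] = f"{type(exc).__name__}: {exc}"
    return results
```

With Hamilton installed, `build` could plausibly be a small wrapper that passes the loaded modules and config to a Driver; a hook would then exit non-zero if any value in the result is not None.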

Hamilton UI

  • sync catalog without execution. The UI could better separate "historical dataflows" that were executed from "available dataflows" representing the state of the current code

API design

Hamilton is designed around 2 layers: dataflow definition and dataflow execution. This API relates to dataflow definition, which requires knowing:

  • required: Python modules (file paths; one or more)
  • optional: Driver config (dict)

Given that Hamilton is Python-centric, it should adopt pyproject.toml as the standard place for this configuration. The TOML format is also well supported for parsing in other languages (e.g., TypeScript in the VSCode extension, future Rust dev tools), and it supports the types needed to specify the Python modules and the config.

Example TOML, showing the different ways a dataflow definition can be specified:

# shortform notation
[tool.hamilton]
dataflows = [
  { name = "greetings", modules = ["world.py"] },
  { modules = ["hello.py"] },  # `name` is inferred when `len(modules) == 1`
]

# longform notation
# mutually exclusive with shortform because they both use `tool.hamilton.dataflows`

[[tool.hamilton.dataflows]]  # this adds to the list `hamilton.dataflows`
modules = ["single.py"]  # `name` is inferred when `len(modules) == 1`

[[tool.hamilton.dataflows]]
name = "composed"
modules = ["a.py", "b.py"]  # list `hamilton.dataflows[i].modules[...]`

[[tool.hamilton.dataflows]]
name = "inline_config"
modules = ["a.py"]
config = { env = "dev", owner = "me" }  # mapping `hamilton.dataflows[i].config{...}`

[[tool.hamilton.dataflows]]
name = "multiline_config"
modules = ["a.py"]
config.env = "dev"  # key-value pair `hamilton.dataflows[i].config{env: "dev"}`
config.owner = "me"
config.key1 = true
config.key2 = false
config.key3 = 12345
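
To illustrate how a tool might consume this table, here is a minimal sketch in Python. It assumes Python 3.11+ for the standard-library `tomllib` (falling back to the third-party `tomli` package on older versions). The function `read_dataflows` and its error message are hypothetical, not part of Hamilton. It applies the rule from the comments above: `name` is inferred from the file stem when exactly one module is listed.

```python
from pathlib import Path
from typing import Any, Dict, List

try:
    import tomllib  # standard library since Python 3.11
except ImportError:  # assumption: `tomli` is installed on older Pythons
    import tomli as tomllib

# A subset of the longform example above.
EXAMPLE = """
[[tool.hamilton.dataflows]]
modules = ["single.py"]

[[tool.hamilton.dataflows]]
name = "composed"
modules = ["a.py", "b.py"]

[[tool.hamilton.dataflows]]
name = "multiline_config"
modules = ["a.py"]
config.env = "dev"
config.key3 = 12345
"""


def read_dataflows(toml_text: str) -> List[Dict[str, Any]]:
    """Parse `tool.hamilton.dataflows` and normalize each entry (hypothetical helper)."""
    data = tomllib.loads(toml_text)
    entries = data.get("tool", {}).get("hamilton", {}).get("dataflows", [])
    dataflows = []
    for entry in entries:
        modules = list(entry["modules"])
        name = entry.get("name")
        if name is None:
            # The proposal only infers `name` for single-module dataflows.
            if len(modules) != 1:
                raise ValueError("`name` is required when several modules are listed")
            name = Path(modules[0]).stem
        dataflows.append(
            {"name": name, "modules": modules, "config": dict(entry.get("config", {}))}
        )
    return dataflows


if __name__ == "__main__":
    for flow in read_dataflows(EXAMPLE):
        print(flow)
```

Since shortform and longform both populate the same `tool.hamilton.dataflows` list, the same function should handle either notation without special cases.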

API extensibility

Currently, we only define tool.hamilton.dataflows, but more configuration keys can be added under tool.hamilton later.

zilto, May 06 '24 15:05