`UX` Hamilton Project
Current Limitations
When browsing a Python project that uses Hamilton, there is no way to tell which files are "Hamilton modules".
This has several implications:
- Users don't know which modules can be imported and passed to a `Driver`
- Users might unknowingly add functions to a module that make it invalid for Hamilton
- Project and IDE tooling for Hamilton has no standardized, centralized way to identify Hamilton modules
- Users and tools can't know which combinations of modules can be passed together to a `Driver`
I touched on a similar topic in Issue #747 in the context of the CLI.
Benefits
I propose the notion of a "Project" (mapping to the Hamilton UI "project"; maybe "workspace" is a better name) to let users specify which files are "Hamilton modules".
Features it could unlock:
LSP: multi-module features
- code navigation. You're currently editing `hello.py`, but the LSP builds the dataflow with both `hello.py` and `world.py` and knows about their nodes.
- visualization. Allow viewing multiple modules in the VSCode extension instead of only the current file
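To make the navigation example concrete, here is a minimal sketch of a multi-module node index, assuming the LSP already knows that `hello.py` and `world.py` belong to the same dataflow. It reads the files with Python's `ast` module instead of importing them, and treats every top-level public function as a node whose parameters are its dependencies. This is a simplification of Hamilton's rules (decorators that rename or generate nodes are ignored), and `module_nodes`, `build_index`, and `go_to_definition` are hypothetical helpers with made-up module contents.

```python
import ast
from typing import Dict, List, Optional


def module_nodes(source: str) -> Dict[str, List[str]]:
    """Return {function name: parameter names} for top-level public functions.

    Simplification: each public function is treated as one node named after
    the function, with its parameters as dependencies. Decorators are ignored.
    """
    nodes: Dict[str, List[str]] = {}
    for stmt in ast.parse(source).body:
        # Hamilton skips functions whose names start with an underscore.
        if isinstance(stmt, ast.FunctionDef) and not stmt.name.startswith("_"):
            nodes[stmt.name] = [arg.arg for arg in stmt.args.args]
    return nodes


def build_index(sources: Dict[str, str]) -> Dict[str, str]:
    """Map every node name to the file defining it, across all modules of one dataflow."""
    index: Dict[str, str] = {}
    for path, source in sources.items():
        for name in module_nodes(source):
            index[name] = path
    return index


def go_to_definition(index: Dict[str, str], name: str) -> Optional[str]:
    """File defining `name`, or None if it must be supplied as an external input."""
    return index.get(name)


# Hypothetical contents of the two modules from the example above.
sources = {
    "hello.py": 'def message(greeting: str, name: str) -> str:\n'
                '    return f"{greeting}, {name}!"\n\n'
                'def greeting() -> str:\n'
                '    return "hello"\n',
    "world.py": 'def name() -> str:\n'
                '    return "world"\n',
}
index = build_index(sources)
print(go_to_definition(index, "name"))
```

Because the index spans both files, go-to-definition on the `name` parameter in `hello.py` can jump to `world.py`; a single-file view would have to report it as an unknown input.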
CLI / pre-commit / CI: apply to all
- validate all modules. The pre-commit hook can attempt to build every "single" and "composed" dataflow
- generate all visualizations. Use the CLI to generate visualizations of all modules on demand or on commit
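As a rough sketch of the pre-commit step, assuming each dataflow entry has already been normalized to a dict with `name` and `modules` keys (paths relative to the project root), a hook could at least check that every module imports cleanly. `load_module` and `validate_dataflows` are hypothetical names. Actually building each dataflow would be the next step, for example by passing the loaded modules and config to Hamilton's `driver.Builder`; the sketch leaves that out so it runs without Hamilton installed.

```python
import importlib.util
from pathlib import Path
from types import ModuleType
from typing import Dict, List


def load_module(path: Path) -> ModuleType:
    """Import a Python file by path, without requiring it to be on sys.path."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot create an import spec for {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the module's top-level code
    return module


def validate_dataflows(dataflows: List[Dict], root: Path) -> Dict[str, str]:
    """Try to import every module of every dataflow; return {dataflow name: error}."""
    errors: Dict[str, str] = {}
    for dataflow in dataflows:
        try:
            for module_path in dataflow["modules"]:
                load_module(root / module_path)
            # Next step (not shown): build the dataflow from the loaded modules
            # and dataflow.get("config", {}), e.g. with Hamilton's driver.Builder.
        except Exception as exc:  # report any import-time failure for this dataflow
            errors[dataflow["name"]] = f"{type(exc).__name__}: {exc}"
    return errors
```

A hook script could call `validate_dataflows` on the entries from `pyproject.toml`, print the returned errors, and exit with a non-zero status when the dict is not empty.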
Hamilton UI
- sync catalog without execution. The UI could better separate "historical dataflows" that were executed from "available dataflows" representing the state of the current code
API design
Hamilton is designed around two layers: dataflow definition and dataflow execution. The proposed API concerns dataflow definition, which requires knowing:
- required: Python modules (file paths; one or more)
- optional: Driver config (dict)
Given that Hamilton is Python-centric, it should adopt `pyproject.toml` as the standard place for this definition. TOML is also well supported for parsing in other languages (e.g., TypeScript in the VSCode extension, future Rust dev tools), and it supports the types needed to specify the Python modules and config.
Example TOML, which offers flexibility in how dataflow definitions are specified:
```toml
# shortform notation
[tool.hamilton]
dataflows = [
  { name = "greetings", modules = ["world.py"] },
  { modules = ["hello.py"] },  # `name` is inferred when `len(modules) == 1`
]
```
```toml
# longform notation
# mutually exclusive with shortform because they both use `tool.hamilton.dataflows`
[[tool.hamilton.dataflows]]  # each header appends a table to the list `tool.hamilton.dataflows`
modules = ["single.py"]  # `name` is inferred when `len(modules) == 1`

[[tool.hamilton.dataflows]]
name = "composed"
modules = ["a.py", "b.py"]  # list `tool.hamilton.dataflows[i].modules[...]`

[[tool.hamilton.dataflows]]
name = "inline_config"
modules = ["a.py"]
config = { env = "dev", owner = "me" }  # mapping `tool.hamilton.dataflows[i].config{...}`

[[tool.hamilton.dataflows]]
name = "multiline_config"
modules = ["a.py"]
config.env = "dev"  # key-value pair `tool.hamilton.dataflows[i].config{env: "dev"}`
config.owner = "me"
config.key1 = true
config.key2 = false
config.key3 = 12345
```
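The proposal does not specify how Hamilton would read this section. As a minimal sketch, assuming Python 3.11's `tomllib` (or the `tomli` package on older versions) and assuming an inferred `name` is the module's file stem (the issue only says it is inferred), a loader could normalize either notation into one structure. `Dataflow` and `parse_dataflows` are hypothetical names.

```python
try:
    import tomllib  # standard library since Python 3.11
except ModuleNotFoundError:  # older Pythons: the `tomli` package has the same API
    import tomli as tomllib
from dataclasses import dataclass, field
from pathlib import PurePath
from typing import Any, Dict, List


@dataclass
class Dataflow:
    name: str
    modules: List[str]
    config: Dict[str, Any] = field(default_factory=dict)


def parse_dataflows(pyproject_text: str) -> List[Dataflow]:
    """Read `tool.hamilton.dataflows` and fill in inferred names."""
    data = tomllib.loads(pyproject_text)
    entries = data.get("tool", {}).get("hamilton", {}).get("dataflows", [])
    dataflows = []
    for entry in entries:
        modules = list(entry["modules"])
        if not modules:
            raise ValueError("a dataflow needs at least one module")
        name = entry.get("name")
        if name is None:
            if len(modules) != 1:
                raise ValueError(f"dataflow {modules} needs an explicit `name`")
            name = PurePath(modules[0]).stem  # assumption: "hello.py" -> "hello"
        dataflows.append(Dataflow(name, modules, dict(entry.get("config", {}))))
    return dataflows


# Usage with a fragment of the longform example.
example = """
[[tool.hamilton.dataflows]]
modules = ["single.py"]

[[tool.hamilton.dataflows]]
name = "multiline_config"
modules = ["a.py"]
config.env = "dev"
config.key3 = 12345
"""
for dataflow in parse_dataflows(example):
    print(dataflow)
```

Both notations parse to the same Python value (a list of tables under `tool.hamilton.dataflows`), so a loader only needs one code path. Their mutual exclusivity is enforced by TOML itself: appending `[[tool.hamilton.dataflows]]` to an array already defined with `dataflows = [...]` is a parse error.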
API extensibility
Currently, we only define `tool.hamilton.dataflows`, but more configuration keys can be added under `tool.hamilton` as needed.