Split NodePerformanceProfile state storage into separate mappings
Motivation
The monolithic NodePerformanceProfile stored all node profile data together, but the data comes from different events with different update frequencies:
- Identity (model_id, chip_id, friendly_name) - updated every 30s
- Memory - updated every 0.5s
- System (GPU, temp, power) - updated every 1s
- Network interfaces - updated every 30s
Storing all this in a single mapping meant every update replaced the entire profile object. This refactor splits the storage to match the event structure, making updates more efficient and the code cleaner.
Dashboard responsiveness: The dashboard can now show the device name, type, and memory as soon as those events arrive, while slower metrics such as temperature and GPU utilization follow shortly after. Previously, it had to wait for all metrics before displaying anything useful.
Prerequisite for memory bandwidth profiling: This is also a necessary prerequisite for adding memory bandwidth profiling, which is quite slow to measure and would block other metrics if bundled together.
Changes
Python State:
- Added NodeIdentity class to profiling.py with model_id, chip_id, friendly_name
- Replaced node_profiles: Mapping[NodeId, NodePerformanceProfile] in state.py with four separate mappings:
  - node_identities: Mapping[NodeId, NodeIdentity]
  - node_memories: Mapping[NodeId, MemoryPerformanceProfile]
  - node_systems: Mapping[NodeId, SystemPerformanceProfile]
  - node_networks: Mapping[NodeId, list[NetworkInterfaceInfo]]
- Rewrote apply functions in apply.py to write to their specific storage
- Added _reconstruct_profile() helper to rebuild NodePerformanceProfile for topology updates
- Updated api.py memory calculation to use state.node_memories directly
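The split storage can be sketched as follows. This is a minimal illustration, not the code in this PR: NodeId is modelled as a plain str, only two of the four mappings are shown, and the fields ram_total / ram_available and the function names apply_identity_update / apply_memory_update are hypothetical.

```python
from dataclasses import dataclass, field, replace
from typing import Mapping

NodeId = str  # assumption: the real NodeId type may be richer than a str


@dataclass(frozen=True)
class NodeIdentity:
    model_id: str
    chip_id: str
    friendly_name: str


@dataclass(frozen=True)
class MemoryPerformanceProfile:
    # Hypothetical fields, for illustration only.
    ram_total: int
    ram_available: int


@dataclass(frozen=True)
class State:
    # Each mapping is fed by its own event type and update frequency.
    node_identities: Mapping[NodeId, NodeIdentity] = field(default_factory=dict)
    node_memories: Mapping[NodeId, MemoryPerformanceProfile] = field(default_factory=dict)


def apply_identity_update(state: State, node_id: NodeId, identity: NodeIdentity) -> State:
    """Return a new State in which only node_identities has changed."""
    return replace(state, node_identities={**state.node_identities, node_id: identity})


def apply_memory_update(state: State, node_id: NodeId, memory: MemoryPerformanceProfile) -> State:
    """Return a new State in which only node_memories has changed."""
    return replace(state, node_memories={**state.node_memories, node_id: memory})
```

In this sketch a frequent memory event rebuilds only the memory mapping; the identity mapping object is carried over to the new state unchanged.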
Worker Polling:
- Changed emit_identity_metrics to start_polling_identity_metrics - now polls every 30s instead of emitting once
- All metrics now follow the same pattern: emit immediately, then poll periodically
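The "emit immediately, then poll periodically" pattern might look roughly like this with asyncio. The helper start_polling, its parameters, and the demo function are hypothetical; the real start_polling_identity_metrics may be structured differently.

```python
import asyncio
from typing import Awaitable, Callable


async def start_polling(
    emit: Callable[[], Awaitable[None]],
    interval_seconds: float,
) -> None:
    """Emit once right away, then re-emit every interval_seconds until cancelled."""
    while True:
        await emit()
        await asyncio.sleep(interval_seconds)


async def demo() -> list[str]:
    """Run the poller briefly and return the events it emitted."""
    events: list[str] = []

    async def emit_identity() -> None:
        # Stand-in for gathering and emitting identity metrics.
        events.append("identity")

    task = asyncio.create_task(start_polling(emit_identity, interval_seconds=0.01))
    await asyncio.sleep(0.035)  # let a few polling cycles run
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # expected: the poller runs until it is cancelled
    return events
```

Because the first emit happens before the first sleep, a consumer sees a value as soon as the poller starts instead of waiting a full interval.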
Dashboard:
- Removed RawNodeProfile interface entirely
- Added split state interfaces: RawNodeIdentity, RawNodeMemory, RawNodeSystem, RawNetworkInterface
- Updated RawStateResponse to include split fields
- Simplified transformTopology() to use split types directly with safe defaults for missing data
Why It Works
Each apply function now updates only its specific mapping, and the topology still gets a reconstructed full profile for placement logic compatibility. The dashboard gracefully handles partial data (e.g., if memory hasn't arrived yet, it defaults to 0). All metrics are emitted immediately on startup and then polled periodically.
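Rebuilding a full profile from partial data could be sketched as below. The actual signature of _reconstruct_profile() and the fields of NodePerformanceProfile are not shown in this description, so the function reconstruct_profile, the reduced field set, and the "Unknown" and 0 fallbacks are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Mapping

NodeId = str  # assumption: the real NodeId type may differ


@dataclass(frozen=True)
class NodeIdentity:
    model_id: str
    chip_id: str
    friendly_name: str


@dataclass(frozen=True)
class MemoryPerformanceProfile:
    # Hypothetical fields, for illustration only.
    ram_total: int
    ram_available: int


@dataclass(frozen=True)
class NodePerformanceProfile:
    # Reduced to identity and memory; the real class also carries system and network data.
    model_id: str
    chip_id: str
    friendly_name: str
    memory: MemoryPerformanceProfile


def reconstruct_profile(
    node_id: NodeId,
    identities: Mapping[NodeId, NodeIdentity],
    memories: Mapping[NodeId, MemoryPerformanceProfile],
) -> NodePerformanceProfile:
    """Combine whatever has arrived so far, falling back to defaults for the rest."""
    identity = identities.get(node_id, NodeIdentity("Unknown", "Unknown", "Unknown"))
    memory = memories.get(node_id, MemoryPerformanceProfile(ram_total=0, ram_available=0))
    return NodePerformanceProfile(
        model_id=identity.model_id,
        chip_id=identity.chip_id,
        friendly_name=identity.friendly_name,
        memory=memory,
    )
```

With this shape, a node whose identity event has arrived but whose memory event has not still yields a usable profile, with memory reported as 0 until the first memory event lands.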
Test Plan
Manual Testing
Automated Testing
- Type checker passes: uv run basedpyright - 0 errors
- All tests pass: uv run pytest - 151 passed
- Dashboard builds successfully: npm run build