feat: events/actions context for Max
Problem
The original UI Context PR adds a UI context to Max's root prompt, but it only supports insights and dashboards. We want to extend it with actions and events.
Requires the original UI Context PR to be merged first.
Changes
Frontend:
- Add events and actions support to `maxContextLogic` and related UI tags
Backend:
- Support injected actions in the `Rag` node
- Support injected events in the `TaxonomyAgent` node
- Add events and actions context to the `Root` prompt
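As a rough illustration of the last backend item, here is a minimal sketch of how user-selected events and actions might be rendered into text for a root prompt. Everything in it is an assumption made for illustration: the `EventContext`, `ActionContext`, and `UIContext` classes, their fields, and the `format_ui_context` helper are hypothetical and are not taken from this PR, whose actual schema and prompt wording may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EventContext:
    # Hypothetical shape of an event selected in the UI.
    name: str
    description: Optional[str] = None


@dataclass
class ActionContext:
    # Hypothetical shape of an action selected in the UI.
    id: int
    name: str
    description: Optional[str] = None


@dataclass
class UIContext:
    # Hypothetical container for the context sent alongside a message.
    events: List[EventContext] = field(default_factory=list)
    actions: List[ActionContext] = field(default_factory=list)


def _describe(label: str, description: Optional[str]) -> str:
    """Render one bullet, appending the description only when present."""
    return f"- {label}: {description}" if description else f"- {label}"


def format_ui_context(ctx: UIContext) -> str:
    """Render events and actions as plain-text sections for a prompt.

    Returns an empty string when nothing is selected, so the caller can
    skip the section entirely.
    """
    sections: List[str] = []
    if ctx.events:
        rows = [_describe(e.name, e.description) for e in ctx.events]
        sections.append("Events in context:\n" + "\n".join(rows))
    if ctx.actions:
        rows = [_describe(f"{a.name} (id: {a.id})", a.description) for a in ctx.actions]
        sections.append("Actions in context:\n" + "\n".join(rows))
    return "\n\n".join(sections)


if __name__ == "__main__":
    example = UIContext(
        events=[EventContext("$pageview")],
        actions=[ActionContext(1, "Signed up", "User completed signup")],
    )
    print(format_ui_context(example))
```

Returning an empty string for an empty context keeps the prompt unchanged when the user has not attached any events or actions.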
Did you write or update any docs for this change?
- [ ] I've added or updated the docs
- [ ] I've reached out for help from the docs team
- [ ] No docs needed for this change
How did you test this code?
Wrote tests
AI eval results
Evaluated 6 experiments, comprising 16 metrics.
funnel
plan_correctness: 92.16%, query_and_plan_alignment: 84.15%, time_range_relevancy: 97.55%
Avg. case performance: 73.53 s, 6192 tokens, $0.0162 in tokens
memory
ToolRelevance: 98.61%, memory_content_relevance: 90.00%
Avg. case performance: 6.85 s, 1216 tokens, $0.0034 in tokens
retention
plan_correctness: 72.00%, query_and_plan_alignment: 60.71%, time_range_relevancy: 95.00%
Avg. case performance: 35.15 s, 6213 tokens, $0.0163 in tokens
root
ToolRelevance: 58.35%
Avg. case performance: 6.35 s, 0 tokens
sql
plan_correctness: 48.06%, query_and_plan_alignment: 73.50%, sql_syntax_correctness: 95.45%, time_range_relevancy: 97.73%
Avg. case performance: 36.14 s, 892 tokens, $0.0042 in tokens
trends
plan_correctness: 80.48%, query_and_plan_alignment: 80.00%, time_range_relevancy: 100.00%
Avg. case performance: 38.55 s, 10272 tokens, $0.0263 in tokens
Triggered by this commit.
UI snapshots have been updated
40 snapshot changes in total. 0 added, 40 modified, 0 deleted:
- chromium: 0 added, 40 modified, 0 deleted (diff for shard 1, diff for shard 5, diff for shard 8)
- webkit: 0 added, 0 modified, 0 deleted
Triggered by this commit.
UI snapshots have been updated
4 snapshot changes in total. 0 added, 4 modified, 0 deleted:
- chromium: 0 added, 4 modified, 0 deleted (diff for shard 5)
- webkit: 0 added, 0 modified, 0 deleted
Triggered by this commit.
AI eval results
Evaluated 6 experiments, comprising 16 metrics.
funnel
plan_correctness: 90.88%, query_and_plan_alignment: 83.33%, time_range_relevancy: 96.57%
Avg. case performance: 78.48 s, 6318 tokens, $0.0165 in tokens
memory
ToolRelevance: 97.50%, memory_content_relevance: 90.00%
Avg. case performance: 6.96 s, 1211 tokens, $0.0033 in tokens
retention
plan_correctness: 58.21%, query_and_plan_alignment: 52.78%, time_range_relevancy: 95.83%
Avg. case performance: 26.15 s, 5369 tokens, $0.0161 in tokens
root
ToolRelevance: 77.07%
Avg. case performance: 7.81 s, 0 tokens
sql
plan_correctness: 48.61%, query_and_plan_alignment: 72.00%, sql_syntax_correctness: 86.36%, time_range_relevancy: 93.18%
Avg. case performance: 34.52 s, 1044 tokens, $0.0050 in tokens
trends
plan_correctness: 73.81%, query_and_plan_alignment: 78.95%, time_range_relevancy: 100.00%
Avg. case performance: 40.40 s, 9424 tokens, $0.0267 in tokens
Triggered by this commit.
AI eval results
Evaluated 6 experiments, comprising 16 metrics.
funnel
plan_correctness: 90.98%, query_and_plan_alignment: 83.67%, time_range_relevancy: 97.00%
Avg. case performance: 72.45 s, 6325 tokens, $0.0169 in tokens
memory
ToolRelevance: 98.50%, memory_content_relevance: 91.43%
Avg. case performance: 5.51 s, 1217 tokens, $0.0034 in tokens
retention
plan_correctness: 58.33%, query_and_plan_alignment: 50.00%, time_range_relevancy: 94.23%
Avg. case performance: 25.80 s, 5368 tokens, $0.0162 in tokens
root
ToolRelevance: 64.94%
Avg. case performance: 7.67 s, 0 tokens
sql
plan_correctness: 52.78%, query_and_plan_alignment: 69.09%, sql_syntax_correctness: 87.50%, time_range_relevancy: 100.00%
Avg. case performance: 34.94 s, 1058 tokens, $0.0045 in tokens
trends
plan_correctness: 71.43%, query_and_plan_alignment: 82.11%, time_range_relevancy: 100.00%
Avg. case performance: 40.33 s, 9939 tokens, $0.0282 in tokens
Triggered by this commit.
AI eval results
Evaluated 6 experiments, comprising 16 metrics.
funnel
plan_correctness: 91.57%, query_and_plan_alignment: 84.60%, time_range_relevancy: 98.33%
Avg. case performance: 81.13 s, 6322 tokens, $0.0166 in tokens
memory
ToolRelevance: 97.80%, memory_content_relevance: 91.43%
Avg. case performance: 5.22 s, 1218 tokens, $0.0034 in tokens
retention
plan_correctness: 73.00%, query_and_plan_alignment: 53.13%, time_range_relevancy: 95.00%
Avg. case performance: 25.39 s, 6200 tokens, $0.0162 in tokens
root
ToolRelevance: 77.05%
Avg. case performance: 5.55 s, 0 tokens
sql
plan_correctness: 48.61%, query_and_plan_alignment: 71.67%, sql_syntax_correctness: 86.36%, time_range_relevancy: 97.73%
Avg. case performance: 35.53 s, 1341 tokens, $0.0059 in tokens
trends
plan_correctness: 76.19%, query_and_plan_alignment: 82.00%, time_range_relevancy: 100.00%
Avg. case performance: 33.15 s, 10438 tokens, $0.0281 in tokens
Triggered by this commit.
UI snapshots have been updated
4 snapshot changes in total. 0 added, 4 modified, 0 deleted:
- chromium: 0 added, 4 modified, 0 deleted (diff for shard 1)
- webkit: 0 added, 0 modified, 0 deleted
Triggered by this commit.
UI snapshots have been updated
2 snapshot changes in total. 0 added, 2 modified, 0 deleted:
[!CAUTION]
Detected flapping snapshots
These snapshots have auto-updated more than once since the last human commit:
- scenes-other-settings--settings-project--dark.png (chromium, shard 1)
- scenes-other-settings--settings-project--light.png (chromium, shard 1)
The flippy-flappies are deadly and must be fixed ASAP. They're productivity killers. Run `pnpm storybook` locally and make the fix now. (Often, the cause is `ResizeObserver` being used instead of the better CSS container queries.)
- chromium: 0 added, 2 modified, 0 deleted (diff for shard 1)
- webkit: 0 added, 0 modified, 0 deleted
Triggered by this commit.
AI eval results
Evaluated 6 experiments, comprising 16 metrics.
funnel
plan_correctness: 93.63%, query_and_plan_alignment: 83.83%, time_range_relevancy: 95.88%
Avg. case performance: 68.12 s, 6195 tokens, $0.0162 in tokens
memory
ToolRelevance: 97.52%, memory_content_relevance: 91.43%
Avg. case performance: 4.66 s, 1216 tokens, $0.0034 in tokens
retention
plan_correctness: 74.33%, query_and_plan_alignment: 50.00%, time_range_relevancy: 95.00%
Avg. case performance: 26.14 s, 6202 tokens, $0.0162 in tokens
root
ToolRelevance: 64.79%
Avg. case performance: 5.58 s, 0 tokens
sql
plan_correctness: 58.33%, query_and_plan_alignment: 78.75%, sql_syntax_correctness: 92.31%, time_range_relevancy: 98.08%
Avg. case performance: 29.33 s, 1020 tokens, $0.0044 in tokens
trends
plan_correctness: 72.62%, query_and_plan_alignment: 77.89%, time_range_relevancy: 100.00%
Avg. case performance: 38.68 s, 9938 tokens, $0.0282 in tokens
Triggered by this commit.
UI snapshots have been updated
3 snapshot changes in total. 0 added, 3 modified, 0 deleted:
- chromium: 0 added, 3 modified, 0 deleted (diff for shard 1, diff for shard 7)
- webkit: 0 added, 0 modified, 0 deleted
Triggered by this commit.
AI eval results
Evaluated 6 experiments, comprising 16 metrics.
funnel
plan_correctness: 93.73%, query_and_plan_alignment: 83.85%, time_range_relevancy: 98.53%
Avg. case performance: 67.18 s, 6319 tokens, $0.0165 in tokens
memory
ToolRelevance: 98.36%, memory_content_relevance: 90.00%
Avg. case performance: 5.19 s, 1209 tokens, $0.0033 in tokens
retention
plan_correctness: 75.00%, query_and_plan_alignment: 54.55%, time_range_relevancy: 95.00%
Avg. case performance: 31.28 s, 6217 tokens, $0.0163 in tokens
root
ToolRelevance: 70.97%
Avg. case performance: 7.83 s, 0 tokens
sql
plan_correctness: 42.22%, query_and_plan_alignment: 78.50%, sql_syntax_correctness: 90.00%, time_range_relevancy: 82.50%
Avg. case performance: 31.73 s, 1263 tokens, $0.0061 in tokens
trends
plan_correctness: 82.14%, query_and_plan_alignment: 80.00%, time_range_relevancy: 100.00%
Avg. case performance: 33.15 s, 10284 tokens, $0.0264 in tokens
Triggered by this commit.
Size Change: +924 B (+0.04%)
Total Size: 2.58 MB
| Filename | Size | Change |
|---|---|---|
| frontend/dist/toolbar.js | 2.58 MB | +924 B (+0.04%) |
UI snapshots have been updated
1 snapshot change in total. 0 added, 1 modified, 0 deleted:
[!CAUTION]
Detected flapping snapshots
These snapshots have auto-updated more than once since the last human commit:
- scenes-app-surveys--survey-templates--dark.png (chromium, shard 7)
The flippy-flappies are deadly and must be fixed ASAP. They're productivity killers. Run `pnpm storybook` locally and make the fix now. (Often, the cause is `ResizeObserver` being used instead of the better CSS container queries.)
- chromium: 0 added, 1 modified, 0 deleted (diff for shard 7)
- webkit: 0 added, 0 modified, 0 deleted
Triggered by this commit.
AI eval results
Evaluated 6 experiments, comprising 16 metrics.
funnel
plan_correctness: 91.37%, query_and_plan_alignment: 85.70%, time_range_relevancy: 96.08%
Avg. case performance: 70.99 s, 6326 tokens, $0.0166 in tokens
memory
ToolRelevance: 93.56%, memory_content_relevance: 86.67%
Avg. case performance: 6.29 s, 1215 tokens, $0.0034 in tokens
retention
plan_correctness: 72.67%, query_and_plan_alignment: 61.50%, time_range_relevancy: 96.43%
Avg. case performance: 26.22 s, 5796 tokens, $0.0162 in tokens
root
ToolRelevance: 58.81%
Avg. case performance: 5.41 s, 0 tokens
sql
plan_correctness: 47.65%, query_and_plan_alignment: 82.50%, sql_syntax_correctness: 90.00%, time_range_relevancy: 91.50%
Avg. case performance: 34.48 s, 980 tokens, $0.0052 in tokens
trends
plan_correctness: 81.67%, query_and_plan_alignment: 80.00%, time_range_relevancy: 100.00%
Avg. case performance: 36.45 s, 9855 tokens, $0.0266 in tokens
Triggered by this commit.
AI eval results
Evaluated 6 experiments, comprising 16 metrics.
funnel
plan_correctness: 91.08%, query_and_plan_alignment: 87.90%, time_range_relevancy: 97.55%
Avg. case performance: 78.92 s, 6621 tokens, $0.0173 in tokens
memory
ToolRelevance: 97.08%, memory_content_relevance: 90.00%
Avg. case performance: 5.54 s, 1212 tokens, $0.0033 in tokens
retention
plan_correctness: 73.00%, query_and_plan_alignment: 50.00%, time_range_relevancy: 95.00%
Avg. case performance: 27.44 s, 6204 tokens, $0.0162 in tokens
root
ToolRelevance: 70.38%
Avg. case performance: 6.86 s, 0 tokens
sql
plan_correctness: 47.78%, query_and_plan_alignment: 75.00%, sql_syntax_correctness: 90.91%, time_range_relevancy: 94.55%
Avg. case performance: 35.86 s, 903 tokens, $0.0043 in tokens
trends
plan_correctness: 84.05%, query_and_plan_alignment: 80.71%, time_range_relevancy: 100.00%
Avg. case performance: 39.18 s, 10277 tokens, $0.0264 in tokens
Triggered by this commit.
AI eval results
Evaluated 6 experiments, comprising 16 metrics.
funnel
- plan_correctness: 93.92%, +3.73% versus baseline (master) (improvements: 3, regressions: 1)
- query_and_plan_alignment: 84.80%, -0.64% versus baseline (master) (improvements: 6, regressions: 6)
- time_range_relevancy: 98.82%, +1.27% versus baseline (master) (improvements: 2, regressions: 1)
Avg. case performance: 73.66 s, 6317 tokens, $0.0165 in tokens
memory
- ToolRelevance: 97.91%, -0.30% versus baseline (master) (improvements: 1, regressions: 2)
- memory_content_relevance: 90.00%, ±0.00% versus baseline (master) (improvements: 1, regressions: 1)
Avg. case performance: 5.61 s, 1216 tokens, $0.0034 in tokens
retention
- plan_correctness: 63.67%, -10.67% versus baseline (master) (improvements: 0, regressions: 2)
- query_and_plan_alignment: 50.00%, -4.55% versus baseline (master) (improvements: 0, regressions: 1)
- time_range_relevancy: 98.08%, +3.08% versus baseline (master) (improvements: 0, regressions: 0)
Avg. case performance: 29.53 s, 5379 tokens, $0.0162 in tokens
root
- ToolRelevance: 59.09%, -15.44% versus baseline (master) (improvements: 0, regressions: 1)
Avg. case performance: 7.54 s, 0 tokens
sql
- plan_correctness: 65.56%, +11.39% versus baseline (master) (improvements: 3, regressions: 0)
- query_and_plan_alignment: 78.85%, +5.21% versus baseline (master) (improvements: 2, regressions: 1)
- sql_syntax_correctness: 89.29%, -6.55% versus baseline (master) (improvements: 0, regressions: 1)
- time_range_relevancy: 91.07%, -8.93% versus baseline (master) (improvements: 0, regressions: 1)
Avg. case performance: 38.61 s, 1033 tokens, $0.0043 in tokens
trends
- plan_correctness: 75.00%, -3.10% versus baseline (master) (improvements: 1, regressions: 1)
- query_and_plan_alignment: 80.25%, -0.23% versus baseline (master) (improvements: 1, regressions: 2)
- time_range_relevancy: 100.00%, ±0.00% versus baseline (master) (improvements: 0, regressions: 0)
Avg. case performance: 46.85 s, 10434 tokens, $0.0281 in tokens
Triggered by this commit.
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the `stale` label; otherwise this will be closed in another week. If you want to permanently keep it open, use the `waiting` label.
This issue has 2017 words at 17 comments. Issues this long are hard to read or contribute to, and tend to take very long to reach a conclusion. Instead, why not:
- Write some code and submit a pull request! Code wins arguments
- Have a sync meeting to reach a conclusion
- Create a Request for Comments and submit a PR with it to the meta repo or product internal repo
Is this issue intended to be sprawling? Consider adding label epic or sprint to indicate this.
UI snapshots have been updated
36 snapshot changes in total. 0 added, 36 modified, 0 deleted:
- chromium: 0 added, 36 modified, 0 deleted (diff for shard 1)
- webkit: 0 added, 0 modified, 0 deleted
Triggered by this commit.
UI snapshots have been updated
1 snapshot change in total. 0 added, 1 modified, 0 deleted:
- chromium: 0 added, 1 modified, 0 deleted (diff for shard 2)
- webkit: 0 added, 0 modified, 0 deleted
Triggered by this commit.
UI snapshots have been updated
37 snapshot changes in total. 0 added, 37 modified, 0 deleted:
[!CAUTION]
Detected flapping snapshots
These snapshots have auto-updated more than once since the last human commit:
- scenes-app-max-ai--empty-thread-loading--dark.png (chromium, shard 1)
- scenes-app-max-ai--empty-thread-loading--light.png (chromium, shard 1)
- scenes-app-max-ai--generation-failure-thread--dark.png (chromium, shard 1)
- scenes-app-max-ai--generation-failure-thread--light.png (chromium, shard 1)
- scenes-app-max-ai--thread--dark.png (chromium, shard 1)
- scenes-app-max-ai--thread--light.png (chromium, shard 1)
- scenes-app-max-ai--thread-scrolls-to-bottom-on-new-messages--dark.png (chromium, shard 1)
- scenes-app-max-ai--thread-scrolls-to-bottom-on-new-messages--light.png (chromium, shard 1)
- scenes-app-max-ai--thread-with-conversation-loading--dark.png (chromium, shard 1)
- scenes-app-max-ai--thread-with-conversation-loading--light.png (chromium, shard 1)
- scenes-app-max-ai--thread-with-empty-conversation--dark.png (chromium, shard 1)
- scenes-app-max-ai--thread-with-empty-conversation--light.png (chromium, shard 1)
- scenes-app-max-ai--thread-with-failed-generation--dark.png (chromium, shard 1)
- scenes-app-max-ai--thread-with-failed-generation--light.png (chromium, shard 1)
- scenes-app-max-ai--thread-with-form--dark.png (chromium, shard 1)
- scenes-app-max-ai--thread-with-form--light.png (chromium, shard 1)
- scenes-app-max-ai--thread-with-in-progress-conversation--dark.png (chromium, shard 1)
- scenes-app-max-ai--thread-with-in-progress-conversation--light.png (chromium, shard 1)
- scenes-app-max-ai--thread-with-multiple-context-objects--dark.png (chromium, shard 1)
- scenes-app-max-ai--thread-with-multiple-context-objects--light.png (chromium, shard 1)
- scenes-app-max-ai--thread-with-opened-suggestions--dark.png (chromium, shard 1)
- scenes-app-max-ai--thread-with-opened-suggestions--light.png (chromium, shard 1)
- scenes-app-max-ai--thread-with-opened-suggestions-mobile--dark.png (chromium, shard 1)
- scenes-app-max-ai--thread-with-opened-suggestions-mobile--light.png (chromium, shard 1)
- scenes-app-max-ai--thread-with-rate-limit--dark.png (chromium, shard 1)
- scenes-app-max-ai--thread-with-rate-limit--light.png (chromium, shard 1)
- scenes-app-max-ai--thread-with-rate-limit-no-retry-after--dark.png (chromium, shard 1)
- scenes-app-max-ai--thread-with-rate-limit-no-retry-after--light.png (chromium, shard 1)
- scenes-app-max-ai--welcome--dark.png (chromium, shard 1)
- scenes-app-max-ai--welcome--light.png (chromium, shard 1)
- scenes-app-max-ai--welcome-feature-preview-auto-enrolled--dark.png (chromium, shard 1)
- scenes-app-max-ai--welcome-feature-preview-auto-enrolled--light.png (chromium, shard 1)
- scenes-app-max-ai--welcome-with-latest-conversations--dark.png (chromium, shard 1)
- scenes-app-max-ai--welcome-with-latest-conversations--light.png (chromium, shard 1)
- scenes-app-sidepanels--side-panel-max--dark.png (chromium, shard 1)
- scenes-app-sidepanels--side-panel-max--light.png (chromium, shard 1)
- scenes-app-insights-funnels--funnel-top-to-bottom-edit--dark.png (chromium, shard 2)
The flippy-flappies are deadly and must be fixed ASAP. They're productivity killers. Run `pnpm storybook` locally and make the fix now. (Often, the cause is `ResizeObserver` being used instead of the better CSS container queries.)
- chromium: 0 added, 37 modified, 0 deleted (diff for shard 1, diff for shard 2)
- webkit: 0 added, 0 modified, 0 deleted
Triggered by this commit.
AI eval results
Evaluated 6 experiments, comprising 19 metrics.
funnel
QueryKindSelection: 100.00%, plan_correctness: 87.08%, query_and_plan_alignment: 89.75%, time_range_relevancy: 95.42%
Avg. case performance: 121.60 s, 6375 tokens, $0.0167 in tokens
memory
ToolRelevance: 98.78%, memory_content_relevance: 92.86%
Avg. case performance: 5.76 s, 1217 tokens, $0.0034 in tokens
retention
QueryKindSelection: 100.00%, plan_correctness: 55.00%, query_and_plan_alignment: 53.33%, time_range_relevancy: 93.75%
Avg. case performance: 38.32 s, 4934 tokens, $0.0161 in tokens
root
ToolRelevance: 58.76%
Avg. case performance: 5.77 s, 0 tokens
sql
QueryKindSelection: 0.00%, plan_correctness: 75.00%, query_and_plan_alignment: 50.00%, time_range_relevancy: 100.00%
Avg. case performance: 17.61 s, 16274 tokens, $0.0417 in tokens
trends
QueryKindSelection: 100.00%, plan_correctness: 83.33%, query_and_plan_alignment: 87.75%, time_range_relevancy: 97.00%
Avg. case performance: 66.80 s, 10809 tokens, $0.0277 in tokens
Triggered by this commit.
@Twixes implemented all your suggestions + added evals, ready for re-review!