feat: events/actions context for Max

Open kappa90 opened this issue 7 months ago • 23 comments

Problem

The original UI Context PR adds a UI context to Max's root prompt, but supports only insights and dashboards. This PR extends that context with actions and events.

Requires the original UI Context PR to be merged first.

Changes

Frontend:

  • Add events and actions support to maxContextLogic and related UI tags

Backend:

  • Support injected actions in the Rag node
  • Support injected events in the TaxonomyAgent node
  • Add events and actions context to the Root prompt
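As a rough illustration of the last backend item, the sketch below renders UI-selected events and actions into a text section that could be appended to a root prompt. Everything in it is an assumption made for illustration: the `EventContext`, `ActionContext`, and `UIContext` shapes, the `format_ui_context` helper, and the tag names are hypothetical and are not taken from this PR's code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EventContext:
    # Hypothetical shape of an event picked in the UI.
    name: str
    description: str = ""


@dataclass
class ActionContext:
    # Hypothetical shape of an action picked in the UI.
    id: int
    name: str
    description: str = ""


@dataclass
class UIContext:
    # Hypothetical container for everything the user attached to the chat.
    events: list[EventContext] = field(default_factory=list)
    actions: list[ActionContext] = field(default_factory=list)


def _with_description(label: str, description: str) -> str:
    # Append the description only when one is present.
    return f"- {label}: {description}" if description else f"- {label}"


def format_ui_context(ctx: UIContext) -> str:
    """Render selected events and actions as a prompt section.

    Returns an empty string when nothing was selected, so the caller
    can skip the section entirely.
    """
    sections: list[str] = []
    if ctx.events:
        lines = [_with_description(e.name, e.description) for e in ctx.events]
        sections.append("<events_context>\n" + "\n".join(lines) + "\n</events_context>")
    if ctx.actions:
        lines = [_with_description(f"{a.name} (id {a.id})", a.description) for a in ctx.actions]
        sections.append("<actions_context>\n" + "\n".join(lines) + "\n</actions_context>")
    return "\n\n".join(sections)
```

Returning an empty string for an empty context keeps the prompt unchanged when the user has attached nothing.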

Did you write or update any docs for this change?

How did you test this code?

Wrote tests

kappa90 avatar Jun 03 '25 18:06 kappa90

🧠 AI eval results

Evaluated 6 experiments, comprising 16 metrics.

funnel

🆕 plan_correctness: 92.16% 🆕 query_and_plan_alignment: 84.15% 🆕 time_range_relevancy: 97.55%

Avg. case performance: ⏱️ 73.53 s, 🔢 6192 tokens, 💵 $0.0162 in tokens

memory

🆕 ToolRelevance: 98.61% 🆕 memory_content_relevance: 90.00%

Avg. case performance: ⏱️ 6.85 s, 🔢 1216 tokens, 💵 $0.0034 in tokens

retention

🆕 plan_correctness: 72.00% 🆕 query_and_plan_alignment: 60.71% 🆕 time_range_relevancy: 95.00%

Avg. case performance: ⏱️ 35.15 s, 🔢 6213 tokens, 💵 $0.0163 in tokens

root

🆕 ToolRelevance: 58.35%

Avg. case performance: ⏱️ 6.35 s, 🔢 0 tokens

sql

🆕 plan_correctness: 48.06% 🆕 query_and_plan_alignment: 73.50% 🆕 sql_syntax_correctness: 95.45% 🆕 time_range_relevancy: 97.73%

Avg. case performance: ⏱️ 36.14 s, 🔢 892 tokens, 💵 $0.0042 in tokens

trends

🆕 plan_correctness: 80.48% 🆕 query_and_plan_alignment: 80.00% 🆕 time_range_relevancy: 100.00%

Avg. case performance: ⏱️ 38.55 s, 🔢 10272 tokens, 💵 $0.0263 in tokens

Triggered by this commit.

posthog-bot avatar Jun 03 '25 18:06 posthog-bot

📸 UI snapshots have been updated

40 snapshot changes in total. 0 added, 40 modified, 0 deleted:

Triggered by this commit.

👉 Review this PR's diff of snapshots.

posthog-bot avatar Jun 03 '25 18:06 posthog-bot

📸 UI snapshots have been updated

4 snapshot changes in total. 0 added, 4 modified, 0 deleted:

  • chromium: 0 added, 4 modified, 0 deleted (diff for shard 5)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

posthog-bot avatar Jun 03 '25 19:06 posthog-bot

🧠 AI eval results

Evaluated 6 experiments, comprising 16 metrics.

funnel

🆕 plan_correctness: 90.88% 🆕 query_and_plan_alignment: 83.33% 🆕 time_range_relevancy: 96.57%

Avg. case performance: ⏱️ 78.48 s, 🔢 6318 tokens, 💵 $0.0165 in tokens

memory

🆕 ToolRelevance: 97.50% 🆕 memory_content_relevance: 90.00%

Avg. case performance: ⏱️ 6.96 s, 🔢 1211 tokens, 💵 $0.0033 in tokens

retention

🆕 plan_correctness: 58.21% 🆕 query_and_plan_alignment: 52.78% 🆕 time_range_relevancy: 95.83%

Avg. case performance: ⏱️ 26.15 s, 🔢 5369 tokens, 💵 $0.0161 in tokens

root

🆕 ToolRelevance: 77.07%

Avg. case performance: ⏱️ 7.81 s, 🔢 0 tokens

sql

🆕 plan_correctness: 48.61% 🆕 query_and_plan_alignment: 72.00% 🆕 sql_syntax_correctness: 86.36% 🆕 time_range_relevancy: 93.18%

Avg. case performance: ⏱️ 34.52 s, 🔢 1044 tokens, 💵 $0.0050 in tokens

trends

🆕 plan_correctness: 73.81% 🆕 query_and_plan_alignment: 78.95% 🆕 time_range_relevancy: 100.00%

Avg. case performance: ⏱️ 40.40 s, 🔢 9424 tokens, 💵 $0.0267 in tokens

Triggered by this commit.

posthog-bot avatar Jun 03 '25 19:06 posthog-bot

🧠 AI eval results

Evaluated 6 experiments, comprising 16 metrics.

funnel

🆕 plan_correctness: 90.98% 🆕 query_and_plan_alignment: 83.67% 🆕 time_range_relevancy: 97.00%

Avg. case performance: ⏱️ 72.45 s, 🔢 6325 tokens, 💵 $0.0169 in tokens

memory

🆕 ToolRelevance: 98.50% 🆕 memory_content_relevance: 91.43%

Avg. case performance: ⏱️ 5.51 s, 🔢 1217 tokens, 💵 $0.0034 in tokens

retention

🆕 plan_correctness: 58.33% 🆕 query_and_plan_alignment: 50.00% 🆕 time_range_relevancy: 94.23%

Avg. case performance: ⏱️ 25.80 s, 🔢 5368 tokens, 💵 $0.0162 in tokens

root

🆕 ToolRelevance: 64.94%

Avg. case performance: ⏱️ 7.67 s, 🔢 0 tokens

sql

🆕 plan_correctness: 52.78% 🆕 query_and_plan_alignment: 69.09% 🆕 sql_syntax_correctness: 87.50% 🆕 time_range_relevancy: 100.00%

Avg. case performance: ⏱️ 34.94 s, 🔢 1058 tokens, 💵 $0.0045 in tokens

trends

🆕 plan_correctness: 71.43% 🆕 query_and_plan_alignment: 82.11% 🆕 time_range_relevancy: 100.00%

Avg. case performance: ⏱️ 40.33 s, 🔢 9939 tokens, 💵 $0.0282 in tokens

Triggered by this commit.

posthog-bot avatar Jun 04 '25 13:06 posthog-bot

🧠 AI eval results

Evaluated 6 experiments, comprising 16 metrics.

funnel

🆕 plan_correctness: 91.57% 🆕 query_and_plan_alignment: 84.60% 🆕 time_range_relevancy: 98.33%

Avg. case performance: ⏱️ 81.13 s, 🔢 6322 tokens, 💵 $0.0166 in tokens

memory

🆕 ToolRelevance: 97.80% 🆕 memory_content_relevance: 91.43%

Avg. case performance: ⏱️ 5.22 s, 🔢 1218 tokens, 💵 $0.0034 in tokens

retention

🆕 plan_correctness: 73.00% 🆕 query_and_plan_alignment: 53.13% 🆕 time_range_relevancy: 95.00%

Avg. case performance: ⏱️ 25.39 s, 🔢 6200 tokens, 💵 $0.0162 in tokens

root

🆕 ToolRelevance: 77.05%

Avg. case performance: ⏱️ 5.55 s, 🔢 0 tokens

sql

🆕 plan_correctness: 48.61% 🆕 query_and_plan_alignment: 71.67% 🆕 sql_syntax_correctness: 86.36% 🆕 time_range_relevancy: 97.73%

Avg. case performance: ⏱️ 35.53 s, 🔢 1341 tokens, 💵 $0.0059 in tokens

trends

🆕 plan_correctness: 76.19% 🆕 query_and_plan_alignment: 82.00% 🆕 time_range_relevancy: 100.00%

Avg. case performance: ⏱️ 33.15 s, 🔢 10438 tokens, 💵 $0.0281 in tokens

Triggered by this commit.

posthog-bot avatar Jun 04 '25 13:06 posthog-bot

📸 UI snapshots have been updated

4 snapshot changes in total. 0 added, 4 modified, 0 deleted:

  • chromium: 0 added, 4 modified, 0 deleted (diff for shard 1)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

posthog-bot avatar Jun 04 '25 13:06 posthog-bot

📸 UI snapshots have been updated

2 snapshot changes in total. 0 added, 2 modified, 0 deleted:

[!CAUTION]

Detected flapping snapshots

These snapshots have auto-updated more than once since the last human commit:

  • scenes-other-settings--settings-project--dark.png (chromium, shard 1)
  • scenes-other-settings--settings-project--light.png (chromium, shard 1)

The flippy-flappies are deadly and must be fixed ASAP. They're productivity killers. Run pnpm storybook locally and make the fix now. (Often, the cause is ResizeObserver being used instead of the better CSS container queries.)

  • chromium: 0 added, 2 modified, 0 deleted (diff for shard 1)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

posthog-bot avatar Jun 04 '25 13:06 posthog-bot

🧠 AI eval results

Evaluated 6 experiments, comprising 16 metrics.

funnel

🆕 plan_correctness: 93.63% 🆕 query_and_plan_alignment: 83.83% 🆕 time_range_relevancy: 95.88%

Avg. case performance: ⏱️ 68.12 s, 🔢 6195 tokens, 💵 $0.0162 in tokens

memory

🆕 ToolRelevance: 97.52% 🆕 memory_content_relevance: 91.43%

Avg. case performance: ⏱️ 4.66 s, 🔢 1216 tokens, 💵 $0.0034 in tokens

retention

🆕 plan_correctness: 74.33% 🆕 query_and_plan_alignment: 50.00% 🆕 time_range_relevancy: 95.00%

Avg. case performance: ⏱️ 26.14 s, 🔢 6202 tokens, 💵 $0.0162 in tokens

root

🆕 ToolRelevance: 64.79%

Avg. case performance: ⏱️ 5.58 s, 🔢 0 tokens

sql

🆕 plan_correctness: 58.33% 🆕 query_and_plan_alignment: 78.75% 🆕 sql_syntax_correctness: 92.31% 🆕 time_range_relevancy: 98.08%

Avg. case performance: ⏱️ 29.33 s, 🔢 1020 tokens, 💵 $0.0044 in tokens

trends

🆕 plan_correctness: 72.62% 🆕 query_and_plan_alignment: 77.89% 🆕 time_range_relevancy: 100.00%

Avg. case performance: ⏱️ 38.68 s, 🔢 9938 tokens, 💵 $0.0282 in tokens

Triggered by this commit.

posthog-bot avatar Jun 04 '25 13:06 posthog-bot

📸 UI snapshots have been updated

3 snapshot changes in total. 0 added, 3 modified, 0 deleted:

Triggered by this commit.

👉 Review this PR's diff of snapshots.

posthog-bot avatar Jun 04 '25 15:06 posthog-bot

🧠 AI eval results

Evaluated 6 experiments, comprising 16 metrics.

funnel

🆕 plan_correctness: 93.73% 🆕 query_and_plan_alignment: 83.85% 🆕 time_range_relevancy: 98.53%

Avg. case performance: ⏱️ 67.18 s, 🔢 6319 tokens, 💵 $0.0165 in tokens

memory

🆕 ToolRelevance: 98.36% 🆕 memory_content_relevance: 90.00%

Avg. case performance: ⏱️ 5.19 s, 🔢 1209 tokens, 💵 $0.0033 in tokens

retention

🆕 plan_correctness: 75.00% 🆕 query_and_plan_alignment: 54.55% 🆕 time_range_relevancy: 95.00%

Avg. case performance: ⏱️ 31.28 s, 🔢 6217 tokens, 💵 $0.0163 in tokens

root

🆕 ToolRelevance: 70.97%

Avg. case performance: ⏱️ 7.83 s, 🔢 0 tokens

sql

🆕 plan_correctness: 42.22% 🆕 query_and_plan_alignment: 78.50% 🆕 sql_syntax_correctness: 90.00% 🆕 time_range_relevancy: 82.50%

Avg. case performance: ⏱️ 31.73 s, 🔢 1263 tokens, 💵 $0.0061 in tokens

trends

🆕 plan_correctness: 82.14% 🆕 query_and_plan_alignment: 80.00% 🆕 time_range_relevancy: 100.00%

Avg. case performance: ⏱️ 33.15 s, 🔢 10284 tokens, 💵 $0.0264 in tokens

Triggered by this commit.

posthog-bot avatar Jun 04 '25 15:06 posthog-bot

Size Change: +924 B (+0.04%)

Total Size: 2.58 MB

â„šī¸ View Unchanged
Filename Size Change
frontend/dist/toolbar.js 2.58 MB +924 B (+0.04%)

compressed-size-action

github-actions[bot] avatar Jun 04 '25 15:06 github-actions[bot]

📸 UI snapshots have been updated

1 snapshot change in total. 0 added, 1 modified, 0 deleted:

[!CAUTION]

Detected flapping snapshots

These snapshots have auto-updated more than once since the last human commit:

  • scenes-app-surveys--survey-templates--dark.png (chromium, shard 7)

The flippy-flappies are deadly and must be fixed ASAP. They're productivity killers. Run pnpm storybook locally and make the fix now. (Often, the cause is ResizeObserver being used instead of the better CSS container queries.)

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 7)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

posthog-bot avatar Jun 04 '25 15:06 posthog-bot

🧠 AI eval results

Evaluated 6 experiments, comprising 16 metrics.

funnel

🆕 plan_correctness: 91.37% 🆕 query_and_plan_alignment: 85.70% 🆕 time_range_relevancy: 96.08%

Avg. case performance: ⏱️ 70.99 s, 🔢 6326 tokens, 💵 $0.0166 in tokens

memory

🆕 ToolRelevance: 93.56% 🆕 memory_content_relevance: 86.67%

Avg. case performance: ⏱️ 6.29 s, 🔢 1215 tokens, 💵 $0.0034 in tokens

retention

🆕 plan_correctness: 72.67% 🆕 query_and_plan_alignment: 61.50% 🆕 time_range_relevancy: 96.43%

Avg. case performance: ⏱️ 26.22 s, 🔢 5796 tokens, 💵 $0.0162 in tokens

root

🆕 ToolRelevance: 58.81%

Avg. case performance: ⏱️ 5.41 s, 🔢 0 tokens

sql

🆕 plan_correctness: 47.65% 🆕 query_and_plan_alignment: 82.50% 🆕 sql_syntax_correctness: 90.00% 🆕 time_range_relevancy: 91.50%

Avg. case performance: ⏱️ 34.48 s, 🔢 980 tokens, 💵 $0.0052 in tokens

trends

🆕 plan_correctness: 81.67% 🆕 query_and_plan_alignment: 80.00% 🆕 time_range_relevancy: 100.00%

Avg. case performance: ⏱️ 36.45 s, 🔢 9855 tokens, 💵 $0.0266 in tokens

Triggered by this commit.

posthog-bot avatar Jun 04 '25 15:06 posthog-bot

🧠 AI eval results

Evaluated 6 experiments, comprising 16 metrics.

funnel

🆕 plan_correctness: 91.08% 🆕 query_and_plan_alignment: 87.90% 🆕 time_range_relevancy: 97.55%

Avg. case performance: ⏱️ 78.92 s, 🔢 6621 tokens, 💵 $0.0173 in tokens

memory

🆕 ToolRelevance: 97.08% 🆕 memory_content_relevance: 90.00%

Avg. case performance: ⏱️ 5.54 s, 🔢 1212 tokens, 💵 $0.0033 in tokens

retention

🆕 plan_correctness: 73.00% 🆕 query_and_plan_alignment: 50.00% 🆕 time_range_relevancy: 95.00%

Avg. case performance: ⏱️ 27.44 s, 🔢 6204 tokens, 💵 $0.0162 in tokens

root

🆕 ToolRelevance: 70.38%

Avg. case performance: ⏱️ 6.86 s, 🔢 0 tokens

sql

🆕 plan_correctness: 47.78% 🆕 query_and_plan_alignment: 75.00% 🆕 sql_syntax_correctness: 90.91% 🆕 time_range_relevancy: 94.55%

Avg. case performance: ⏱️ 35.86 s, 🔢 903 tokens, 💵 $0.0043 in tokens

trends

🆕 plan_correctness: 84.05% 🆕 query_and_plan_alignment: 80.71% 🆕 time_range_relevancy: 100.00%

Avg. case performance: ⏱️ 39.18 s, 🔢 10277 tokens, 💵 $0.0264 in tokens

Triggered by this commit.

posthog-bot avatar Jun 04 '25 17:06 posthog-bot

🧠 AI eval results

Evaluated 6 experiments, comprising 16 metrics.

funnel

🟢 plan_correctness: 93.92%, +3.73% versus baseline (master) (improvements: 3, regressions: 1) 🔵 query_and_plan_alignment: 84.80%, -0.64% versus baseline (master) (improvements: 6, regressions: 6) 🟢 time_range_relevancy: 98.82%, +1.27% versus baseline (master) (improvements: 2, regressions: 1)

Avg. case performance: ⏱️ 73.66 s, 🔢 6317 tokens, 💵 $0.0165 in tokens

memory

🔵 ToolRelevance: 97.91%, -0.30% versus baseline (master) (improvements: 1, regressions: 2) 🔵 memory_content_relevance: 90.00%, ±0.00% versus baseline (master) (improvements: 1, regressions: 1)

Avg. case performance: ⏱️ 5.61 s, 🔢 1216 tokens, 💵 $0.0034 in tokens

retention

🔴 plan_correctness: 63.67%, -10.67% versus baseline (master) (improvements: 0, regressions: 2) 🔴 query_and_plan_alignment: 50.00%, -4.55% versus baseline (master) (improvements: 0, regressions: 1) 🟢 time_range_relevancy: 98.08%, +3.08% versus baseline (master) (improvements: 0, regressions: 0)

Avg. case performance: ⏱️ 29.53 s, 🔢 5379 tokens, 💵 $0.0162 in tokens

root

🔴 ToolRelevance: 59.09%, -15.44% versus baseline (master) (improvements: 0, regressions: 1)

Avg. case performance: ⏱️ 7.54 s, 🔢 0 tokens

sql

🟢 plan_correctness: 65.56%, +11.39% versus baseline (master) (improvements: 3, regressions: 0) 🟢 query_and_plan_alignment: 78.85%, +5.21% versus baseline (master) (improvements: 2, regressions: 1) 🔴 sql_syntax_correctness: 89.29%, -6.55% versus baseline (master) (improvements: 0, regressions: 1) 🔴 time_range_relevancy: 91.07%, -8.93% versus baseline (master) (improvements: 0, regressions: 1)

Avg. case performance: ⏱️ 38.61 s, 🔢 1033 tokens, 💵 $0.0043 in tokens

trends

🔴 plan_correctness: 75.00%, -3.10% versus baseline (master) (improvements: 1, regressions: 1) 🔵 query_and_plan_alignment: 80.25%, -0.23% versus baseline (master) (improvements: 1, regressions: 2) 🔵 time_range_relevancy: 100.00%, ±0.00% versus baseline (master) (improvements: 0, regressions: 0)

Avg. case performance: ⏱️ 46.85 s, 🔢 10434 tokens, 💵 $0.0281 in tokens

Triggered by this commit.
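For readers who want to reproduce the per-metric markers in this comparison, here is a minimal sketch of how a score could be compared against a baseline and labelled. The 1-point threshold is an assumption: it is consistent with the deltas reported in this comment, but the bot's actual rule is not shown in this thread. The `classify_delta` and `format_metric` helpers are hypothetical names.

```python
def classify_delta(delta: float, threshold: float = 1.0) -> str:
    """Label a score change versus baseline.

    The threshold (in percentage points) is an assumed value,
    not the eval bot's documented rule.
    """
    if delta >= threshold:
        return "🟢"  # improvement
    if delta <= -threshold:
        return "🔴"  # regression
    return "🔵"  # roughly unchanged


def format_metric(name: str, score: float, baseline: float) -> str:
    """Format one metric line in the style used by the comment above."""
    delta = round(score - baseline, 2)
    # Show an exact tie as "±0.00%" rather than "+0.00%".
    delta_text = "±0.00%" if delta == 0 else f"{delta:+.2f}%"
    return f"{classify_delta(delta)} {name}: {score:.2f}%, {delta_text} versus baseline"
```

Rounding the delta before classifying keeps the marker and the printed number in agreement for values that sit close to the threshold.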

posthog-bot avatar Jun 05 '25 15:06 posthog-bot

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week. If you want to permanently keep it open, use the waiting label.

posthog-bot avatar Jun 13 '25 07:06 posthog-bot

📸 UI snapshots have been updated

36 snapshot changes in total. 0 added, 36 modified, 0 deleted:

  • chromium: 0 added, 36 modified, 0 deleted (diff for shard 1)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

posthog-bot avatar Jun 13 '25 13:06 posthog-bot

📸 UI snapshots have been updated

1 snapshot change in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 2)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

posthog-bot avatar Jun 13 '25 14:06 posthog-bot

📸 UI snapshots have been updated

37 snapshot changes in total. 0 added, 37 modified, 0 deleted:

[!CAUTION]

Detected flapping snapshots

These snapshots have auto-updated more than once since the last human commit:

  • scenes-app-max-ai--empty-thread-loading--dark.png (chromium, shard 1)
  • scenes-app-max-ai--empty-thread-loading--light.png (chromium, shard 1)
  • scenes-app-max-ai--generation-failure-thread--dark.png (chromium, shard 1)
  • scenes-app-max-ai--generation-failure-thread--light.png (chromium, shard 1)
  • scenes-app-max-ai--thread--dark.png (chromium, shard 1)
  • scenes-app-max-ai--thread--light.png (chromium, shard 1)
  • scenes-app-max-ai--thread-scrolls-to-bottom-on-new-messages--dark.png (chromium, shard 1)
  • scenes-app-max-ai--thread-scrolls-to-bottom-on-new-messages--light.png (chromium, shard 1)
  • scenes-app-max-ai--thread-with-conversation-loading--dark.png (chromium, shard 1)
  • scenes-app-max-ai--thread-with-conversation-loading--light.png (chromium, shard 1)
  • scenes-app-max-ai--thread-with-empty-conversation--dark.png (chromium, shard 1)
  • scenes-app-max-ai--thread-with-empty-conversation--light.png (chromium, shard 1)
  • scenes-app-max-ai--thread-with-failed-generation--dark.png (chromium, shard 1)
  • scenes-app-max-ai--thread-with-failed-generation--light.png (chromium, shard 1)
  • scenes-app-max-ai--thread-with-form--dark.png (chromium, shard 1)
  • scenes-app-max-ai--thread-with-form--light.png (chromium, shard 1)
  • scenes-app-max-ai--thread-with-in-progress-conversation--dark.png (chromium, shard 1)
  • scenes-app-max-ai--thread-with-in-progress-conversation--light.png (chromium, shard 1)
  • scenes-app-max-ai--thread-with-multiple-context-objects--dark.png (chromium, shard 1)
  • scenes-app-max-ai--thread-with-multiple-context-objects--light.png (chromium, shard 1)
  • scenes-app-max-ai--thread-with-opened-suggestions--dark.png (chromium, shard 1)
  • scenes-app-max-ai--thread-with-opened-suggestions--light.png (chromium, shard 1)
  • scenes-app-max-ai--thread-with-opened-suggestions-mobile--dark.png (chromium, shard 1)
  • scenes-app-max-ai--thread-with-opened-suggestions-mobile--light.png (chromium, shard 1)
  • scenes-app-max-ai--thread-with-rate-limit--dark.png (chromium, shard 1)
  • scenes-app-max-ai--thread-with-rate-limit--light.png (chromium, shard 1)
  • scenes-app-max-ai--thread-with-rate-limit-no-retry-after--dark.png (chromium, shard 1)
  • scenes-app-max-ai--thread-with-rate-limit-no-retry-after--light.png (chromium, shard 1)
  • scenes-app-max-ai--welcome--dark.png (chromium, shard 1)
  • scenes-app-max-ai--welcome--light.png (chromium, shard 1)
  • scenes-app-max-ai--welcome-feature-preview-auto-enrolled--dark.png (chromium, shard 1)
  • scenes-app-max-ai--welcome-feature-preview-auto-enrolled--light.png (chromium, shard 1)
  • scenes-app-max-ai--welcome-with-latest-conversations--dark.png (chromium, shard 1)
  • scenes-app-max-ai--welcome-with-latest-conversations--light.png (chromium, shard 1)
  • scenes-app-sidepanels--side-panel-max--dark.png (chromium, shard 1)
  • scenes-app-sidepanels--side-panel-max--light.png (chromium, shard 1)
  • scenes-app-insights-funnels--funnel-top-to-bottom-edit--dark.png (chromium, shard 2)

The flippy-flappies are deadly and must be fixed ASAP. They're productivity killers. Run pnpm storybook locally and make the fix now. (Often, the cause is ResizeObserver being used instead of the better CSS container queries.)

Triggered by this commit.

👉 Review this PR's diff of snapshots.

posthog-bot avatar Jun 13 '25 14:06 posthog-bot

🧠 AI eval results

Evaluated 6 experiments, comprising 19 metrics.

funnel

🆕 QueryKindSelection: 100.00% 🆕 plan_correctness: 87.08% 🆕 query_and_plan_alignment: 89.75% 🆕 time_range_relevancy: 95.42%

Avg. case performance: ⏱️ 121.60 s, 🔢 6375 tokens, 💵 $0.0167 in tokens

memory

🆕 ToolRelevance: 98.78% 🆕 memory_content_relevance: 92.86%

Avg. case performance: ⏱️ 5.76 s, 🔢 1217 tokens, 💵 $0.0034 in tokens

retention

🆕 QueryKindSelection: 100.00% 🆕 plan_correctness: 55.00% 🆕 query_and_plan_alignment: 53.33% 🆕 time_range_relevancy: 93.75%

Avg. case performance: ⏱️ 38.32 s, 🔢 4934 tokens, 💵 $0.0161 in tokens

root

🆕 ToolRelevance: 58.76%

Avg. case performance: ⏱️ 5.77 s, 🔢 0 tokens

sql

🆕 QueryKindSelection: 0.00% 🆕 plan_correctness: 75.00% 🆕 query_and_plan_alignment: 50.00% 🆕 time_range_relevancy: 100.00%

Avg. case performance: ⏱️ 17.61 s, 🔢 16274 tokens, 💵 $0.0417 in tokens

trends

🆕 QueryKindSelection: 100.00% 🆕 plan_correctness: 83.33% 🆕 query_and_plan_alignment: 87.75% 🆕 time_range_relevancy: 97.00%

Avg. case performance: ⏱️ 66.80 s, 🔢 10809 tokens, 💵 $0.0277 in tokens

Triggered by this commit.

posthog-bot avatar Jun 13 '25 20:06 posthog-bot

@Twixes implemented all your suggestions + added evals, ready for re-review!

kappa90 avatar Jun 23 '25 12:06 kappa90