
LLM Data Import

Open SachaG opened this issue 2 months ago • 6 comments

Some notes about importing normalization data from LLMs (cc @LeaVerou and @DmitrySharabin).

ID Specificity

We should avoid broad ids such as desktop or images because they are quite vague once taken out of the context of pain points. Instead, desktop_issues and images_issues make it clear that these are pain points.

ID Collision

Overly broad ids also run the risk of colliding with existing entities/tokens (https://github.com/Devographics/entities). For example firefox is already assigned to the Firefox browser itself. So when referring to e.g. "issues with Firefox", firefox_issues would be better.
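A minimal sketch of how such collisions could be flagged before import. The ids below are illustrative, and `find_collisions` is a hypothetical helper, not something that exists in the entities repo:

```python
# Hypothetical sample of ids already defined in the entities repo (illustrative only).
existing_entity_ids = {"firefox", "safari", "canvas_element"}

# Hypothetical ids proposed by an LLM export.
proposed_ids = ["firefox", "firefox_issues", "desktop_issues"]


def find_collisions(proposed, existing):
    """Return the proposed ids that are already taken by an existing entity."""
    return [token_id for token_id in proposed if token_id in existing]


collisions = find_collisions(proposed_ids, existing_entity_ids)
print(collisions)  # ['firefox']
```

Anything this reports would either need a more specific id (firefox_issues) or be a deliberate reference to the existing entity.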

ID Continuity

There is also a backwards-compatibility aspect where if for example we've used interoperability_issues up to now, it'd be better to avoid switching to e.g. interop when referring to the same concept.

Filtering Codes/Tokens

To make dealing with these issues easier we can probably leave out any code with less than 10 matching items from the YAML exports, since these won't show up in the survey results anyway.
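As a rough sketch, the filter could look like this. The `id`/`count` shape and the `rare_code` entry are assumptions for illustration; the real export format may differ:

```python
MIN_MATCHES = 10  # threshold suggested above

# Hypothetical codes with their number of matching items.
codes = [
    {"id": "interop", "count": 122},
    {"id": "quantum_values", "count": 13},
    {"id": "rare_code", "count": 4},
]


def filter_codes(codes, min_matches=MIN_MATCHES):
    """Keep only codes with at least `min_matches` matching items."""
    return [code for code in codes if code["count"] >= min_matches]


exported = filter_codes(codes)
print([code["id"] for code in exported])  # ['interop', 'quantum_values']
```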

~~Common vs Question-Specific Codes/Tokens~~

EDIT: a unified file is probably better after all

We don't want to repeat the same codes/tokens across e.g. forms_pain_points.yml, graphics_pain_points.yml, etc. So it'd be good to factor out any common codes into their own common_pain_points.yml file.

Note: this can be solved by using a single file for all pain points for a survey, but the downside is that this removes the ability to define that a set of pain points should only apply to question Foo, while another set only applies to question Bar, and so on.

Generic Codes

It might be better to leave out generic codes such as limitations or tooling altogether since they might make the resulting charts less specific.

ID Formatting

Maybe it's a one-off thing but I noticed an id named end-user_ux. It'd be good to avoid - and only use _ in IDs to be safe.
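A quick check along these lines could catch such ids early. This is only a sketch: the allowed character set (lowercase letters, digits, and _) is an assumption, since the only rule stated here is "no -":

```python
import re

# Assumption: ids consist of lowercase letters, digits, and underscores only.
VALID_ID = re.compile(r"[a-z0-9_]+")


def invalid_ids(ids):
    """Return the ids that contain anything outside the assumed character set."""
    return [token_id for token_id in ids if not VALID_ID.fullmatch(token_id)]


bad = invalid_ids(["end-user_ux", "end_user_ux", "firefox_issues"])
print(bad)  # ['end-user_ux']
```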

SachaG avatar Oct 26 '25 00:10 SachaG

Test Case: Performance Pain Points

  1. I commented out the matchTags array in the questions outline and also renamed performance_pain_points.yml from the entities repo to performance_pain_points_disabled.yml to make sure no regexes at all are applied to the question's dataset.
  2. I then clicked "Normalize All" on the question dashboard to trigger the normalization process and wipe out any existing code assignments.
  3. I pasted in the contents of the JSON export into the Import dialog.
  4. This found 115 new tokens:
Image
  5. At this point the assignments exist as custom normalizations in the database but are not yet part of the final dataset. So I ran "Normalize All" again.
  6. We're now at 98% normalized! Thanks ChatGPT! (and LeaGPT ;)
Image
  7. We can get a preview of the resulting data by clicking "Results":
Image
  8. And we can also now build the results site to see the chart:
Image

Next Steps

  • Although we have a disallowedTokenIds property to exclude codes/tokens from a specific question, it currently only applies to regexes. So we might need another approach to "turn off" the performance token here.
  • Figure out nesting

SachaG avatar Oct 28 '25 01:10 SachaG

Typo:

  • envirnoments_consumers

(cc @DmitrySharabin)

SachaG avatar Oct 28 '25 02:10 SachaG

Test Case Part II: Nesting

This is what I get by using the YAML file as is. It'll need tweaking but it's working at least!

Image

SachaG avatar Oct 28 '25 05:10 SachaG

Typo:

  • envirnoments_consumers

Fixed:

Image Image

DmitrySharabin avatar Oct 28 '25 07:10 DmitrySharabin

Issue

  • Tokens may sometimes logically belong to more than one parent.

Example

I hate that quantum values in animations don't work in IE 6!

  • Matches interop, animations, and quantum_values
  • If we only nest quantum_values under animations and not interop, then the interop bucket cannot be drilled down into, which leads to a boring chart.

Proposed Solution

  1. do not assign "combination tokens", such as interop_anchor_positioning
  2. add support for multiple parentIds instead of a single one
  3. when nesting, nest under multiple parents – but only if the item has the parent's token
  4. for features, define an API for bulk parentIds assignment based on the existing tags
  5. find an elegant way to show that a token can have multiple ancestor hierarchies in the breadcrumbs view
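Steps 2 and 3 could be sketched roughly as follows, working on raw answer-by-answer data. The `parent_ids` map and the sample answers are hypothetical, and this is not how the current nesting code works:

```python
from collections import defaultdict

# Hypothetical: a token may list several parent ids instead of a single one.
parent_ids = {"quantum_values": ["animations", "interop"]}

# Hypothetical raw answers, each reduced to the set of tokens assigned to it.
answers = [
    {"interop", "animations", "quantum_values"},
    {"animations", "quantum_values"},
    {"interop"},
]


def nested_counts(answers, parent_ids):
    """Count each (parent, child) pair, but only for answers that carry both tokens."""
    counts = defaultdict(int)
    for tokens in answers:
        for token in tokens:
            for parent in parent_ids.get(token, []):
                if parent in tokens:
                    counts[(parent, token)] += 1
    return dict(counts)


nested = nested_counts(answers, parent_ids)
# quantum_values is counted twice under animations and once under interop.
```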

Update: Why it wouldn't work

The above solution presupposes that we know when token A appears alongside token B. This is true when looking at the raw, answer-by-answer data, but this is not what the chart is built on. Instead, the chart uses a compiled form of the data with total counts for each token:

- id: interop
  count: 122
- id: animations
  count: 89
- id: anchor_positioning
  count: 22
- id: quantum_values
  count: 13
...

We then separately define that quantum_values is nested under animations. But we can't know how many of those 13 quantum_values items also happen to have the interop token.
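To illustrate the information loss, here is a small sketch with two made-up raw datasets. They compile to identical per-token counts, yet differ in how often interop and quantum_values appear together (`compile_counts` and `co_occurrences` are hypothetical helpers):

```python
from collections import Counter

# Two hypothetical raw datasets: each answer is the set of tokens assigned to it.
raw_a = [{"interop", "quantum_values"}, {"animations"}]
raw_b = [{"interop"}, {"animations", "quantum_values"}]


def compile_counts(answers):
    """Collapse raw answers into total counts per token (the compiled form)."""
    return Counter(token for tokens in answers for token in tokens)


def co_occurrences(answers, a, b):
    """Count the answers that carry both token a and token b."""
    return sum(1 for tokens in answers if a in tokens and b in tokens)


same_compiled = compile_counts(raw_a) == compile_counts(raw_b)
overlap_a = co_occurrences(raw_a, "interop", "quantum_values")
overlap_b = co_occurrences(raw_b, "interop", "quantum_values")
print(same_compiled, overlap_a, overlap_b)  # True 1 0
```

Since the compiled counts are the same in both cases, the overlap can't be reconstructed from them alone.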

New Solution

We resign ourselves to "boring" buckets that can't be drilled down into in the main chart, but add additional filtered views to complement it:

Image

SachaG avatar Nov 05 '25 07:11 SachaG

Here are some observations about all the charts we have currently.

Forms Pain Points

Image
  • Limitations at # 1 is a dead end since it can't be drilled down into
  • DX at # 2 can be drilled down into, but DX by itself has 555 answers while its children only total 118 answers combined. This means that either
    • A) there are a lot of DX items that don't get assigned a finer-grained child token.
    • B) DX itself gets assigned too easily to non-DX items (which I think is the case here, looking at the detail of the answers)
  • Because end-user_experience uses a -, that triggers a GraphQL bug that makes it impossible to view its raw answers.
  • quality (low quality results) seems vague and unrelated to rendering
  • I think framework_compatibility could be moved up one level to live at the root without being a child of ecosystem.
  • "Specific platforms & vendors" seems like it should maybe be a child of interop?
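For reference, the DX gap above works out as follows. This is a back-of-the-envelope sketch using the two figures quoted in the list; the children's total may count an answer more than once if it carries several child tokens:

```python
dx_total = 555           # answers tagged with DX itself
dx_children_total = 118  # combined answers across DX's children

coverage = dx_children_total / dx_total
uncovered = dx_total - dx_children_total
print(f"{coverage:.0%} covered, {uncovered} not accounted for by children")
# 21% covered, 437 not accounted for by children
```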

Graphics & Multimedia Pain Points

Image
  • Again big delta between total DX items and sum of its child buckets.
  • canvas is an example where a feature id in the provided YAML doesn't match the feature's existing id, which should be canvas_element. This is why the label is not auto-linked to the MDN page, CanIUse, etc.
  • performance, video, animations are all good, easy-to-understand top-level categories.
  • Not sure why we have safari by itself and not under specific_platforms_vendors or interop.
  • Again quality seems out of place here.

Content Pain Points

Image
  • Same issues as before with limitations and dx

Interactivity Pain Points

Image
  • limitations/dx issues
  • Should we group CSS features together or leave them standalone at the root level? We're not grouping non-CSS features together, so maybe we shouldn't group CSS features either for consistency?
  • Should xslt be nested under CSS?

Performance Pain Points

Image

Web Components Other Pain Points

Image

System Capabilities Pain Points

Image
  • Again, Google/Mozilla/Firefox should be nested under "specific platforms & vendors", or maybe interop.

Overall Conclusions

  • Get rid of limitations
  • create a new more restrictive dx (named cognitive_load?), so that Σ(dx) is closer to Σ(children of dx).
  • Rename dx to something more descriptive of what it actually is matching
  • Get rid of ecosystem and move its children to the root level
  • Make sure all ids match previously-defined ids (e.g. canvas vs canvas_element)
  • Nest things under interop (specific browsers, platforms, etc.)
  • Maybe: rename interop to browser_support for consistency with previous years?
  • Get rid of quality ("low quality results")
  • Double-check all ids for disallowed characters (-)

SachaG avatar Nov 07 '25 00:11 SachaG

Double-check Apple/Safari nesting

Image

SachaG avatar Nov 17 '25 03:11 SachaG

  • interop issues -> browser support?
  • styling vs styling & customization in Forms chart
  • have button to nest/un-nest items?

Tabs

  • default tab
  • limitations
  • browser_support
  • styling?

Note: do not show limitations/browser_support in secondary tabs

Recap

  • hide limitations from main view (dead end)
  • hide interop from main view (maybe?)
  • get rid of dx in general
  • create new cognitive_overload category that is the parent of the current dx children

SachaG avatar Nov 17 '25 07:11 SachaG

Notes from today's meeting:

Reduce number of codes:

  • Flatten certain codes (e.g. Apple → Safari)
  • Fold platforms & vendors under Interop
  • Add cognitive_overload as a child of dx, and make all of dx's children be children of cognitive_overload except id_management, which should be in the root (or is there a better parent?)
  • Drop CSS as a top-level category
  • ...?

UI:

  • Move circles with +N to the right for better proximity
  • Hide items with <10 answers
  • Hide Limitations from Interop view and vice versa.
  • Do we want a Styling view?

Blockers:

  • Figure out what to do with overly general bars
  • Eliminate the huge "unsorted" bars ("View more"?)
  • We need to have a flat view, otherwise you can't see the rank of an item

LeaVerou avatar Nov 17 '25 07:11 LeaVerou

To-do recap for next round of exports:

  • [ ] create new cognitive_overload category that is the parent of the current dx children but does not belong to dx
  • [ ] also add to it the following codes: speed_of_change, too_many_choices, reinventing_the_wheel, version_updates, excessive_complexity, hard_to_memorize, excessive_verbosity if not already present. Note: it's fine if we can't re-run the LLM step with these new codes, but at least we'll have them for next time.
  • [ ] Question: given that cognitive_overload doesn't currently exist, does it mean that the code itself will not be assigned to any responses? Is there a way to auto-assign it to any response that has one of its children codes?
  • [ ] Add the following as children of educational_issues if not already present: tough_learning_curve, lack_of_documentation, lack_of_best_practices, lack_of_good_examples, lack_of_knowledge, conflicting_advice
  • [ ] Fold platforms & vendors under Interop (Interop > Apple) and nest browsers one level less, so that we have Apple > Safari iOS and not Apple > iOS > Safari
  • [ ] Drop CSS as a top-level category
  • [ ] As much as possible, use more explicit ids (embedding -> embedding_issues, decorators -> decorators_issues, etc.) to avoid id collision (also see next item)
  • [ ] UNLESS that is, we want to reference an existing entity. So for example container_queries should become at_container because this is an existing entity that already has metadata associated with it.
  • [ ] it'd be nice to have a way for the YAML export NOT to include entities/tokens that are already in the entities repo, just so I don't have them twice. But I can imagine that would be complicated at this stage…
  • [ ] for the System Capabilities Pain Points question it looks like we have no responses tagged with limitations?
  • [ ] remove command_invokers code since we already have invokers (check with Lea first)

SachaG avatar Nov 25 '25 06:11 SachaG

Question: given that cognitive_overload doesn't currently exist, does it mean that the code itself will not be assigned to any responses? Is there a way to auto-assign it to any response that has one of its children codes?

Not sure if this was a question to yourself, but that's how our system handles hierarchy currently.
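For illustration only, auto-assigning a parent code to any response that has one of its children could look like the sketch below. The `parent_of` map and `with_ancestors` helper are hypothetical and not taken from the actual codebase:

```python
# Hypothetical child -> parent map.
parent_of = {
    "excessive_complexity": "cognitive_overload",
    "hard_to_memorize": "cognitive_overload",
}


def with_ancestors(tokens, parent_of):
    """Return the tokens plus every ancestor reachable through parent_of."""
    result = set(tokens)
    for token in tokens:
        current = parent_of.get(token)
        # Walk up the hierarchy; stop at the root or at an ancestor already present.
        while current is not None and current not in result:
            result.add(current)
            current = parent_of.get(current)
    return result


assigned = with_ancestors({"hard_to_memorize"}, parent_of)
# assigned now also contains cognitive_overload
```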

Drop CSS as a top-level category

@DmitrySharabin what's our current process for omitting a code from the surveyadmin export? Maybe we should introduce a checkbox column for that.

As much as possible, use more explicit ids (embedding -> embedding_issues, decorators -> decorators_issues, etc.) to avoid id collision (also see next item)

But are these separate things?

UNLESS that is, we want to reference an existing entity. So for example container_queries should become at_container because this is an existing entity that already has metadata associated with it.

This makes me wonder if we could actually feed both lists to an LLM and ask it to flag discrepancies.

LeaVerou avatar Nov 25 '25 17:11 LeaVerou

@DmitrySharabin what's our current process for omitting a code from the surveyadmin export? Maybe we should introduce a checkbox column for that.

It's a matter of adding a code to a list. That list is automatically filtered while typing, so adding new items seems to be fast enough.

Image

Adding a checkbox is not an issue at all, though. However, we might end up with having more than one checkbox: one for codes that need to be excluded, and one for codes that need to be flattened.

DmitrySharabin avatar Nov 25 '25 19:11 DmitrySharabin

Adding a checkbox is not an issue at all, though. However, we might end up with having more than one checkbox: one for codes that need to be excluded, and one for codes that need to be flattened.

That sounds like a select list

LeaVerou avatar Nov 25 '25 19:11 LeaVerou

Adding a checkbox is not an issue at all, though. However, we might end up with having more than one checkbox: one for codes that need to be excluded, and one for codes that need to be flattened.

That sounds like a select list

This is how it's done now. I thought you were talking about adding a new column with checkboxes to the Master codebook table (in that case, we might end up with multiple columns). What am I missing?

DmitrySharabin avatar Nov 25 '25 20:11 DmitrySharabin

As much as possible, use more explicit ids (embedding -> embedding_issues, decorators -> decorators_issues, etc.) to avoid id collision (also see next item)

But are these separate things?

You're right, not always. But for example if we have:

- id: state_management
  name: >
    State Management
  description: >
    Difficulties with managing and preserving UI state (e.g., focus, scroll position, form input values, component state) during DOM manipulations.

Then if in another survey we ask people if e.g. they use state management in their React app, the description will show up as "Difficulties with managing and preserving UI state…etc." which won't make sense in that context.

So either we write the description to be neutral so that it works in any context (dropping the "difficulties with"); or we use two separate ids.

SachaG avatar Nov 26 '25 00:11 SachaG

Also, here are the four default tabs under consideration, along with their description:

Default

Most commonly reported pain points overall.

Limitations

Pain points related to being unable to achieve a goal due to feature or platform limitations.

Browser Support

Pain points related to poor browser support or other browser/platform incompatibilities.

Features

Pain points corresponding to specific web platform features.

SachaG avatar Nov 26 '25 07:11 SachaG

Note: currently in limitations/interop/etc. tabs, raw answers are not being filtered down by limitations/interop/etc.

SachaG avatar Nov 26 '25 22:11 SachaG

  • [ ] high_cognitive_load is the new name for dx but since we're excluding/hiding dx I think we can keep the dx name, and let's just not include dx in the export at all to keep things simple.
  • [ ] this means we also don't need specific_platforms_vendors anymore, let's remove it from the export
  • [ ] none of the safari answers are tagged with apple but all of the chrome answers are tagged with google?
  • [ ] we can also get rid of transitions_animations since we already have animations_issues
  • [ ] get rid of code css
  • [ ] things like popover_api or invokers could be root items instead of being children of interactivity?
  • [ ] Browser Interoperability/Limited Functionality: those questions ask people to submit specific feature names, but the LLM parsing matches answers with a lot of non-feature codes such as interactivity. Do we include these non-feature codes in the dataset? What about in the chart?
  • [ ] I don't love "Environments & Consumers" as a label.
  • [ ] Usage>Other Pain points: only 4 answers tagged as ecosystem, which is weird given that it has e.g. 21 framework_dominance answers (which is its child)

SachaG avatar Dec 03 '25 03:12 SachaG

  • high_cognitive_load is the new name for dx but since we're excluding/hiding dx I think we can keep the dx name, and let's just not include dx in the export at all to keep things simple.
  • this means we also don't need specific_platforms_vendors anymore, let's remove it from the export
  • we can also get rid of transitions_animations since we already have animations_issues
  • get rid of code css

Done! By @LeaVerou's request, also excluded interactivity and system_capabilities.

Image
  • none of the safari answers are tagged with apple but all of the chrome answers are tagged with google?

It's because we renamed Apple to Safari (so, none of the answers will have Apple as an assigned token), though we didn't do the same thing with Google, i.e., we didn't rename it to Chrome (so, it's still one of the assignable tokens). For the same reason, we won't see macos as an assigned token.

Image Image
  • Usage>Other Pain points: only 4 answers tagged as ecosystem, which is weird given that it has e.g. 21 framework_dominance answers (which is its child)

ecosystem is one of the tokens we flatten, so, by design, it won't be added if an answer has any of its children assigned (previously, we did the same thing for dx):

Image
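A rough sketch of the flattening behavior described above, modeled here as dropping the parent token whenever one of its children is present. The data and the `apply_flattening` helper are assumptions for illustration, not the actual implementation:

```python
# Hypothetical configuration: which tokens are flattened, and their children.
flattened = {"ecosystem"}
children_of = {"ecosystem": {"framework_dominance", "framework_compatibility"}}


def apply_flattening(tokens, flattened, children_of):
    """Drop a flattened parent from an answer if any of its children is assigned."""
    tokens = set(tokens)
    for parent in flattened:
        if parent in tokens and tokens & children_of.get(parent, set()):
            tokens.discard(parent)
    return tokens


with_child = apply_flattening({"ecosystem", "framework_dominance"}, flattened, children_of)
parent_only = apply_flattening({"ecosystem"}, flattened, children_of)
# with_child keeps only framework_dominance; parent_only keeps ecosystem.
```

Under this model, ecosystem only survives on answers with none of its children, which would be consistent with it having far fewer answers than framework_dominance.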

DmitrySharabin avatar Dec 03 '25 15:12 DmitrySharabin

  • I think @SachaG prefers to keep Apple as Apple after all.
  • I don't think it makes sense to scope dx to its children and ignore all its direct answers, but also we can't manually go through its 1.8K direct answers…

The rest are mainly for @SachaG to answer.

Another thought: I just filtered codes to find which ones have no direct answers and it's these:

cognitive_overload ✅, content, css_flexbox_layout, environments_consumers, graphics, heading_levels_in_components, image_rwd_container_queries, ~interactivity~, lack_of_declarative_apis, living_standard, loading_ux, password_input, performance_tooling, prefers_reduced_motion, responsive_svg, screen_reader_testing, scrollbar_styling, ~section_usage~, slotted_elements, styling_slotted_descendants, ~system_capabilities~, web_components_ssr

I marked those we are already excluding with ~strikethrough~ (did I miss any? Not sure about some of them) and added ✅ to those we explicitly added as new containers. Do we want to exclude the rest? I think it could make sense.
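This kind of filter could be sketched as below, assuming each answer is available as the set of codes directly assigned to it (the data and helper name are hypothetical):

```python
# Hypothetical codebook and raw answers (each answer = set of directly assigned codes).
all_codes = ["cognitive_overload", "hard_to_memorize", "content"]
answers = [{"hard_to_memorize"}, {"hard_to_memorize"}]


def codes_without_direct_answers(all_codes, answers):
    """Return, in sorted order, the codes that no answer is directly tagged with."""
    used = set().union(*answers) if answers else set()
    return sorted(code for code in all_codes if code not in used)


unused = codes_without_direct_answers(all_codes, answers)
print(unused)  # ['cognitive_overload', 'content']
```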

LeaVerou avatar Dec 03 '25 15:12 LeaVerou

ecosystem is one of the tokens we flatten, so, by design, it won't be added if an answer has any of its children assigned (previously, we did the same thing for dx):

At this point I have to confess I don't remember all the back-and-forth we went through so maybe there was a reason to flatten it, but right now it does seem like it'd be useful to have it as parent for all the various framework issues? Or did we explicitly decide to have framework_issues and framework_dominance as root elements? Anyway as long as it's not a bug, it's fine either way.

I don't think it makes sense to scope dx to its children and ignore all its direct answers, but also we can't manually go through its 1.8K direct answers…

I did not fully understand that…

Do we want to exclude the rest? I think it could make sense.

At this point I'd rather not change things too much. I agree with removing interactivity because it doesn't seem to have any children, which would make it a dead end. system_capabilities though does have children so maybe we keep it?

SachaG avatar Dec 03 '25 22:12 SachaG

Remaining issues after third iteration:

Labels

  • [ ] interop issues => browser support
  • [ ] html reuse => code modularization

Sacha

  • [ ] date picker should be nested inside pickers
  • [ ] add mdn links or descriptions to more features so that they appear with the dashed outline
  • [ ] clicking on labels should expand the bar when there's no pre-existing label link
  • [ ] add "click me!" prompt first time we encounter nested bars
  • [ ] why are some non-feature items (wc_css_integration) included in features view? (in html_interoperability_features_features_only)
  • [ ] add catch-all buckets? some secondary tabs don't have content

Lea

  • [ ] write takeaways for pain points + usage questions

Dmitry

  • [ ] re-export duplicate and non-duplicate codes in two separate lists, but including all necessary LLM metadata this time

SachaG avatar Dec 05 '25 07:12 SachaG