LLM Data Import
Some notes about importing normalization data from LLMs (cc @LeaVerou and @DmitrySharabin).
ID Specificity
We should avoid broad ids such as `desktop` or `images` because they are quite vague once taken out of the context of pain points. Instead, `desktop_issues` and `images_issues` make it clear that these are pain points.
ID Collision
Overly broad ids also run the risk of colliding with existing entities/tokens (https://github.com/Devographics/entities). For example, `firefox` is already assigned to the Firefox browser itself. So when referring to e.g. "issues with Firefox", `firefox_issues` would be better.
ID Continuity
There is also a backwards-compatibility aspect: if, for example, we've used `interoperability_issues` up to now, it'd be better to avoid switching to e.g. `interop` when referring to the same concept.
Filtering Codes/Tokens
To make dealing with these issues easier, we can probably leave out any code with fewer than 10 matching items from the YAML exports, since these won't show up in the survey results anyway.
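A minimal sketch of that filter, assuming each exported code is a record with an `id` and a `count` of matching items. The field names, the helper name `drop_rare_codes`, and the sample data are illustrative assumptions, not the actual export format:

```python
MIN_MATCHES = 10  # threshold suggested above


def drop_rare_codes(codes, min_matches=MIN_MATCHES):
    """Keep only codes whose match count reaches the threshold."""
    return [code for code in codes if code["count"] >= min_matches]


# Hypothetical sample data; ids and counts are illustrative only.
codes = [
    {"id": "interop", "count": 122},
    {"id": "quantum_values", "count": 13},
    {"id": "rare_code", "count": 4},
]

kept = drop_rare_codes(codes)
print([code["id"] for code in kept])
```

Running the filter before writing the YAML would keep the export short without changing anything visible in the charts.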
~~Common vs Question-Specific Codes/Tokens~~
EDIT: a unified file is probably better after all
We don't want to repeat the same codes/tokens across e.g. `forms_pain_points.yml`, `graphics_pain_points.yml`, etc. So it'd be good to factor out any common codes into their own `common_pain_points.yml` file.
Note: this can be solved by using a single file for all pain points for a survey, but the downside is that this removes the ability to define that a set of pain points should only apply to question Foo, while another set only applies to question Bar, and so on.
Generic Codes
It might be better to leave out generic codes such as limitations or tooling altogether since they might make the resulting charts less specific.
ID Formatting
Maybe it's a one-off thing but I noticed an id named `end-user_ux`. It'd be good to avoid `-` and only use `_` in IDs to be safe.
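Both the formatting rule and the collision check from earlier could be automated before import. Here is a rough sketch, assuming ids should contain only lowercase letters, digits, and underscores (the lowercase part is my assumption). `check_id` and the sample set of entity ids are hypothetical:

```python
import re

# Assumed convention: lowercase letters and digits, separated by underscores.
ID_PATTERN = re.compile(r"^[a-z0-9]+(?:_[a-z0-9]+)*$")


def check_id(code_id, existing_entity_ids):
    """Return a list of human-readable problems found with a proposed code id."""
    problems = []
    if not ID_PATTERN.fullmatch(code_id):
        problems.append("contains characters other than a-z, 0-9, and _")
    if code_id in existing_entity_ids:
        problems.append("collides with an existing entity id")
    return problems


# Hypothetical sample of ids already present in the entities repo.
existing = {"firefox", "canvas_element"}

for candidate in ["end-user_ux", "firefox", "firefox_issues"]:
    print(candidate, check_id(candidate, existing))
```

An empty list means the id passes both checks.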
Test Case: Performance Pain Points
- I commented out the `matchTags` array in the questions outline and also renamed `performance_pain_points.yml` from the entities repo to `performance_pain_points_disabled.yml` to make sure no regexes at all are applied to the question's dataset.
- I then clicked "Normalize All" on the question dashboard to trigger the normalization process and wipe out any existing code assignments.
- I pasted in the contents of the JSON export into the Import dialog.
- This found 115 new tokens
- At this point the assignments exist as custom normalizations in the database but are not yet part of the final dataset. So I run "Normalize All" again.
- We're now at 98% normalized! Thanks ChatGPT! (and LeaGPT ;)
- We can get a preview of the resulting data by clicking "Results":
- And we can also now build the results site to see the chart:
Next Steps
- Although we have a `disallowedTokenIds` property to exclude codes/tokens from a specific question, it currently only applies to regexes. So we might need another approach to "turn off" the `performance` token here.
- Figure out nesting
Typo: `envirnoments_consumers` (cc @DmitrySharabin)
Test Case Part II: Nesting
This is what I get by using the YAML file as is. It'll need tweaking but it's working at least!
> Typo: `envirnoments_consumers`
Fixed:
Issue
- Tokens may sometimes logically belong to more than one parent.
Example
I hate that quantum values in animations don't work in IE 6!
- Matches `interop`, `animations`, and `quantum_values`
- If we only nest `quantum_values` under `animations` and not `interop`, then the `interop` bucket cannot be drilled down into, which leads to a boring chart.
Proposed Solution
- do not assign "combination tokens", such as `interop_anchor_positioning`
- add support for multiple `parentIds` instead of a single one
- when nesting, nest under multiple parents, but only if the item has the parent's token
- for features, define an API for bulk `parentIds` assignment based on the existing tags
- find an elegant way to show that a token can have multiple ancestor hierarchies in the breadcrumbs view
Update: Why it wouldn't work
The above solution presupposes that we know when token A appears alongside token B. This is true when looking at the raw, answer-by-answer data, but this is not what the chart is built on. Instead, the chart uses a compiled form of the data with total counts for each token:
- id: interop
  count: 122
- id: animations
  count: 89
- id: anchor_positioning
  count: 22
- id: quantum_values
  count: 13
...
We then separately define that quantum_values is nested under animations. But we can't know how many of those 13 quantum_values items also happen to have the interop token.
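To make the information loss concrete, here is a small sketch with made-up answers. The two raw datasets and both helper functions are illustrative assumptions; the real compiled format only carries per-token totals as shown above:

```python
from collections import Counter


def compile_counts(answers):
    """Collapse per-answer token sets into per-token totals."""
    counts = Counter()
    for tokens in answers:
        counts.update(tokens)
    return dict(counts)


def co_occurrence(answers, a, b):
    """Count answers that carry both tokens (needs the raw data)."""
    return sum(1 for tokens in answers if a in tokens and b in tokens)


# Two hypothetical raw datasets, one set of tokens per answer.
dataset_a = [
    {"interop", "animations", "quantum_values"},
    {"animations", "quantum_values"},
    {"interop"},
]
dataset_b = [
    {"interop", "animations", "quantum_values"},
    {"interop", "animations", "quantum_values"},
]

# Same compiled totals...
print(compile_counts(dataset_a) == compile_counts(dataset_b))
# ...but a different number of quantum_values answers that also have interop.
print(co_occurrence(dataset_a, "quantum_values", "interop"))
print(co_occurrence(dataset_b, "quantum_values", "interop"))
```

Since both datasets compile to identical totals, nothing in the compiled form can tell them apart, which is why the overlap cannot be reconstructed at chart-building time.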
New Solution
We resign ourselves to "boring" buckets that can't be drilled down into in the main chart, but add additional filtered views to complement it:
Here are some observations about all the charts we currently have.
Forms Pain Points
- Limitations at # 1 is a dead end since it can't be drilled down into
- DX at # 2 can be drilled down into, but DX by itself has 555 answers while its children only total 118 answers combined. This means that either
- A) there are a lot of DX items that don't get assigned a finer-grained child token.
- B) DX itself gets assigned too easily to non-DX items (which I think is the case here, looking at the detail of the answers)
- Because `end-user_experience` uses a `-`, that triggers a GraphQL bug that makes it impossible to view its raw answers.
- `quality` (low quality results) seems vague and unrelated to `rendering`
- I think `framework_compatibility` could be moved up one level to live at the root without being a child of `ecosystem`.
- "Specific platforms & vendors" seems like it should maybe be a child of `interop`?
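For the DX point above, the size of the gap is easy to quantify from the two figures quoted (555 answers for DX, 118 for its children combined). The helper name below is hypothetical, and the result is only approximate, since an answer tagged with several child tokens would be counted more than once in the children total:

```python
def child_coverage(parent_count, children_total):
    """Share of the parent's answers accounted for by its child buckets."""
    return children_total / parent_count


dx_count = 555        # figure quoted above
children_total = 118  # figure quoted above

print(f"{child_coverage(dx_count, children_total):.1%} of DX answers have a child token")
print(f"{dx_count - children_total} DX answers have no finer-grained child")
```

A low coverage value is consistent with either explanation A or B; it flags the bucket for review rather than settling which one applies.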
Graphics & Multimedia Pain Points
- Again big delta between total DX items and sum of its child buckets.
- `canvas` is an example where a feature id in the provided YAML doesn't match the feature's existing `id`, which should be `canvas_element`. This is why the label is not auto-linked to the MDN page, CanIUse, etc.
- `performance`, `video`, `animations` are all good, easy-to-understand top-level categories.
- Not sure why we have `safari` by itself and not under `specific_platforms_vendors` or `interop`.
- Again `quality` seems out of place here.
Content Pain Points
- Same issues as before with `limitations` and `dx`
Interactivity Pain Points
- `limitations`/`dx` issues
- Should we group CSS features together or leave them standalone at the root level? We're not grouping non-CSS features together, so maybe we shouldn't group CSS features either for consistency?
- Should `xslt` be nested under CSS?
Performance Pain Points
Web Components Other Pain Points
System Capabilities Pain Points
- Again, Google/Mozilla/Firefox should be nested under "specific platform & vendors", or maybe `interop`.
Overall Conclusions
- Get rid of `limitations`
- create a new more restrictive `dx` (named `cognitive_load`?), so that Σ(dx) is closer to Σ(children of dx).
- Rename `dx` to something more descriptive of what it actually is matching
- Get rid of `ecosystem` and move its children to the root level
- Make sure all ids match previously-defined ids (e.g. `canvas` vs `canvas_element`)
- Nest things under `interop` (specific browsers, platforms, etc.)
- Maybe: rename `interop` to `browser_support` for consistency with previous years?
- Get rid of `quality` ("low quality results")
- Double-check all ids for disallowed characters (`-`)
Double-check Apple/Safari nesting
- interop issues -> browser support?
- styling vs styling & customization in Forms chart
- have button to nest/un-nest items?
Tabs
- default tab
- limitations
- browser_support
- styling?
Note: do not show limitations/browser_support in secondary tabs
Recap
- hide limitations from main view (dead end)
- hide interop from main view (maybe?)
- get rid of dx in general
- create new `cognitive_overload` category that is the parent of the current `dx` children
Notes from today's meeting:
Reduce number of codes:
- Flatten certain codes (e.g. Apple → Safari)
- Fold platforms & vendors under Interop
- Add cognitive_overload as a child of dx, and make all of dx's children be children of cognitive_overload except id_management, which should be in the root (or is there a better parent?)
- Drop CSS as a top-level category
- ...?
UI:
- Move circles with +N to the right for better proximity
- Hide items with <10 answers
- Hide Limitations from Interop view and vice versa.
- Do we want a Styling view?
Blockers:
- Figure out what to do with overly general bars
- Eliminate the huge "unsorted" bars ("View more"?)
- We need to have a flat view, otherwise you can't see the rank of an item
To-do recap for next round of exports:
- [ ] create new `cognitive_overload` category that is the parent of the current dx children but does not belong to `dx`
- [ ] also add to it the following codes: `speed_of_change`, `too_many_choices`, `reinventing_the_wheel`, `version_updates`, `excessive_complexity`, `hard_to_memorize`, `excessive_verbosity` if not already present. Note: it's fine if we can't re-run the LLM step with these new codes, but at least we'll have them for next time.
- [ ] Question: given that `cognitive_overload` doesn't currently exist, does it mean that the code itself will not be assigned to any responses? Is there a way to auto-assign it to any response that has one of its children codes?
- [ ] Add the following as children of `educational_issues` if not already present: `tough_learning_curve`, `lack_of_documentation`, `lack_of_best_practices`, `lack_of_good_examples`, `lack_of_knowledge`, `conflicting_advice`
- [ ] Fold platforms & vendors under Interop (`Interop > Apple`) and nest browsers one level less, so that we have `Apple > Safari iOS` and not `Apple > iOS > Safari`
- [ ] Drop CSS as a top-level category
- [ ] As much as possible, use more explicit ids (embedding -> embedding_issues, decorators -> decorators_issues, etc.) to avoid id collision (also see next item)
- [ ] UNLESS that is, we want to reference an existing entity. So for example `container_queries` should become `at_container` because this is an existing entity that already has metadata associated with it.
- [ ] it'd be nice to have a way for the YAML export NOT to include entities/tokens that are already in the entities repo, just so I don't have them twice. But I can imagine that would be complicated at this stage…
- [ ] for the System Capabilities Pain Points question it looks like we have no responses tagged with limitations?
- [ ] remove `command_invokers` code since we already have `invokers` (check with Lea first)
> Question: given that `cognitive_overload` doesn't currently exist, does it mean that the code itself will not be assigned to any responses? Is there a way to auto-assign it to any response that has one of its children codes?
Not sure if this was a question to yourself, but that's how our system handles hierarchy currently.
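For anyone following along, here is a rough sketch of what "auto-assigning a parent to any response that has one of its children" could look like. This is not the actual normalization code. The `parent_of` mapping and the function name are assumptions made for illustration:

```python
def with_ancestors(tokens, parent_of):
    """Return the tokens plus every ancestor reachable through parent_of."""
    result = set(tokens)
    for token in tokens:
        current = parent_of.get(token)
        # Walk up the hierarchy; stop at the root or at an ancestor already added.
        while current is not None and current not in result:
            result.add(current)
            current = parent_of.get(current)
    return result


# Hypothetical hierarchy: child id -> parent id.
parent_of = {
    "too_many_choices": "cognitive_overload",
    "speed_of_change": "cognitive_overload",
}

print(sorted(with_ancestors({"too_many_choices"}, parent_of)))
print(sorted(with_ancestors({"interop"}, parent_of)))
```

With this approach a container code like `cognitive_overload` never needs to be matched directly; it only appears because one of its children was assigned.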
> Drop CSS as a top-level category
@DmitrySharabin what's our current process for omitting a code from the surveyadmin export? Maybe we should introduce a checkbox column for that.
> As much as possible, use more explicit ids (embedding -> embedding_issues, decorators -> decorators_issues, etc.) to avoid id collision (also see next item)
But are these separate things?
> UNLESS that is, we want to reference an existing entity. So for example `container_queries` should become `at_container` because this is an existing entity that already has metadata associated with it.
This makes me wonder if we could actually feed both lists to an LLM and ask it to flag discrepancies.
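A cheaper first pass before involving an LLM could be plain fuzzy string matching between the two id lists. The sketch below uses Python's standard `difflib.get_close_matches`. The function name, the 0.5 cutoff, and the sample ids are assumptions, and string similarity will miss pairs where the two names share little text:

```python
import difflib


def flag_near_matches(export_ids, entity_ids, cutoff=0.5):
    """Map each unknown export id to existing entity ids that look similar."""
    known = set(entity_ids)
    flagged = {}
    for code_id in export_ids:
        if code_id in known:
            continue  # exact match: already references an existing entity
        close = difflib.get_close_matches(code_id, entity_ids, n=3, cutoff=cutoff)
        if close:
            flagged[code_id] = close
    return flagged


# Hypothetical samples of both lists.
entity_ids = ["canvas_element", "firefox", "at_container"]
export_ids = ["canvas", "firefox"]

print(flag_near_matches(export_ids, entity_ids))
```

Anything flagged would still need a human (or an LLM) to decide whether the two ids really refer to the same thing.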
> @DmitrySharabin what's our current process for omitting a code from the surveyadmin export? Maybe we should introduce a checkbox column for that.
It's a matter of adding a code to a list. That list is automatically filtered while typing, so adding new items seems to be fast enough.
Adding a checkbox is not an issue at all, though. However, we might end up with having more than one checkbox: one for codes that need to be excluded, and one for codes that need to be flattened.
> Adding a checkbox is not an issue at all, though. However, we might end up with having more than one checkbox: one for codes that need to be excluded, and one for codes that need to be flattened.
That sounds like a select list
> > Adding a checkbox is not an issue at all, though. However, we might end up with having more than one checkbox: one for codes that need to be excluded, and one for codes that need to be flattened.
>
> That sounds like a select list
This is how it's done now. I thought you were talking about adding a new column with checkboxes to the Master codebook table (in that case, we might end up with multiple columns). What am I missing?
> > As much as possible, use more explicit ids (embedding -> embedding_issues, decorators -> decorators_issues, etc.) to avoid id collision (also see next item)
>
> But are these separate things?
You're right, not always. But for example if we have:
- id: state_management
  name: >
    State Management
  description: >
    Difficulties with managing and preserving UI state (e.g., focus, scroll position, form input values, component state) during DOM manipulations.
Then if in another survey we ask people if e.g. they use state management in their React app, the description will show up as "Difficulties with managing and preserving UI state…etc." which won't make sense in that context.
So either we write the description to be neutral so that it works in any context (dropping the "difficulties with"); or we use two separate ids.
Also, here are the four default tabs under consideration, along with their description:
Default
Most commonly reported pain points overall.
Limitations
Pain points related to being unable to achieve a goal due to feature or platform limitations.
Browser Support
Pain points related to poor browser support or other browser/platform incompatibilities.
Features
Pain points corresponding to specific web platform features.
Note: currently in limitations/interop/etc. tabs, raw answers are not being filtered down by limitations/interop/etc.
- [ ] `high_cognitive_load` is the new name for `dx` but since we're excluding/hiding `dx` I think we can keep the `dx` name, and let's just not include dx in the export at all to keep things simple.
- [ ] this means we also don't need `specific_platforms_vendors` anymore, let's remove it from the export
- [ ] none of the safari answers are tagged with apple but all of the chrome answers are tagged with google?
- [ ] we can also get rid of `transitions_animations` since we already have `animations_issues`
- [ ] get rid of code `css`
- [ ] things like `popover_api` or `invokers` could be root items instead of being children of `interactivity`?
- [ ] Browser Interoperability/Limited Functionality: those questions ask for people to submit specific feature names, but the LLM parsing matches answers with a lot of non-feature codes such as `interactivity`. Do we include these non-feature codes in the dataset? What about in the chart?
- [ ] I don't love "Environments & Consumers" as a label.
- [ ] Usage>Other Pain points: only 4 answers tagged as `ecosystem`, which is weird given that it has e.g. 21 `framework_dominance` answers (which is its child)
> - `high_cognitive_load` is the new name for `dx` but since we're excluding/hiding `dx` I think we can keep the `dx` name, and let's just not include dx in the export at all to keep things simple.
> - this means we also don't need `specific_platforms_vendors` anymore, let's remove it from the export
> - we can also get rid of `transitions_animations` since we already have `animations_issues`
> - get rid of code `css`
Done! By @LeaVerou's request, also excluded interactivity and system_capabilities.
> - none of the safari answers are tagged with apple but all of the chrome answers are tagged with google?
It's because we renamed Apple to Safari (so, none of the answers will have Apple as an assigned token), though we didn't do the same thing with Google, i.e., we didn't rename it to Chrome (so, it's still one of the assignable tokens). For the same reason, we won't see macos as an assigned token.
> - Usage>Other Pain points: only 4 answers tagged as `ecosystem`, which is weird given that it has e.g. 21 `framework_dominance` answers (which is its child)
ecosystem is one of the tokens we flatten, so, by design, it won't be added if an answer has any of its children assigned (previously, we did the same thing for dx):
- I think @SachaG prefers to keep Apple as Apple after all.
- I don't think it makes sense to scope `dx` to its children and ignore all its direct answers, but also we can't manually go through its 1.8K direct answers…
The rest are mainly for @SachaG to answer.
Another thought: I just filtered codes to find which ones have no direct answers and it's these:
- cognitive_overload ✅
- content
- css_flexbox_layout
- environments_consumers
- graphics
- heading_levels_in_components
- image_rwd_container_queries
- ~interactivity~
- lack_of_declarative_apis
- living_standard
- loading_ux
- password_input
- performance_tooling
- prefers_reduced_motion
- responsive_svg
- screen_reader_testing
- scrollbar_styling
- ~section_usage~
- slotted_elements
- styling_slotted_descendants
- ~system_capabilities~
- web_components_ssr
I marked those we are already excluding with ~strikethrough~ (did I miss any? Not sure about some of them) and added ✅ to those we explicitly added as new containers. Do we want to exclude the rest? I think it could make sense.
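For reproducibility, the filter boils down to a set difference between the codebook and the tokens that were actually assigned. A minimal sketch, assuming each answer is represented as the set of tokens directly assigned to it; the function name and sample data are hypothetical:

```python
def codes_without_direct_answers(codebook_ids, answers):
    """List codebook ids that are never directly assigned to any answer."""
    assigned = set()
    for tokens in answers:
        assigned.update(tokens)
    return sorted(set(codebook_ids) - assigned)


# Hypothetical sample data.
codebook_ids = ["cognitive_overload", "too_many_choices", "loading_ux"]
answers = [{"too_many_choices"}, {"too_many_choices"}]

print(codes_without_direct_answers(codebook_ids, answers))
```

Codes returned here are either pure containers (like `cognitive_overload`) or candidates for exclusion.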
> `ecosystem` is one of the tokens we flatten, so, by design, it won't be added if an answer has any of its children assigned (previously, we did the same thing for dx):
At this point I have to confess I don't remember all the back-and-forth we went through so maybe there was a reason to flatten it, but right now it does seem like it'd be useful to have it as parent for all the various framework issues? Or did we explicitly decide to have framework_issues and framework_dominance as root elements? Anyway as long as it's not a bug, it's fine either way.
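To make the flattening behaviour easier to reason about, here is how I understand the rule, sketched under assumptions: a flattened parent is left off an answer whenever one of its children is assigned. The function, the `children_of` mapping, and the sample tokens are illustrative, not the actual surveyadmin implementation:

```python
def flatten(tokens, children_of, flattened):
    """Drop each flattened parent from an answer that carries one of its children."""
    result = set(tokens)
    for parent in flattened:
        children = children_of.get(parent, set())
        if parent in result and result & children:
            result.discard(parent)
    return result


# Hypothetical hierarchy and flatten list.
children_of = {"ecosystem": {"framework_dominance", "framework_issues"}}
flattened = {"ecosystem"}

print(flatten({"ecosystem", "framework_dominance"}, children_of, flattened))
print(flatten({"ecosystem"}, children_of, flattened))
```

Under this rule the parent's count only reflects answers with no child token, which would explain `ecosystem` showing 4 answers next to 21 for `framework_dominance`.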
> I don't think it makes sense to scope `dx` to its children and ignore all its direct answers, but also we can't manually go through its 1.8K direct answers…
I did not fully understand that…
> Do we want to exclude the rest? I think it could make sense.
At this point I'd rather not change things too much. I agree with removing interactivity because it doesn't seem to have any children, which would make it a dead end. system_capabilities though does have children so maybe we keep it?
Remaining issues after third iteration:
Labels
- [ ] interop issues => browser support
- [ ] html reuse => code modularization
Sacha
- [ ] date picker should be nested inside pickers
- [ ] add mdn links or descriptions to more features so that they appear with the dashed outline
- [ ] clicking on labels should expand the bar when there's no pre-existing label link
- [ ] add "click me!" prompt first time we encounter nested bars
- [ ] why are some non-feature items (`wc_css_integration`) included in features view? (in `html_interoperability_features_features_only`)
- [ ] add catch-all buckets? some secondary tabs don't have content
Lea
- [ ] write takeaways for pain points + usage questions
Dmitry
- [ ] re-export duplicate and non-duplicate codes in two separate lists, but including all necessary LLM metadata this time