esphome-docs
esphome-docs copied to clipboard
Add embedding search
Description:
-
Shows 3 recommended pages when you start typing in the search box
-
Fast (also doesn't require hitting "go"), better at finding some pages (light -> light component is first result, internet -> shows internet-related components), although doesn't replace the current stuff
-
Uses glove-based embeddings, shipping directly to the browser, no 3rd party. The current search index is ~250kb compressed; this index is ~300kb compressed, including a number of words that might be typed.
- Too big? Let me know and I'll try to ship less
- Only should load data upon using the search bar? Let me know and I'll implement that
Merge first: https://github.com/esphome/esphome-docs/pull/3773 Related issue (if applicable): N/A Pull request in esphome with YAML changes (if applicable): N/A
Checklist:
- N/A I am merging into
next
because this is new documentation that has a matching pull-request in esphome as linked above.
or - [x] I am merging into
current
because this is a fix, change and/or adjustment in the current documentation and is not for a new component or feature. - N/A Link added in
/index.rst
when creating new documents for new components or cookbook.
Deploy Preview for esphome ready!
Name | Link |
---|---|
Latest commit | 07dd59f8bcde884833875919a86d2470ff7142af |
Latest deploy log | https://app.netlify.com/sites/esphome/deploys/663af3c1cd036b00083000a4 |
Deploy Preview | https://deploy-preview-3774--esphome.netlify.app |
Preview on mobile | Toggle QR Code...Use your smartphone camera to open QR code link. |
To edit notification comments on pull requests, go to your Netlify site configuration.
A few quick thoughts:
- remove changelogs from the search index
- some kind of mapping between
I²C
-I2C
, as people won't be filling the search box with superscripts, but would like to get the results - maybe prioritize title results to appear first, and text body result appear lower
I had some of the same thoughts, but found out I couldn't implement them.
Right now, I'm only searching on the titles. This isn't ideal, but I'm using a simple weighted average to turn word embeddings into phrase embeddings, so including the body might drown out the title. Plus, the fact that we're using embeddings means that related terms will still work.
It turns out Glove doesn't have a word embedding for i2c. I should see if the original tokenizer was tokenizing it differently.
Try "graph". Changelogs in the results are really annoying. Graph Component by itself doesn't appear in the list.
Maybe it would worth to take ref instances into account separately.
Because the graph component isn't its own page, it doesn't show up as a separate result in either this search or the default Sphinx search. (It would be nice if the subheaders like "light effects", "graph component", "pin" showed up, but that would probably require hardcoding or something) You are using the embedding search (the one that shows while you're typing) though, right? I can't get changelogs to show up by searching for "graph" with it.
Maybe it would worth to take ref instances into account separately.
That sounds interesting, but I'm not sure what that would look like in practice. Could you elaborate?
I just type in what I search for, and if there's nothing relevant showing up while typing, I press the button.
I would expect to have a more detailed set of relevant results, without changelog entries, etc., based on similar criteria as the popups I got while typing.
In my mind the two are one.
I understand where you're coming from, you would expect them to act similarly. However, this PR is just an incremental change, only adding a search while typing; the more detailed search is exactly the same to the one currently on https://esphome.io/. I could hook into Sphinx's search logic, and add some custom filtering and sorting, but I would prefer to just edit the search while typing.
I understand that. But users will see is holistically, it would really help to have a consistent experience.
I think anything to improve searching esphome would be beneficial. With this PR the search doesn’t appear to produce the results I’d expect. With a search that is based on the title then I’d expect “abp” (either lowercase or caps) to show the Honeywell ABP and ABP2 components. Yet it shows neither one. “Honeywell” finds both. “Pressure” finds neither and only lists one component with the word “pressure” in the title.
Looks like if I increase the embedding dimension to 50 the performance gets better. I'll have to see how I can increase the dimensions while not shipping too many word embeddings.
It’s somewhat better, yet “Pressure” still doesn’t show 3 components whose titles include the word pressure.
@RubyBailey for some reason one component is using "co_" (which we have no embeddings for) instead of "co", making it embed badly. optimally it would just use "co", but i just turned the "co_" token into the "co" token, which fixes the problem