bundler-site
Replace middleman-search with lunr with Algolia DocSearch
What was the end-user problem that led to this PR?
middleman-search (RubyGems, GitHub) has not been maintained for more than 5.5 years (see, for example, https://github.com/manastech/middleman-search/pull/38), so bundler-site maintainers and contributors cannot upgrade lunr to the latest version without extra effort on their part (or effort from a community other than the maintainers/contributors of this repo).
Closes #691
What was your diagnosis of the problem?
The UI can be replaced by https://github.com/algolia/docsearch or one of its forks. The backend (currently /search/lunr-index.json) can be replaced by Algolia DocSearch with automated crawling, or by Typesense Cloud with manual crawling by us.
What is your fix for the problem, implemented in this PR?
- 🎉 Removes the middleman-search gem and related gems (mini_racer and libv8-node).
- 🎉 Removes the JS/CSS code and Middleman configuration for middleman-search.
- 🎉 Removes HTML snippets thanks to the new UI from DocSearch v3.
- 🎉 Removes (the loading of) Popover CSS from Bootstrap 5 along with the above JS code removal.
- 🏗️ Also moves the loading of /application.min.js outside of <head>.
The crawler is manually run by @tnir at any time.
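For readers unfamiliar with DocSearch v3, the front-end side of this change roughly amounts to passing an options object to `docsearch()`. The following is a minimal sketch, not the code in this PR: the `#docsearch` container and the `buildDocsearchOptions` helper are assumptions made for illustration, while the app ID and index name are taken from the Crawler configuration quoted later in this thread.

```javascript
// Sketch only: builds the options object that DocSearch v3's docsearch() accepts.
// "#docsearch" is a hypothetical container selector, and the API key must be
// a search-only key supplied by the caller (it is masked in this thread).
function buildDocsearchOptions(apiKey) {
  return {
    container: "#docsearch",
    appId: "3JA5LRH987",
    indexName: "bundler",
    apiKey: apiKey,
  };
}

// In the browser, after loading @docsearch/js and its stylesheet:
//   docsearch(buildDocsearchOptions("<search-only API key>"));
```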
Screenshots
[Screenshots: PC (as-is), PC (searching), Mobile (as is), Mobile (searching). Old screenshots for previous changeset: Mobile (as is), Mobile (searching).]
Checklist
- [x] --docsearch-searchbox-background is now #ebedf0 (default), which departs from the "Bundler color scheme", which does not exist 😁
- [x] (optional) The search box has a 750px breakpoint for mobile, while the current site powered by Bootstrap 5 has 768px for the navbar.
  - cf. https://github.com/algolia/docsearch/issues/1444
  - was addressed in https://github.com/algolia/docsearch/pull/1446
Why did you choose this fix out of the possible options?
Algolia's DocSearch for open source projects can be ready within a few days. If Typesense offered us
- a small instance at no cost,
- permission for the UI, and
- a DocSearch v3-ready UI for end users,
we might be able to use #702, but this change looks like the clear choice at this moment.
Signed-off-by: Takuya Noguchi [email protected]
Hey, small feedback to improve your DocSearch results: you can update the Crawler config to better scope the lvl0 (e.g. h1), so you have more sections appearing in the modal (like on https://docsearch.algolia.com/)
@shortcuts Thanks, but I could not see any good element for lvl0 at this moment. I plan to update them by just adding a section name once the information architecture is worked out 😁 What do you think?
The main title of the page, the navbar section/active subsection is what we recommend
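To make the "section name as lvl0" idea concrete, here is a minimal sketch of deriving a section label from a page URL. Everything in it is an assumption for illustration: the `sectionForUrl` helper, the label mapping, and the idea that a leading path segment like `v2.3` is a version prefix are all hypothetical. Only the `Documentation` fallback comes from the Crawler configuration quoted later in this thread.

```javascript
// Hypothetical mapping from the first URL path segment to a lvl0 section label.
const SECTION_LABELS = new Map([
  ["guides", "Guides"],
  ["man", "Commands"],
]);

// Returns a section label for a page URL, or the fallback when nothing matches.
function sectionForUrl(url, fallback = "Documentation") {
  const segments = new URL(url).pathname.split("/").filter(Boolean);
  // Skip version-like segments such as "v2.3" (assumed URL layout).
  const first = segments.find((s) => !/^v\d+(\.\d+)*$/.test(s));
  return SECTION_LABELS.get(first) || fallback;
}

console.log(sectionForUrl("https://bundler.io/guides/bundler_docker_guide.html"));
```

A value computed this way could in principle feed the `defaultValue` of `lvl0`, though whether the Crawler configuration allows that per page is not confirmed here.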
Is the search branded right now? That would be big :-1: for me.
@simi Yes I see you, but Typesense will cause the same result...
@simi As the next step to this PR, we can prepare our own cluster on Algolia (or Typesense cloud). We can estimate how much it will be if some sponsors pay for it.
If this is not possible for you, you're free to open your own Algolia account and run DocSearch on your own without this limitation. In that case, though, depending on the size of your documentation, you might need a paid account (free accounts can hold as much as 10k records).
https://docsearch.algolia.com/docs/DocSearch-program
In any case, it seems best for us to be in full control of when we run the crawler (after every deploy), so running our own cluster would seem like the way to go.
Is it possible to recrawl the website on every deploy instead of daily?
Sure, we have a GitHub action available: https://github.com/algolia/algoliasearch-crawler-github-actions (which works with DocSearch :))
> In any case, it seems best for us to be in full control of when we run the crawler (after every deploy), so running our own cluster would seem like the way to go.
> Is it possible to recrawl the website on every deploy instead of daily?
My previous comment is incorrect, at least as of now (my memory might be from years ago...). As described in the doc below, we can schedule it more frequently and trigger it manually. I'm not sure whether we can do it via API/CLI, though:
Crawls are scheduled at a random time once a week. You can configure this schedule from the config file or trigger one manually from the Crawler interface.
https://www.algolia.com/doc/tools/crawler/apis/configuration/schedule/ still says every 24 hours: schedule: 'every 1 day at 3:00 pm',
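On the open question of triggering a crawl from a deploy step, a rough sketch of what such a request might look like follows. It rests on an assumption that should be checked against the Crawler documentation: that the Crawler REST API accepts `POST /api/1/crawlers/{id}/reindex` with HTTP Basic auth using a Crawler user ID and Crawler API key. The `buildReindexRequest` helper is hypothetical; it only builds the request and does not send it.

```javascript
// Assumption: the Crawler REST API exposes POST /api/1/crawlers/{id}/reindex
// and authenticates with HTTP Basic auth (Crawler user ID + Crawler API key).
// This helper only builds the request description; it performs no network I/O.
function buildReindexRequest(crawlerId, crawlerUserId, crawlerApiKey) {
  const token = Buffer.from(`${crawlerUserId}:${crawlerApiKey}`).toString("base64");
  return {
    url: `https://crawler.algolia.com/api/1/crawlers/${encodeURIComponent(crawlerId)}/reindex`,
    options: {
      method: "POST",
      headers: { Authorization: `Basic ${token}` },
    },
  };
}

// Possible usage from a deploy script (Node 18+ provides a global fetch):
//   const { url, options } = buildReindexRequest(id, userId, apiKey);
//   const res = await fetch(url, options);
```

Keeping the request construction separate from the network call makes it easy to check locally without credentials.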
While testing I noticed some extra search results which should not be needed. For example, when looking for "docker", one of the results is a link to the sidebar: https://bundler.io/guides/bundler_docker_guide.html#sidebar-wrapper. Can we avoid generating these redundant results?
Crawler configuration can be modified at https://crawler.algolia.com/admin/crawlers/7f4b6579-bba3-4c1f-a0ff-462d9f408281/configuration/edit (I sent an invitation to you). Just hours ago, I updated it as follows:
new Crawler({
  rateLimit: 8,
  startUrls: ["https://bundler.io"],
  renderJavaScript: false,
  sitemaps: [],
  ignoreCanonicalTo: false,
  discoveryPatterns: ["https://bundler.io/**"],
  schedule: "at 5:44 PM on Sunday",
  actions: [
    {
      indexName: "bundler",
      pathsToMatch: ["https://bundler.io/**"],
      recordExtractor: ({ helpers }) => {
        return helpers.docsearch({
          recordProps: {
            lvl1: [
              ".commands h2#NAME",
              "h1",
              "#page-content-wrapper h2",
              "head > title",
            ],
            content: [
              "#page-content-wrapper p, #page-content-wrapper li",
              "p",
              ".team .card-body",
              "span.contributor",
            ],
            lvl0: {
              selectors: "",
              defaultValue: "Documentation",
            },
            lvl2: ["h3"],
            lvl3: ["h4"],
            lvl4: ["h5"],
            lvl5: ["article h5", "main h5", "h5"],
            lvl6: ["article h6", "main h6", "h6"],
          },
          aggregateContent: true,
          recordVersion: "v3",
        });
      },
    },
  ],
  initialIndexSettings: {
    bundler: {
      attributesForFaceting: ["type", "lang"],
      attributesToRetrieve: [
        "hierarchy",
        "content",
        "anchor",
        "url",
        "url_without_anchor",
        "type",
      ],
      attributesToHighlight: ["hierarchy", "content"],
      attributesToSnippet: ["content:10"],
      camelCaseAttributes: ["hierarchy", "content"],
      searchableAttributes: [
        "unordered(hierarchy.lvl0)",
        "unordered(hierarchy.lvl1)",
        "unordered(hierarchy.lvl2)",
        "unordered(hierarchy.lvl3)",
        "unordered(hierarchy.lvl4)",
        "unordered(hierarchy.lvl5)",
        "unordered(hierarchy.lvl6)",
        "content",
      ],
      distinct: true,
      attributeForDistinct: "url",
      customRanking: [
        "desc(weight.pageRank)",
        "desc(weight.level)",
        "asc(weight.position)",
      ],
      ranking: [
        "words",
        "filters",
        "typo",
        "attribute",
        "proximity",
        "exact",
        "custom",
      ],
      highlightPreTag: '<span class="algolia-docsearch-suggestion--highlight">',
      highlightPostTag: "</span>",
      minWordSizefor1Typo: 3,
      minWordSizefor2Typos: 7,
      allowTyposOnNumericTokens: false,
      minProximity: 1,
      ignorePlurals: true,
      advancedSyntax: true,
      attributeCriteriaComputedByMinProximity: true,
      removeWordsIfNoResults: "allOptional",
    },
  },
  appId: "3JA5LRH987",
  apiKey: "masked",
});
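As for the redundant `#sidebar-wrapper` results reported above, one possible approach is to drop records whose anchor points at the sidebar before they are indexed. The sketch below is an illustration, not what the configuration above does: the `dropSidebarRecords` helper is hypothetical, and it assumes each record carries an `anchor` attribute (the index settings above do list `anchor` among the retrieved attributes).

```javascript
// Hypothetical post-filter: removes records whose anchor points at the sidebar.
// Assumes each record has an `anchor` attribute holding the element id.
function dropSidebarRecords(records, sidebarAnchor = "sidebar-wrapper") {
  return records.filter((record) => record.anchor !== sidebarAnchor);
}

// Sample records shaped like the search result reported in this thread.
const sampleRecords = [
  {
    url: "https://bundler.io/guides/bundler_docker_guide.html#sidebar-wrapper",
    anchor: "sidebar-wrapper",
  },
  {
    url: "https://bundler.io/guides/bundler_docker_guide.html",
    anchor: "",
  },
];

console.log(dropSidebarRecords(sampleRecords).map((r) => r.url));
```

If the value returned by `helpers.docsearch(...)` in `recordExtractor` is an array of such records (an assumption worth verifying), the same filter could be applied to it before returning.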
Updated the primary color for box-shadow etc. in the search box and modal, as I had forgotten to adapt them to the Bundler branding color.
An admin role has now been requested from Org members/Repo admins in #730 to set up periodic indexing on Actions 🙏
@deivid-rodriguez How can I trigger a review app for this (old) PR (just for development purposes)?
I do it manually from the Heroku UI; I created it for you.