Replace middleman-search with lunr with Algolia DocSearch

Open tnir opened this issue 2 years ago • 13 comments

What was the end-user problem that led to this PR?

middleman-search (RubyGems, GitHub) has not been maintained for more than 5.5 years (see, for example, https://github.com/manastech/middleman-search/pull/38), so bundler-site maintainers and contributors cannot upgrade lunr to the latest version without extra effort of their own, or effort from people outside this repo.

Closes #691

What was your diagnosis of the problem?

The UI can be replaced by https://github.com/algolia/docsearch or one of its forks. The backend (currently /search/lunr-index.json) can be replaced by Algolia DocSearch with automated crawling, or by Typesense Cloud with crawling run manually by us.

What is your fix for the problem, implemented in this PR?

  • 🎉 Removes middleman-search gem and relevant gems (mini_racer and libv8-node).
  • 🎉 Removes the JS/CSS code and Middleman configuration for middleman-search.
  • 🎉 Removes HTML snippets, thanks to the new UI from DocSearch v3.
  • 🎉 Removes (the loading of) Popover CSS from Bootstrap 5, along with the JS code removal above.
  • 🏗️ Also moves the loading of /application.min.js out of <head>.
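
For readers unfamiliar with the DocSearch v3 UI, here is a minimal sketch of how the frontend might be wired up. It is not the code from this PR. The `appId` and `indexName` values are taken from the crawler configuration posted later in this thread. The `#docsearch` container selector, the helper name `buildDocSearchOptions`, and the key placeholder are assumptions made for illustration.

```javascript
// Hypothetical helper (not part of this PR): builds the options object
// passed to the docsearch() function from @docsearch/js v3.
function buildDocSearchOptions(searchOnlyApiKey) {
  if (!searchOnlyApiKey) {
    throw new Error("a search-only API key is required");
  }
  return {
    container: "#docsearch", // assumed: an empty element in the navbar
    appId: "3JA5LRH987",     // from the crawler config in this thread
    indexName: "bundler",    // from the crawler config in this thread
    apiKey: searchOnlyApiKey, // must be a search-only key, never an admin key
  };
}

// In the browser, once @docsearch/js and its CSS are loaded:
//   docsearch(buildDocSearchOptions("<search-only key>"));
```

Because DocSearch renders its own button and modal into the container, the page only needs that one empty element, which is consistent with the HTML snippets being removed above.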

For now, the crawler can be run manually by @tnir at any time.

Screenshots

PC (as-is): [screenshot]

PC (searching): [screenshot]

Mobile (as-is): [screenshot]

Mobile (searching): [screenshot]

Old screenshots for previous changeset: [screenshots: PC (as-is), PC (searching), Mobile (as-is), Mobile (searching)]

Checklist

  • [x] --docsearch-searchbox-background is now #ebedf0 (the default), which departs from the "Bundler color scheme" (which does not exist 😁)
  • [x] (optional) The searchbox has a 750px breakpoint for mobile, while the current site, powered by Bootstrap 5, uses 768px for the navbar.
    • cf. https://github.com/algolia/docsearch/issues/1444
      • was addressed in https://github.com/algolia/docsearch/pull/1446

Why did you choose this fix out of the possible options?

Algolia's DocSearch for open source projects can be ready in a few days. If Typesense offered us

  • a small instance at no cost,
  • permission to use the UI, and
  • a DocSearch v3-ready UI for end users,

we might be able to use #702, but this change looks like the clear choice at the moment.

Signed-off-by: Takuya Noguchi [email protected]

tnir avatar Jul 18 '22 13:07 tnir

Hey, small feedback to improve your DocSearch results, you can update the Crawler config to better scope the lvl0 (e.g. h1), so you have more sections appearing in the modal (like on https://docsearch.algolia.com/)

shortcuts avatar Jul 18 '22 21:07 shortcuts

@shortcuts Thanks, but I could not see any good element for lvl0 at this moment. I planned to update them by just adding section name after information architecting 😁 What do you think?

tnir avatar Jul 19 '22 09:07 tnir

@shortcuts Thanks, but I could not see any good element for lvl0 at this moment. I planned to update them by just adding section name after information architecting 😁 What do you think?

We recommend the main title of the page, or the navbar section/active subsection.

shortcuts avatar Jul 19 '22 09:07 shortcuts

Is the search branded right now? That would be a big :-1: for me.

simi avatar Jul 19 '22 10:07 simi

@simi Yes, I see your point, but Typesense would lead to the same result...

tnir avatar Jul 19 '22 10:07 tnir

@simi As the next step after this PR, we can set up our own cluster on Algolia (or Typesense Cloud). We can estimate how much it would cost in case some sponsors are willing to pay for it.

If this is not possible for you, you're free to open your own Algolia account and run DocSearch on your own without this limitation. In that case, though, depending on the size of your documentation, you might need a paid account (free accounts can hold as much as 10k records).

https://docsearch.algolia.com/docs/DocSearch-program

tnir avatar Jul 19 '22 11:07 tnir

In any case, it seems best for us to be in full control of when we run the crawler (after every deploy), so running our own cluster would seem like the way to go.

deivid-rodriguez avatar Jul 19 '22 11:07 deivid-rodriguez

Is it possible to recrawl the website on every deploy instead of daily?

Sure, we have a GitHub action available: https://github.com/algolia/algoliasearch-crawler-github-actions (which works with DocSearch :))

shortcuts avatar Jul 19 '22 11:07 shortcuts

In any case, it seems best for us to be in full control of when we run the crawler (after every deploy), so running our own cluster would seem like the way to go.

Is it possible to recrawl the website on every deploy instead of daily?

My previous comment is incorrect, at least as of now (my memory might be from years ago...). As described in the doc below, we can schedule crawls more frequently and trigger them manually. I am not sure whether we can do it via API/CLI, though:

Crawls are scheduled at a random time once a week. You can configure this schedule from the config file or trigger one manually from the Crawler interface.

https://www.algolia.com/doc/tools/crawler/apis/configuration/schedule/ still says every 24 hours: schedule: 'every 1 day at 3:00 pm',
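
On the open question of triggering a crawl via an API, the sketch below only builds the HTTP request for a reindex call and does not send it. Everything in it should be treated as an assumption to verify against the Algolia Crawler REST API documentation: the `/api/1/crawlers/{id}/reindex` path, the Basic auth scheme using a Crawler user ID and Crawler API key, and the hypothetical helper name `buildReindexRequest`.

```javascript
// Hypothetical helper: describes a "reindex now" request for one crawler.
// ASSUMPTION: endpoint path and Basic auth scheme follow the Algolia
// Crawler REST API; check the official docs before relying on this.
function buildReindexRequest(crawlerId, crawlerUserId, crawlerApiKey) {
  // Basic auth token: base64("<user id>:<api key>")
  const token = Buffer.from(`${crawlerUserId}:${crawlerApiKey}`).toString("base64");
  return {
    url: `https://crawler.algolia.com/api/1/crawlers/${encodeURIComponent(crawlerId)}/reindex`,
    method: "POST",
    headers: { Authorization: `Basic ${token}` },
  };
}

// Possible usage from a deploy script (Node 18+ provides a global fetch):
//   const { url, ...init } = buildReindexRequest(id, userId, apiKey);
//   await fetch(url, init);
```

Keeping request construction separate from sending makes it easy to check in a deploy script without hitting the network. The GitHub Action mentioned above is likely the simpler route for a per-deploy crawl.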

While testing I noticed some extra search results which should not be needed. For example, when looking for "docker", one of the results is a link to the sidebar: https://bundler.io/guides/bundler_docker_guide.html#sidebar-wrapper. Can we avoid generating these redundant results?

Crawler configuration can be modified at https://crawler.algolia.com/admin/crawlers/7f4b6579-bba3-4c1f-a0ff-462d9f408281/configuration/edit (I sent an invitation to you). Just hours ago, I updated it as follows:

new Crawler({
  rateLimit: 8,
  startUrls: ["https://bundler.io"],
  renderJavaScript: false,
  sitemaps: [],
  ignoreCanonicalTo: false,
  discoveryPatterns: ["https://bundler.io/**"],
  schedule: "at 5:44 PM on Sunday",
  actions: [
    {
      indexName: "bundler",
      pathsToMatch: ["https://bundler.io/**"],
      recordExtractor: ({ helpers }) => {
        return helpers.docsearch({
          recordProps: {
            lvl1: [
              ".commands h2#NAME",
              "h1",
              "#page-content-wrapper h2",
              "head > title",
            ],
            content: [
              "#page-content-wrapper p, #page-content-wrapper li",
              "p",
              ".team .card-body",
              "span.contributor",
            ],
            lvl0: {
              selectors: "",
              defaultValue: "Documentation",
            },
            lvl2: ["h3"],
            lvl3: ["h4"],
            lvl4: ["h5"],
            lvl5: ["article h5", "main h5", "h5"],
            lvl6: ["article h6", "main h6", "h6"],
          },
          aggregateContent: true,
          recordVersion: "v3",
        });
      },
    },
  ],
  initialIndexSettings: {
    bundler: {
      attributesForFaceting: ["type", "lang"],
      attributesToRetrieve: [
        "hierarchy",
        "content",
        "anchor",
        "url",
        "url_without_anchor",
        "type",
      ],
      attributesToHighlight: ["hierarchy", "content"],
      attributesToSnippet: ["content:10"],
      camelCaseAttributes: ["hierarchy", "content"],
      searchableAttributes: [
        "unordered(hierarchy.lvl0)",
        "unordered(hierarchy.lvl1)",
        "unordered(hierarchy.lvl2)",
        "unordered(hierarchy.lvl3)",
        "unordered(hierarchy.lvl4)",
        "unordered(hierarchy.lvl5)",
        "unordered(hierarchy.lvl6)",
        "content",
      ],
      distinct: true,
      attributeForDistinct: "url",
      customRanking: [
        "desc(weight.pageRank)",
        "desc(weight.level)",
        "asc(weight.position)",
      ],
      ranking: [
        "words",
        "filters",
        "typo",
        "attribute",
        "proximity",
        "exact",
        "custom",
      ],
      highlightPreTag: '<span class="algolia-docsearch-suggestion--highlight">',
      highlightPostTag: "</span>",
      minWordSizefor1Typo: 3,
      minWordSizefor2Typos: 7,
      allowTyposOnNumericTokens: false,
      minProximity: 1,
      ignorePlurals: true,
      advancedSyntax: true,
      attributeCriteriaComputedByMinProximity: true,
      removeWordsIfNoResults: "allOptional",
    },
  },
  appId: "3JA5LRH987",
  apiKey: "masked",
});
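
One possible way to avoid the redundant `#sidebar-wrapper` results raised above is to filter the records before they are indexed. The sketch below is not part of the deployed configuration. It assumes that each record returned by `helpers.docsearch()` carries an `anchor` field (the config above lists `anchor` among the retrieved attributes). The helper name `dropLayoutRecords` and the ignore list are hypothetical.

```javascript
// Hypothetical post-processing step for the array returned by
// helpers.docsearch(): drops records whose anchor points at page
// chrome (such as the sidebar) rather than at documentation content.
const IGNORED_ANCHORS = new Set(["sidebar-wrapper"]);

function dropLayoutRecords(records) {
  return records.filter((record) => !IGNORED_ANCHORS.has(record.anchor));
}

// Example with hand-written records shaped like DocSearch output:
const sample = [
  { url: "https://bundler.io/guides/bundler_docker_guide.html#sidebar-wrapper", anchor: "sidebar-wrapper" },
  { url: "https://bundler.io/guides/bundler_docker_guide.html#content", anchor: "content" },
];
const kept = dropLayoutRecords(sample);
```

Since the crawler runs `recordExtractor` in its own environment, the filter would probably have to be inlined inside that function instead of referencing an outer helper. Narrowing the CSS selectors so they never match inside the sidebar would be an alternative.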

tnir avatar Jul 19 '22 12:07 tnir

Updated the primary color (box-shadow etc.) in the searchbox and modal, as I had forgotten to adapt it to the Bundler branding color.

tnir avatar Jul 19 '22 12:07 tnir

I have now asked org members/repo admins for the admin role in #730, to set up periodic indexing on Actions 🙏

tnir avatar Jul 22 '22 04:07 tnir

@deivid-rodriguez How can I trigger a review app for this (old) PR (just for development purposes)?

tnir avatar Jul 27 '22 09:07 tnir

I do it manually from the Heroku UI. I have created one for you.

deivid-rodriguez avatar Jul 29 '22 06:07 deivid-rodriguez