
Proper search indexing in-page text

Open harry-wood opened this issue 12 years ago • 5 comments

Currently the search on LearnOSM.org is jQuery Autocomplete with only the chapter headings loaded in (more info on this old issue). Not a massive priority, but maybe somebody fancies thinking about how this can be improved to match on text within the chapters, which will involve implementing it completely differently.

For bonus points we should note that learnosm.org has "proper" content pages, and also a number of links to Google Docs at the moment. I guess ideally those might be indexed too.

We could run a proper indexed search system such as Lucene, but I don't know how easy it is to set up or what would run better on this server.

Serverless solutions include embedding Google search or the less evil DuckDuckGo.
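To make "match on text within the chapters" concrete, here is a minimal sketch of a client-side inverted index, assuming the chapter text can be exported into a list of page objects at build time. The `pages` data, the URLs and the function names are all hypothetical, and the tokenizer only handles ASCII letters and digits, so it would need more work for the translated chapters.

```javascript
// Hypothetical page data: in practice this would be generated from the chapter files.
var pages = [
  { url: '/en/beginner/', title: 'Beginner Guide',
    text: 'Learn how to add points and lines to the map.' },
  { url: '/en/coordination/remote/', title: 'Remote, Armchair or Mapathon editing',
    text: 'Trace buildings from satellite imagery.' }
];

// Split text into lowercase word tokens (ASCII letters and digits only).
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Build an inverted index mapping each word to the set of page URLs containing it.
// Both the title and the body text are indexed.
function buildIndex(pages) {
  var index = Object.create(null);
  pages.forEach(function (page) {
    tokenize(page.title + ' ' + page.text).forEach(function (word) {
      if (!index[word]) { index[word] = Object.create(null); }
      index[word][page.url] = true;
    });
  });
  return index;
}

// Return the URLs of pages that contain every word of the query.
function search(index, query) {
  var words = tokenize(query);
  if (words.length === 0) { return []; }
  var result = Object.keys(index[words[0]] || Object.create(null));
  words.slice(1).forEach(function (word) {
    var hits = index[word] || Object.create(null);
    result = result.filter(function (url) { return hits[url] === true; });
  });
  return result;
}
```

With the sample data, `search(buildIndex(pages), 'satellite imagery')` finds the remote editing page even though neither word is in its title. The results could then be fed to the existing autocomplete widget, though that wiring is not shown here.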

harry-wood avatar Dec 10 '13 12:12 harry-wood

Elasticsearch is pretty easy to set up, for example: https://github.com/elasticsearch/cookbook-elasticsearch

http://www.elasticsearchtutorial.com/elasticsearch-in-5-minutes.html#Indexing shows how you can use it.

Plugins like https://github.com/codelibs/elasticsearch-river-web make it easy to index sites.

As you say, the quickest solution may be embedded Google search, though.
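For a rough idea of what indexing a page involves, Elasticsearch accepts documents as JSON over HTTP. The sketch below only builds the request and does not send it. The base URL, the index name `learnosm` and the field names are assumptions, and the `/_doc` path is the form used by recent Elasticsearch versions, so older installs may expect a different path.

```javascript
// Build (but do not send) an HTTP request that would index one page.
// baseUrl and indexName are assumptions, e.g. a local node and an index called "learnosm".
function buildIndexRequest(baseUrl, indexName, page) {
  return {
    method: 'POST',                        // POST lets Elasticsearch assign the document ID
    url: baseUrl + '/' + indexName + '/_doc',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({                 // hypothetical field names
      url: page.url,
      title: page.title,
      text: page.text
    })
  };
}

// Example with a made-up page.
var request = buildIndexRequest('http://localhost:9200', 'learnosm', {
  url: '/en/beginner/',
  title: 'Beginner Guide',
  text: 'Learn how to add points and lines to the map.'
});
```

Because the ID is assigned automatically, re-running this for the same page would create a duplicate document, so a real indexer would probably want a stable ID per page.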

CloCkWeRX avatar Nov 10 '14 16:11 CloCkWeRX

Just made a Google custom search engine, under these terms: https://support.google.com/customsearch/answer/1714300

Embed code:

<script>
  (function() {
    var cx = '013113174346174561421:rf_zkfu3yv0';
    var gcse = document.createElement('script');
    gcse.type = 'text/javascript';
    gcse.async = true;
    gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') +
        '//www.google.com/cse/cse.js?cx=' + cx;
    var s = document.getElementsByTagName('script')[0];
    s.parentNode.insertBefore(gcse, s);
  })();
</script>
<gcse:search></gcse:search>

Styling details: https://developers.google.com/custom-search/docs/element

CloCkWeRX avatar Nov 10 '14 16:11 CloCkWeRX

[Two screenshots]

Results open in a new tab.

CloCkWeRX avatar Nov 10 '14 16:11 CloCkWeRX

One thing to note about using external third-party search, especially Google, is that our new stats software, Piwik, might not receive the search terms in that case.
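One possible workaround, sketched under assumptions: Piwik's JavaScript tracker has a `trackSiteSearch` call, so the page could report the term itself before handing the query to the third-party search. The helper name `recordSearchTerm` is hypothetical, and how the term is read out of the embedded search box is left open here.

```javascript
// Queue a Piwik site-search event for the given term.
// `queue` is expected to be the _paq array defined by the Piwik tracking snippet.
function recordSearchTerm(queue, term) {
  var keyword = String(term).trim();
  if (keyword === '') { return false; }   // nothing worth recording
  // trackSiteSearch(keyword, category, resultsCount):
  // category and result count are unknown here, so both are passed as false.
  queue.push(['trackSiteSearch', keyword, false, false]);
  return true;
}

// Example with a stand-in queue; on the real page this would be window._paq.
var paqStandIn = [];
recordSearchTerm(paqStandIn, '  mapathon ');
```

The result count is not available when a third party renders the results, which is why it is left as `false` in this sketch.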

bgirardot avatar Feb 12 '15 16:02 bgirardot

Just a comment: the current search uses the 'title' line from each page's header and matches on any word in that title. For example, "title: Remote, Armchair or Mapathon editing" will get hits on any of the words remote, armchair or mapathon. See also #315, which may have a solution outlined.
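The behaviour described above can be sketched roughly as follows. This illustrates the description, not the site's actual code: `matchTitles` is a hypothetical name, and it matches whole words only, whereas the real widget may also match partial input.

```javascript
// Return the titles that contain at least one word of the query (case-insensitive).
function matchTitles(titles, query) {
  var queryWords = query.toLowerCase().match(/[a-z0-9]+/g) || [];
  return titles.filter(function (title) {
    var titleWords = title.toLowerCase().match(/[a-z0-9]+/g) || [];
    return queryWords.some(function (word) {
      return titleWords.indexOf(word) !== -1;
    });
  });
}

// Example titles; the second one is made up for illustration.
var titles = ['Remote, Armchair or Mapathon editing', 'Beginner Guide'];
var hits = matchTitles(titles, 'armchair');
```

A query for a word that appears only in a chapter's body text returns nothing with this approach, which is the gap this issue is about.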

Nick-Tallguy avatar Feb 15 '15 19:02 Nick-Tallguy