distributed-wikipedia-mirror

How to search for articles?

Open SeanPedersen opened this issue 3 years ago • 9 comments

I am wondering how to use ipns://en.wikipedia-on-ipfs.org/wiki/ effectively? I see no option to search for an article. How am I supposed to find the content I am looking for?

SeanPedersen avatar Jan 31 '21 13:01 SeanPedersen

Search isn't directly supported, but theoretically could be in the future.

One option is to search for site:en.wikipedia-on-ipfs.org SEARCH TERMS in your preferred search engine to discover new pages:

  • https://www.google.com/search?q=site%3Aen.wikipedia-on-ipfs.org+Vincent+van+Gogh
  • https://duckduckgo.com/?q=site%3Aen.wikipedia-on-ipfs.org+Thomas+Jefferson
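A minimal Python sketch of how such links could be built, in case it helps anyone scripting this. The helper name `site_search_url` and the `ENGINES` table are made up for illustration; only the base URLs and the `site:` operator come from the links above.

```python
from urllib.parse import urlencode

MIRROR_HOST = "en.wikipedia-on-ipfs.org"

# Base URLs taken from the two example links above; the dict itself is hypothetical.
ENGINES = {
    "google": "https://www.google.com/search",
    "duckduckgo": "https://duckduckgo.com/",
}


def site_search_url(engine: str, terms: str, host: str = MIRROR_HOST) -> str:
    """Return a search URL restricted to `host` via the `site:` operator."""
    query = f"site:{host} {terms}"
    # urlencode percent-encodes ':' as %3A and turns spaces into '+'.
    return f"{ENGINES[engine]}?{urlencode({'q': query})}"


if __name__ == "__main__":
    print(site_search_url("google", "Vincent van Gogh"))
    print(site_search_url("duckduckgo", "Thomas Jefferson"))
```

Run as a script, this should print the same two URLs as the bullets above.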

mburns avatar Feb 11 '21 00:02 mburns

We have some prior art in #44. The code is 4 years old, but it could be a good starting point if someone has bandwidth to help with this.

lidel avatar Feb 18 '21 21:02 lidel

In case someone wants to pick this up before I have spare bandwidth: simply re-use the existing UI from mobile Wikipedia: https://en.m.wikipedia.org/, which already has subtle branding + search box:

[Screenshot: 2021-03-10--13-22-22]

The hamburger menu could be replaced with our icon, and clicking on it would jump to the footer explaining the mirror project.

lidel avatar Mar 10 '21 12:03 lidel

Both Google and DDG have methods of adding a custom website search bar to your website:

  • https://cse.google.com/
  • https://duckduckgo.com/search_box

I tested both out and unfortunately, the results I'm getting with DDG are all 404 errors because it's putting .html at the end of URLs. If you want to try both out, here are some links:

  • https://cse.google.com/cse?cx=230751f5750677644
  • https://duckduckgo.com/search.html?site=en.wikipedia-on-ipfs.org&prefill=Search%20Wikipedia%20on%20IPFS

EDIT: It's also worth noting that both engines show ads above the actual results. It is possible to remove ads (and branding) on DDG with URL params, but that's against the ToS except for personal use.

RuiNtD avatar May 28 '21 20:05 RuiNtD

There has recently been some work on hosting a full-text search engine in WebAssembly for very large data sets. This was directly influenced by IPFS's hosting of Wikipedia.

The key feature is that the client pulls only the data it needs from the static index to execute the search. For example, a full-text search on a 14 GByte index takes 2 seconds and needs to download only ~1.5 MByte of the index.
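To illustrate the "pull only what you need" idea, here is a rough Python sketch. It assumes the partial reads are done with HTTP Range requests and a 4096-byte page size; neither is stated in this comment, and `page_ranges` is a made-up helper, not part of Tantivy.

```python
def page_ranges(reads, page_size=4096):
    """Coalesce (offset, length) reads into page-aligned HTTP Range header values.

    Hypothetical helper: it only shows how a handful of small reads inside a
    large static file can become a few partial downloads.
    """
    # Collect every page index touched by at least one read.
    pages = sorted({
        page
        for offset, length in reads
        if length > 0
        for page in range(offset // page_size, (offset + length - 1) // page_size + 1)
    })

    # Merge runs of adjacent pages into [first_page, last_page] pairs.
    runs = []
    for page in pages:
        if runs and runs[-1][1] == page - 1:
            runs[-1][1] = page
        else:
            runs.append([page, page])

    # HTTP byte ranges are inclusive on both ends.
    return [
        f"bytes={first * page_size}-{(last + 1) * page_size - 1}"
        for first, last in runs
    ]


if __name__ == "__main__":
    # Three small reads scattered through a large index file.
    print(page_ranges([(0, 10), (4090, 20), (20000, 5)]))
```

Each returned string would go into a `Range` request header, so the client downloads a few kilobytes instead of the whole index.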

Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust. It was initially designed to run on a server, but Rust can run in WebAssembly. There is a pull request https://github.com/tantivy-search/tantivy/pull/1067 that adapts Tantivy to running fully in WebAssembly on the client side.

See the pull request for a demo using the Wikipedia dataset.

ngbrown avatar May 29 '21 14:05 ngbrown

(tantivy creator/maintainer and quickwit CEO) quickwit (https://github.com/quickwit-inc/quickwit) aims precisely at allowing client-side search on remote, high-latency storage. We are in the process of open-sourcing our code under the AGPL license. Once this is done, we'd be happy to help.

fulmicoton avatar Jun 03 '21 03:06 fulmicoton

Speaking of which, what license is the distributed-wikipedia-mirror code under? If it isn't GPLv3 too, it won't be able to use quickwit.

unbeatable-101 avatar Jun 03 '21 03:06 unbeatable-101

The linked project ("Sqlite static hosted") shows how it can be done with a static SQLite database that serves as the index. SQLite supports full-text search.
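For anyone unfamiliar with SQLite's full-text search, here is a small local sketch using Python's built-in `sqlite3` module. It assumes the SQLite build includes the FTS5 extension, and the table name, sample rows, and `search` helper are invented for illustration. It does not show the static-hosting part, only the kind of query such an index would answer.

```python
import sqlite3

# In-memory database standing in for the static index file (illustrative only).
conn = sqlite3.connect(":memory:")

# FTS5 virtual table; requires an SQLite build with FTS5 enabled.
conn.execute("CREATE VIRTUAL TABLE articles USING fts5(title, body)")

# Made-up sample rows, not real mirror data.
conn.executemany(
    "INSERT INTO articles (title, body) VALUES (?, ?)",
    [
        ("Vincent van Gogh", "Dutch Post-Impressionist painter"),
        ("Thomas Jefferson", "Third president of the United States"),
        ("InterPlanetary File System", "Peer-to-peer protocol for storing and sharing data"),
    ],
)


def search(conn, query, limit=10):
    """Return titles of articles matching `query`, best match first."""
    rows = conn.execute(
        "SELECT title FROM articles WHERE articles MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(search(conn, "painter"))
```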

johnsonjsyuen avatar Jun 05 '21 01:06 johnsonjsyuen

In Brave Browser, you can create a shortcut whose search URL is applied to whatever you type after activating it, which can be used to search for IPFS Wikipedia pages. Go to "Settings > Search engine > Manage search engines and site search > Add", which opens a dialog box for adding a search engine. For example, to use Brave's search engine to search for IPFS Wikipedia pages, enter https://search.brave.com/search?q=site%3Aen.wikipedia-on-ipfs.org+%s as the URL, with %s in place of the query (and whatever you want for the Search engine and Shortcut fields).

DmitriyShepelev avatar Sep 11 '22 21:09 DmitriyShepelev