
Crawling & Indexing content from external websites

Open ArtificialOwl opened this issue 3 years ago • 0 comments

Notes:

  • This will be managed by a dedicated app,
  • Websites and pages will be identified by their sitemap,
  • Crawling of sub-pages will follow a local configuration that sets the allowed number of hops depending on whether the linked page is hosted locally (same domain), on a different subdomain, or on a completely different domain,
  • External content has no ID in Nextcloud's database, and crawling a single address can return multiple documents.
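
The hop rule in the notes above could be sketched roughly as follows. This is only an illustration in Python, not the app's actual design: `HopLimits`, `link_scope`, `may_follow`, and the default limits are all hypothetical names and values, and the "same registered domain" check is a naive last-two-labels comparison (a real crawler would likely consult the Public Suffix List).

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class HopLimits:
    # Hypothetical local configuration: maximum number of hops allowed
    # for each kind of link. The defaults are made up for illustration.
    same_host: int = 3
    subdomain: int = 1
    external: int = 0


def base_domain(host: str) -> str:
    # Naive assumption: the registered domain is the last two labels.
    # This is wrong for suffixes such as "co.uk".
    return ".".join(host.rstrip(".").split(".")[-2:])


def link_scope(origin_url: str, target_url: str) -> str:
    # Both URLs are expected to be absolute; resolve relative links first.
    origin = urlsplit(origin_url).hostname or ""
    target = urlsplit(target_url).hostname or ""
    if origin == target:
        return "same_host"
    if base_domain(origin) == base_domain(target):
        return "subdomain"
    return "external"


def may_follow(limits: HopLimits, origin_url: str, target_url: str, hops: int) -> bool:
    # `hops` is the distance the target would have from the page listed
    # in the sitemap (1 for a page linked directly from it).
    return hops <= getattr(limits, link_scope(origin_url, target_url))
```

With these made-up defaults, a link from `https://example.com/a` to `https://docs.example.com/b` would be followed at one hop but not at two, and links to other domains would never be followed.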

Tasks:

  • [ ] Allowing an app to directly reach the search platform in use (fulltextsearch_elasticsearch) without passing through the FullTextSearch index table (core),
  • [ ] Crawling websites and sub-pages (app),
  • [ ] Extracting content and meta-data for each page (app),
  • [ ] Indexing content and meta-data (app),
  • [ ] Searching and Advanced Searching within content and metadata (core+app),
  • [ ] Results should link to the right page (app; might need some work in core to force opening in a different tab).
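
As a rough idea of what the "extracting content and meta-data" task could involve, here is a minimal Python sketch built only on the standard library's `html.parser`. The `PageExtractor` class, the `extract` function, and the shape of the returned dictionary are assumptions made for illustration; the issue does not specify how the app will do this or what the indexed document will look like.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collects title, <meta name=...> values, links and visible text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.links = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "a" and attrs.get("href"):
            # Candidate sub-pages for the crawler; may be relative URLs.
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def extract(html: str) -> dict:
    # Hypothetical document shape; field names are not from the issue.
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "meta": parser.meta,
        "links": parser.links,
        "content": " ".join(parser.text_parts),
    }
```

The `links` list is what a crawler would feed back into its hop check, while `title`, `meta`, and `content` are the parts that would be sent to the search platform.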

ArtificialOwl, Jan 19 '22 12:01