fulltextsearch
Crawling & indexing content from external websites
Notes:
- This will be managed by a specific app.
- Websites and pages will be identified through their sitemap.
- Crawling of sub-pages will follow a local configuration that sets the allowed number of hops depending on whether the linked page is hosted locally (same domain), on a different subdomain, or on a completely different domain.
- External content has no ID within Nextcloud's database, and crawling a single address can return multiple documents.
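
The hop rule in the notes above could be sketched roughly as follows. This is only an illustration in Python, not the app's actual code: the `MAX_HOPS` values, the function names, and the "last two labels" shortcut for finding the base domain are all assumptions (a real crawler would likely rely on the Public Suffix List to handle domains such as `example.co.uk`).

```python
from urllib.parse import urljoin, urlparse

# Hypothetical configuration: maximum number of hops allowed per link relation.
MAX_HOPS = {"same_domain": 3, "subdomain": 1, "external": 0}


def base_domain(host: str) -> str:
    """Naive base domain: the last two labels of the hostname (assumption)."""
    return ".".join(host.lower().split(".")[-2:])


def relation(origin_url: str, link_url: str) -> str:
    """Classify a link relative to the page it was found on."""
    # Resolve relative links against the page that contains them.
    target_url = urljoin(origin_url, link_url)
    origin = (urlparse(origin_url).hostname or "").lower()
    target = (urlparse(target_url).hostname or "").lower()
    if origin == target:
        return "same_domain"
    if base_domain(origin) == base_domain(target):
        return "subdomain"
    return "external"


def may_follow(origin_url: str, link_url: str, hops_so_far: int,
               limits: dict = MAX_HOPS) -> bool:
    """Return True if following the link stays within the configured hop limit."""
    return hops_so_far < limits[relation(origin_url, link_url)]
```

With these example limits, `may_follow("https://example.com/", "/docs", 2)` allows a third hop on the same domain, while any link to a completely different domain is refused because its limit is 0.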
Tasks:
- [ ] Allow an app to reach the Search Platform in use (fulltextsearch_elasticsearch) directly, without passing through the FullTextSearch index table (core)
- [ ] Crawl websites and sub-pages (app)
- [ ] Extract content and metadata for each page (app)
- [ ] Index content and metadata (app)
- [ ] Search and advanced search within content and metadata (core+app)
- [ ] Results should link to the right page (app; might need some work in core to force opening in a different tab)
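
As a rough idea of what the extraction task might involve, here is a minimal Python sketch using only the standard library's `html.parser`. The class name `PageExtractor`, the `extract` helper, and the shape of the returned dictionary are hypothetical; the actual app may extract content quite differently.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collects the title, <meta name=...> values and visible text of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def extract(html: str) -> dict:
    """Return the title, metadata and plain-text content of an HTML page."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "meta": parser.meta,
        "content": " ".join(parser.text_parts),
    }
```

The resulting dictionary keeps content and metadata separate, which matches the task list's distinction between searching within content and within metadata.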