pinecone-vercel-starter

Enhanced Document Tracking: Multi-Website Support and Continuous Relevancy Insights

Open HarounAns opened this issue 2 years ago • 2 comments

Problem

In the existing design, the fetched documents are visible only right after the index has been freshly seeded. This is restrictive: it caters to one website at a time, which makes comprehensive tracking and management difficult, and there is no way to determine the relevancy of documents outside this immediate post-seeding phase. The new design accommodates multiple websites simultaneously and provides insight into any document deemed relevant, removing the reliance on only the most recently seeded ones.

Solution

Implemented a relevantDocs section that shows the documents fetched across the crawled websites. This makes retrieval more transparent: the user can see which documents are being retrieved, regardless of recent crawler actions.

https://github.com/pinecone-io/pinecone-vercel-starter/assets/20847319/b07ca793-c304-449b-bcce-64991d3b0e7b

^ Notice how the fetched documents come from different websites!
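To illustrate the idea of showing fetched documents per website, here is a minimal sketch. It assumes each retrieved document carries its source URL and a similarity score; the `RetrievedDoc` type and the `groupDocsByHost` function are hypothetical names for illustration, not the identifiers used in this PR.

```typescript
// Hypothetical shape of a retrieved document; the field names are assumptions.
interface RetrievedDoc {
  id: string;
  score: number; // similarity score returned by the vector query
  url: string;   // source page the chunk was crawled from
  text: string;
}

// Group retrieved documents by the hostname of their source URL, so a
// relevantDocs-style view can show which website each document came from.
function groupDocsByHost(docs: RetrievedDoc[]): Map<string, RetrievedDoc[]> {
  const groups = new Map<string, RetrievedDoc[]>();
  for (const doc of docs) {
    let host: string;
    try {
      host = new URL(doc.url).hostname;
    } catch {
      // Malformed or missing URL: keep the document, but under a fallback key.
      host = "unknown";
    }
    const bucket = groups.get(host) ?? [];
    bucket.push(doc);
    groups.set(host, bucket);
  }
  // Within each website, list the most relevant documents first.
  groups.forEach((bucket) => bucket.sort((a, b) => b.score - a.score));
  return groups;
}
```

Grouping by hostname rather than by full URL keeps every page of one website in a single bucket, which matches the multi-website view shown in the demo.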

Type of Change

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] This change requires a documentation update
  • [ ] Infrastructure change (CI configs, etc)
  • [ ] Non-code change (docs, etc)
  • [ ] None of the above: (explain here)

Test Plan

  1. Navigate to the new relevantDocs section.
  2. Ensure that the fetched documents across different websites are being displayed.
  3. Test fetching documents both after seeding the index via the crawler and without seeding it.
  4. Confirm that the displayed documents in the relevantDocs section are consistent with the expected results based on the recent actions with the crawler.
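Steps 2 and 4 above could be checked with a small helper along these lines. This is only a sketch: `checkDisplayedDocs` and the `DocCheck` type are hypothetical, and it assumes the displayed and expected documents can be compared by their source URLs.

```typescript
// Hypothetical result of comparing displayed documents against expectations.
interface DocCheck {
  hosts: string[];      // distinct websites among the displayed documents
  missing: string[];    // expected URLs that were not displayed
  unexpected: string[]; // displayed URLs that were not expected
}

// Compare the URLs shown in the relevantDocs section with the expected ones.
// Note: new URL() throws on a malformed URL, so inputs are assumed valid.
function checkDisplayedDocs(displayedUrls: string[], expectedUrls: string[]): DocCheck {
  const displayed = new Set(displayedUrls);
  const expected = new Set(expectedUrls);
  const hosts = new Set<string>();
  displayedUrls.forEach((u) => hosts.add(new URL(u).hostname));
  return {
    hosts: Array.from(hosts).sort(),
    missing: expectedUrls.filter((u) => !displayed.has(u)),
    unexpected: displayedUrls.filter((u) => !expected.has(u)),
  };
}
```

A result with more than one entry in `hosts` and empty `missing` and `unexpected` arrays would correspond to a passing run of steps 2 and 4.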

HarounAns, Sep 24 '23 05:09

@rschwabco please let me know your thoughts.

HarounAns, Sep 24 '23 16:09

@rschwabco do you feel like this PR has any value? If not, I can close it.

HarounAns, Jan 17 '24 04:01