refactor: Update WebRetriever to use LinkContentRetriever
What?
This pull request simplifies the existing WebRetriever component by having it use the recently isolated LinkContentRetriever internally (see #5227 for more details).
Why?
After successfully extracting LinkContentRetriever from the existing code base, it's now appropriate to use this standalone component within WebRetriever as well. This refactoring improves the structure of our code and makes it easier to maintain and read.
How can it be used?
The use of WebRetriever remains unaffected by this change. The refactoring does not alter any public APIs, so we can continue to use WebRetriever as we did previously.
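To illustrate the idea of delegating link fetching to a standalone component while keeping the public method unchanged, here is a minimal sketch. It is not the Haystack implementation: `SketchLinkFetcher`, `SketchWebRetriever`, `fake_search`, and `fake_download` are hypothetical names invented for this example, and the search and download steps are stubbed so the sketch runs without network access.

```python
from typing import Callable, Dict, List


class SketchLinkFetcher:
    """Hypothetical stand-in for a standalone link-content component."""

    def __init__(self, download: Callable[[str], str]):
        # The download function is injected so the sketch needs no network.
        self._download = download

    def fetch(self, url: str) -> Dict[str, str]:
        # Download the page and return a small document-like dict.
        text = self._download(url)
        return {"url": url, "content": text.strip()}


class SketchWebRetriever:
    """Hypothetical retriever that delegates fetching to the component above."""

    def __init__(self, search: Callable[[str], List[str]], fetcher: SketchLinkFetcher):
        self._search = search
        self._fetcher = fetcher

    def retrieve(self, query: str, top_k: int = 2) -> List[Dict[str, str]]:
        # The public signature stays the same; only the internals delegate.
        links = self._search(query)[:top_k]
        return [self._fetcher.fetch(link) for link in links]


def fake_search(query: str) -> List[str]:
    # Assumed stub: pretends to be a search engine returning three links.
    return [f"https://example.com/{query}/{i}" for i in range(3)]


def fake_download(url: str) -> str:
    # Assumed stub: pretends to download a page body.
    return f"  page body for {url}  "


retriever = SketchWebRetriever(fake_search, SketchLinkFetcher(fake_download))
docs = retriever.retrieve("haystack", top_k=2)
```

Callers only ever touch `retrieve`, so the fetching logic can move into its own component without any change on their side.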
How did you test it?
The refactoring is validated by updating the existing unit tests and adding new, more comprehensive ones to confirm that WebRetriever remains robust after the change.
WebRetriever was also manually tested with examples/web_qa.py and examples/web_lfqa.py.
Notes for the reviewer:
PLEASE DO NOT MERGE this pull request at this stage.
The PR size will shrink significantly once #5227 is merged, because it currently contains the changes from #5227 as well.
Before proceeding, we first need to merge PR #5227 and then rebase this PR on top of it. In the meantime, you can use this PR to familiarize yourself with the changes.
While reviewing, please focus on the changes within WebRetriever and how it utilizes LinkContentRetriever. The public APIs remain unchanged. The refactoring aims to simplify the existing structure, so I would greatly appreciate any insights or suggestions for further simplification.
Pull Request Test Coverage Report for Build 5726065556
- 0 of 0 changed or added relevant lines in 0 files are covered.
- 23 unchanged lines in 2 files lost coverage.
- Overall coverage decreased (-0.03%) to 46.512%
Files with Coverage Reduction | New Missed Lines | % |
---|---|---|
nodes/search_engine/web.py | 3 | 63.89% |
nodes/retriever/web.py | 20 | 78.26% |
Total: | 23 | |
Totals | |
---|---|
Change from base Build 5725116002: | -0.03% |
Covered Lines: | 10940 |
Relevant Lines: | 23521 |
💛 - Coveralls
@anakin87 Addressed some of your concerns in this update. Let's have another pass when you get a chance. TIA 🙏