Flowise
[BUG] Relative Links do not get scraped
Describe the bug
When using a web scraper as a document loader, many relative links are not found.
To Reproduce
- In a Flowise Chatflow
- Use Puppeteer or another web scraper from the document loaders to scrape a website that uses relative links.
- Configure base URL to https://docs.readthedocs.io/en/stable/about/index.html
- In Manage Links, click fetch URLs
- Relative link https://docs.readthedocs.io/en/stable/tutorial/index.html is not found
Expected behavior
All relative links should be found.
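As a rough illustration of the expected behavior (this is not the code from the PR), a scraper could resolve every href against the URL of the page it was found on, using the standard WHATWG `URL` constructor. The helper name `resolveLinks` and the sample href values are hypothetical and chosen only for this sketch.

```typescript
// Hypothetical helper, not part of Flowise: resolve raw href values
// against the URL of the page they were scraped from.
function resolveLinks(pageUrl: string, hrefs: string[]): string[] {
    const resolved = new Set<string>()
    for (const href of hrefs) {
        try {
            // new URL(relative, base) handles "../x", "/x", "x" and absolute URLs
            const url = new URL(href, pageUrl)
            // Skip non-web schemes such as mailto: or javascript:
            if (url.protocol !== 'http:' && url.protocol !== 'https:') continue
            // Drop the fragment so in-page anchors do not create duplicates
            url.hash = ''
            resolved.add(url.href)
        } catch {
            // Ignore href values that cannot be parsed as a URL
        }
    }
    return Array.from(resolved)
}

const base = 'https://docs.readthedocs.io/en/stable/about/index.html'
// Assumed example href: a relative link as it might appear in the page markup
const links = resolveLinks(base, ['../tutorial/index.html', 'mailto:someone@example.com'])
console.log(links)
// → [ 'https://docs.readthedocs.io/en/stable/tutorial/index.html' ]
```

Resolving against the page URL rather than only the site origin matters here, because `../tutorial/index.html` is interpreted relative to the `about/` directory.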
Flow
Exported flow to help replicate the problem: Relative Link Repro Chatflow.json
Setup
- Installation [e.g. docker, npx flowise start, yarn start]
- Flowise Version 1.5.0
- OS: tested with Docker on Linux, but the behavior should be the same on any OS
- Browser: tested with Firefox and Chrome
Additional context
I fixed and tested it on my system.
PR: https://github.com/FlowiseAI/Flowise/pull/1740
With release version 1.5.0, some invalid links are returned and https://docs.readthedocs.io/en/stable/tutorial/index.html is not found:
With the patched version, https://docs.readthedocs.io/en/stable/tutorial/index.html shows up and no invalid links are returned: