aws-genai-llm-chatbot
feat: enable web scraping to parse and save pdf content
Issue #, if available:
Description of changes: The web crawler can now crawl and parse PDFs in addition to the text/html pages it already supported.
This change adds a content types parameter to the GUI so users can choose whether to scrape only text/html content or to include 'application/pdf' files as well. It also makes it easier to add other content types in the future.
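For reviewers, here is a rough sketch of the idea rather than the code in this PR: the crawler checks each response's Content-Type against the user-selected list and dispatches to a matching parser. All names here (`PARSERS`, `parse_if_allowed`, `parse_html`, `parse_pdf`) are hypothetical, and the parser bodies are placeholders standing in for real HTML and PDF text extraction.

```python
from typing import Callable, Dict, Iterable, Optional


def normalize_content_type(header_value: Optional[str]) -> str:
    """Drop parameters such as '; charset=utf-8' and lowercase the media type."""
    if not header_value:
        return ""
    return header_value.split(";", 1)[0].strip().lower()


def parse_html(body: bytes) -> str:
    # Placeholder: a real crawler would extract visible text from the markup.
    return body.decode("utf-8", errors="replace")


def parse_pdf(body: bytes) -> str:
    # Placeholder: a real crawler would call a PDF text extractor here.
    return f"<{len(body)} bytes of PDF>"


# Hypothetical registry; supporting a new content type means adding one entry.
PARSERS: Dict[str, Callable[[bytes], str]] = {
    "text/html": parse_html,
    "application/pdf": parse_pdf,
}


def parse_if_allowed(
    header_value: Optional[str],
    body: bytes,
    allowed_content_types: Iterable[str],
) -> Optional[str]:
    """Return parsed text, or None if the content type is not selected or not supported."""
    content_type = normalize_content_type(header_value)
    if content_type not in set(allowed_content_types):
        return None
    parser = PARSERS.get(content_type)
    return parser(body) if parser else None
```

With only `text/html` selected, a PDF response is skipped; adding `application/pdf` to the list routes it to the PDF parser.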
I also bumped the pydantic versions as suggested by Dependabot and tested them, since I was already testing this code.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.