aws-genai-llm-chatbot

feat: enable web scraping to parse and save pdf content

Status: Open. Opened by Rob-Powell.

Issue #, if available:

Description of changes: The web crawler can now crawl and parse PDF files in addition to the text/html pages it already supported.

This change adds a content types parameter to the GUI so users can choose whether to scrape only text/html content or to also include application/pdf files. The parameter also makes it easier to support other content types in the future.
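A minimal sketch, assuming a Python backend, of how a configurable content-types option could gate what the crawler processes. The names `DEFAULT_CONTENT_TYPES`, `PARSERS`, `media_type`, `select_parser`, `parse_html` and `parse_pdf` are hypothetical illustrations, not code from this PR, and `parse_pdf` is only a stub where a real PDF text extractor would be plugged in.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical default: only text/html is scraped unless the user
# also selects application/pdf in the GUI.
DEFAULT_CONTENT_TYPES = ("text/html",)


class _TextExtractor(HTMLParser):
    """Collects visible text from HTML, ignoring <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._chunks: List[str] = []
        self._skip = 0  # depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def parse_html(body: bytes) -> str:
    """Extract plain text from an HTML response body."""
    extractor = _TextExtractor()
    extractor.feed(body.decode("utf-8", errors="replace"))
    extractor.close()
    return extractor.text()


def parse_pdf(body: bytes) -> str:
    """Hypothetical stub: a real version would use a PDF library (e.g. pypdf)."""
    raise NotImplementedError("plug in a PDF text extractor here")


# Registry keyed by MIME type: supporting a new type means adding one entry.
PARSERS: Dict[str, Callable[[bytes], str]] = {
    "text/html": parse_html,
    "application/pdf": parse_pdf,
}


def media_type(content_type_header: str) -> str:
    """Drop parameters such as '; charset=utf-8' and normalise case."""
    return content_type_header.split(";", 1)[0].strip().lower()


def select_parser(
    content_type_header: str,
    allowed: Iterable[str] = DEFAULT_CONTENT_TYPES,
) -> Optional[Callable[[bytes], str]]:
    """Return the parser for a response, or None if the type is not selected."""
    mtype = media_type(content_type_header)
    if mtype not in set(allowed):
        return None  # the user did not ask for this content type, so skip it
    return PARSERS.get(mtype)  # None if allowed but no parser is registered
```

Keeping parsers in a dictionary keyed by MIME type is one way to get the extensibility described above: adding another content type means registering one more function, while the allowed list coming from the GUI decides which registered parsers are actually used for a given crawl.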

I also bumped the pydantic versions as suggested by Dependabot and tested them, since I was already testing this code.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Rob-Powell, Apr 25 '24 07:04