langchain
UnstructuredURLLoader does not gracefully handle failures given a list of URLs
As reported by Kranos in Discord, there is no way to robustly iterate through a list of URLs with UnstructuredURLLoader. The workaround for now is to create an UnstructuredURLLoader object per URL and do the following:
Yep, exactly my problem - I had a load of URLs loaded into a pandas dataframe I was iterating through. I basically added the following at the end of the loop to keep things ticking over and ignoring any errors:
# Manage any errors
except (NameError, ValueError, KeyError, OSError, TypeError):
    # Pass the error
    pass
UnstructuredURLLoader should likely do this by default, or provide a strict
option to exit on any failures.
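For reference, here is a minimal sketch of what the per-URL workaround and the proposed strict option could look like together. This is not the library's implementation: `load_each`, `load_one`, and `load_with_unstructured` are hypothetical names made up for illustration, and the `langchain.document_loaders` import path is an assumption that may differ between LangChain versions. Unlike the snippet above, it records each failure instead of silently passing, so skipped URLs can be reported afterwards.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def load_each(
    urls: Iterable[str],
    load_one: Callable[[str], List[T]],
    strict: bool = False,
) -> Tuple[List[T], List[Tuple[str, Exception]]]:
    """Load each URL independently so one bad URL does not stop the rest.

    `load_one` is a hypothetical callback that loads a single URL and
    returns a list of documents. With strict=True the first error is
    re-raised; otherwise it is recorded and the loop continues.
    """
    documents: List[T] = []
    failures: List[Tuple[str, Exception]] = []
    for url in urls:
        try:
            documents.extend(load_one(url))
        except Exception as exc:
            if strict:
                raise
            # Keep the URL and the error so failures can be inspected later.
            failures.append((url, exc))
    return documents, failures


def load_with_unstructured(url: str) -> list:
    """One UnstructuredURLLoader per URL, as in the workaround above."""
    # Imported lazily so load_each itself does not depend on langchain.
    # Assumption: this import path matches the installed LangChain version.
    from langchain.document_loaders import UnstructuredURLLoader

    return UnstructuredURLLoader(urls=[url]).load()
```

Usage would be something like `docs, failures = load_each(urls, load_with_unstructured)`, after which `failures` holds `(url, exception)` pairs for anything that was skipped. Passing `strict=True` gives the exit-on-first-failure behaviour instead.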