UnstructuredURLLoader does not gracefully handle failures given a list of URLs

Open cragwolfe opened this issue 1 year ago • 0 comments

As reported by Kranos in Discord, there is no way to robustly iterate through a list of URLs with UnstructuredURLLoader. The workaround for now is to create an UnstructuredURLLoader object per URL and do the following:

Yep, exactly my problem - I had a load of URLs loaded into a pandas dataframe I was iterating through. I basically added the following at the end of the loop to keep things ticking over and ignoring any errors:
# Manage any errors
  except (NameError, ValueError, KeyError, OSError, TypeError):
    # Pass the error
    pass

UnstructuredURLLoader should likely do this by default, or provide a strict option to exit on any failures.
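
For anyone hitting this in the meantime, here is a rough sketch of the per-URL workaround with an opt-in strict mode. The names `load_urls`, `strict`, and `load_one` are hypothetical helpers for illustration, not part of LangChain, and the import path is assumed to match the LangChain releases current when this issue was filed. Unlike the snippet above, it catches `Exception` broadly, which is a judgment call rather than anything the library prescribes.

```python
import logging
from typing import Callable, Iterable, List, Tuple

logger = logging.getLogger(__name__)


def _load_with_unstructured(url: str) -> list:
    # Imported lazily so the helper can be exercised without langchain installed.
    # Assumption: this import path matches the installed langchain version.
    from langchain.document_loaders import UnstructuredURLLoader

    # One loader per URL, so a failure only affects that URL.
    return UnstructuredURLLoader(urls=[url]).load()


def load_urls(
    urls: Iterable[str],
    strict: bool = False,
    load_one: Callable[[str], list] = _load_with_unstructured,
) -> Tuple[List, List[Tuple[str, Exception]]]:
    """Load each URL separately; return (documents, failures).

    With strict=True the first failure is re-raised. Otherwise the failing
    URL is logged, recorded in the failures list, and skipped.
    """
    docs: List = []
    failed: List[Tuple[str, Exception]] = []
    for url in urls:
        try:
            docs.extend(load_one(url))
        except Exception as exc:  # broad on purpose; narrow it if you can
            if strict:
                raise
            logger.warning("Skipping %s: %s", url, exc)
            failed.append((url, exc))
    return docs, failed
```

Calling `docs, failed = load_urls(my_urls)` keeps going past bad URLs and reports them in `failed`, while `load_urls(my_urls, strict=True)` stops at the first error. Returning the failures instead of silently passing makes it possible to retry or report them later.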

cragwolfe · Mar 23 '23 20:03