open-semantic-etl
Enhanced error handling for plugins
Implement enhanced error handling (fallback plugins and retry) for data enrichment or data analysis plugins:
Each extraction and analysis plugin in the processing chain should have parameters for retrying and for falling back to alternate plugins that use other tools or methods.
For example, even if Apache Tika cannot parse a file, the Linux command "file" can still determine its content type.
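A minimal sketch of how such a retry and fallback chain could look. The names `run_with_fallback` and `detect_with_file_command` and the `retries` / `delay` parameters are hypothetical and not part of the existing ETL code; the only external tool assumed is the Linux `file` command.

```python
import subprocess
import time


def detect_with_file_command(path):
    """Fallback detector: ask the Linux `file` command for the MIME type."""
    result = subprocess.run(
        ["file", "--brief", "--mime-type", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def run_with_fallback(plugins, path, retries=1, delay=0.0):
    """Try each plugin in order, retrying each up to `retries` times.

    Returns (result, errors); result is None if every plugin failed.
    Hypothetical helper, not the project's actual API.
    """
    errors = []
    for plugin in plugins:
        for attempt in range(1, retries + 1):
            try:
                return plugin(path), errors
            except Exception as exc:
                # Keep the failure so it can be reported or indexed later
                errors.append(f"{plugin.__name__} (attempt {attempt}): {exc}")
                if attempt < retries:
                    time.sleep(delay)
    return None, errors
```

A Tika based plugin would come first in `plugins`, with `detect_with_file_command` as the last resort. Returning the collected errors together with the result lets the caller index why the primary plugin failed even when a fallback succeeded.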
Partly done: The ETL tools now print not only the HTTP error code but also the full error message from Solr when posting data to the Solr index fails, which makes schema problems and other errors easier to debug.
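To illustrate the idea, here is a sketch that reads the response body of a failed request instead of reporting only the status code. It uses Python's standard `urllib`; the function names `describe_http_error` and `post_to_solr` are assumptions for this example and may differ from what the ETL code actually does.

```python
import urllib.error
import urllib.request


def describe_http_error(exc):
    """Return status code, reason and response body of an HTTPError."""
    body = exc.read().decode("utf-8", errors="replace").strip()
    message = f"HTTP {exc.code} {exc.reason}"
    return f"{message}: {body}" if body else message


def post_to_solr(url, payload):
    """POST JSON bytes to a Solr update URL (hypothetical helper).

    On failure, raise an error that carries Solr's full message.
    """
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as exc:
        raise RuntimeError(
            f"Solr update failed: {describe_http_error(exc)}"
        ) from exc
```

The body of an error response from Solr usually names the offending field or schema problem, which the bare status code does not.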
ETL plugins using microservices / REST-APIs will retry failed connections: https://github.com/opensemanticsearch/open-semantic-etl/issues/84
Error status and message management has been moved into its own function in etl.py.
Entity extraction by the Solr text tagger(s) now has separate error handling for each tagger using this new error_message function, so status and error messages are indexed.
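The per-tagger error handling could be sketched roughly as follows. The signature of `error_message`, the helper `run_taggers`, and all field names (`etl_error_plugins_ss`, `etl_error_messages_ss`, `entities_ss`) are assumptions for illustration; the actual function in etl.py may look different.

```python
def error_message(data, plugin, exc):
    """Store a plugin failure in the document data so it gets indexed.

    Signature and field names are hypothetical.
    """
    data.setdefault("etl_error_plugins_ss", []).append(plugin)
    data.setdefault("etl_error_messages_ss", []).append(f"{plugin}: {exc}")


def run_taggers(taggers, text, data):
    """Run every tagger; a failing tagger does not stop the others."""
    for name, tagger in taggers.items():
        try:
            data.setdefault("entities_ss", []).extend(tagger(text))
        except Exception as exc:
            # Record the failure for this tagger only and carry on
            error_message(data, name, exc)
    return data
```

Because each tagger has its own `try` block, the results of the working taggers are still indexed, alongside the name and message of the one that failed.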
All ETL plugins that use microservices / HTTP REST APIs for analysis now wait for services that are down or not yet loaded, thanks to enhanced HTTP exception handling, which also provides more detailed error messages.
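As a rough sketch of this waiting behaviour, the following helper retries a request while the service is unreachable or answers with a server error. The name `fetch_with_retry` and its parameters are hypothetical, and the choice to retry only on connection failures and 5xx responses is an assumption of this example, not necessarily what the plugins do.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, retries=5, delay=2.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """GET `url`, waiting and retrying while the service is unavailable."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as exc:
            if exc.code < 500:
                raise  # client errors will not go away by waiting
            last_error = f"HTTP {exc.code} {exc.reason}"
        except OSError as exc:
            # Covers URLError, refused connections and timeouts
            last_error = str(exc)
        if attempt < retries:
            sleep(delay)
    raise ConnectionError(
        f"{url} unavailable after {retries} attempts: {last_error}"
    )
```

The `opener` and `sleep` parameters exist only so the behaviour can be tested without a network or real delays. The final exception includes the last underlying error, so the message says why the service was considered unavailable.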