Pipeline hangs when using load_uri_to_image_tensor with large or error images
Title generated by GPT-3. Source: https://jina-ai.slack.com/archives/C0169V26ATY/p1663780772405119 — please consult the original message for more details; I will not follow the thread.
We are using `load_uri_to_image_tensor` to load images from various endpoints. I found that for some endpoints, if the image is very large or if I get an error response from `urlopen`, the whole embedding/indexing pipeline seems to hang. Adding a timeout and a try/except statement to `_uri_to_blob` worked around this (see below). Is there a way to achieve the same thing with the built-in parameters provided by docarray/jina?
    import os
    import urllib.parse
    import urllib.request


    def _uri_to_blob(uri: str) -> bytes:
        """Convert uri to blob.

        Internally it reads uri into blob.

        :param uri: the uri of Document
        :return: blob bytes.
        """
        if urllib.parse.urlparse(uri).scheme in {'http', 'https', 'data'}:
            try:
                req = urllib.request.Request(uri, headers={'User-Agent': 'Mozilla/5.0'})
                # fail fast instead of hanging on slow or broken endpoints
                with urllib.request.urlopen(req, timeout=5) as fp:
                    return fp.read()
            except Exception as e:
                raise FileNotFoundError(f'`{uri}`: error pulling') from e
        elif os.path.exists(uri):
            try:
                with open(uri, 'rb') as fp:
                    return fp.read()
            except Exception as e:
                raise FileNotFoundError(f'`{uri}`: error pulling') from e
        else:
            raise FileNotFoundError(f'`{uri}` is not a URL or a valid local path')
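For what it's worth, one workaround for the hang that needs no patching of docarray itself (this is plain standard-library behavior, not a docarray parameter) is to set a process-wide default socket timeout; any `urlopen` call made without an explicit timeout, including the one inside `_uri_to_blob`, then inherits it:

```python
import socket

# Process-wide default: any socket opened without an explicit timeout
# (including urllib's, as used internally by docarray) will now raise
# socket.timeout after 5 seconds instead of hanging forever.
socket.setdefaulttimeout(5)
```

Note that this only prevents the indefinite hang; a try/except around the load, as in the patch above, is still needed to survive error responses from the endpoint.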