
Pipeline hangs when using load_uri_to_image_tensor with large or error images

Open hanxiao opened this issue 3 years ago • 2 comments

Title generated by GPT3. Source: https://jina-ai.slack.com/archives/C0169V26ATY/p1663780772405119. Please consult the original message for more details; I will not follow the thread.


We are using load_uri_to_image_tensor to load images from various endpoints. I found that for some endpoints, if the image is very large or if urlopen returns an error response, the whole embedding/indexing pipeline seems to hang. Adding a timeout and a try/except block to _uri_to_blob worked around this problem (see below). Is there a way to achieve the same thing with the built-in parameters provided by docarray/jina?

import os
import urllib.parse
import urllib.request


def _uri_to_blob(uri: str) -> bytes:
    """Convert uri to blob
    Internally it reads uri into blob.
    :param uri: the uri of Document
    :return: blob bytes.
    """
    if urllib.parse.urlparse(uri).scheme in {'http', 'https', 'data'}:
        try:
            req = urllib.request.Request(uri, headers={'User-Agent': 'Mozilla/5.0'})
            # The explicit timeout keeps a slow or unresponsive endpoint from hanging the pipeline.
            with urllib.request.urlopen(req, timeout=5) as fp:
                return fp.read()
        except Exception as e:
            raise FileNotFoundError(f'`{uri}`: error pulling') from e
    elif os.path.exists(uri):
        try:
            with open(uri, 'rb') as fp:
                return fp.read()
        except Exception as e:
            raise FileNotFoundError(f'`{uri}`: error pulling') from e
    else:
        raise FileNotFoundError(f'`{uri}` is not a URL or a valid local path')
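For context, here is a minimal sketch of how the patched helper could be used in a loading loop so that a slow or broken endpoint is skipped instead of stalling the whole pipeline. The URLs, the local path, and the surrounding loop are illustrative only and not part of docarray's or jina's API; the sketch assumes the _uri_to_blob function above is already defined in scope.

uris = [
    'https://example.com/very_large_image.png',  # illustrative URL that may be slow or return an error
    '/tmp/local_image.jpg',                      # illustrative local path
]

blobs = {}
for uri in uris:
    try:
        # The patched helper raises FileNotFoundError on timeouts, HTTP errors,
        # and unreadable files instead of blocking indefinitely.
        blobs[uri] = _uri_to_blob(uri)
    except FileNotFoundError as e:
        # Skip the problematic endpoint and keep the rest of the pipeline moving.
        print(f'skipping {uri}: {e}')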

hanxiao · Sep 21 '22, 19:09