Add support for presigned URLs

Open marktgraham opened this issue 2 years ago • 1 comment

There may be cases where ExtractTable requires access to a file in a private repo/bucket (e.g. an S3 bucket). Access to private images can be granted via presigned URLs. For example, an image in a private S3 bucket can be accessed via a presigned URL of the form:

https://[bucket_name].s3.amazonaws.com/[image_name].png?X-Amz-Algorithm=XXX&X-Amz-Credential=AKIA...%2Feu-west-2%2Fs3%2Faws4_request&X-Amz-Date=20230207T103049Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=xxxxxx

ExtractTable supports '.pdf', '.jpeg', '.jpg', and '.png' files, but the check is filepath.lower().endswith(self.__SUPPORTED_EXTENSIONS__), which tests the end of the whole URL string and so fails for a presigned URL with the following error:

Exception: Failed to get response from ExtractTable API. Exception = Allowed file types are ('.pdf', '.jpeg', '.jpg', '.png')

This is because the URL ends with the signature query parameters rather than the file extension, even though the file it points to is a valid image.
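
The failure can be reproduced outside the library with a few lines of Python. This is only a sketch: the bucket name, object key, and signature below are made-up placeholders, and `passes_suffix_check` is a hypothetical stand-in that mirrors the shape of the check quoted above, not the library's own method.

```python
SUPPORTED_EXTENSIONS = ('.pdf', '.jpeg', '.jpg', '.png')

# Hypothetical presigned URL; bucket, key, and signature are placeholders.
presigned_url = (
    "https://example-bucket.s3.amazonaws.com/table.png"
    "?X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=abc123"
)


def passes_suffix_check(filepath: str) -> bool:
    # Same shape as the quoted check: compare the end of the whole string
    # against the tuple of allowed extensions.
    return filepath.lower().endswith(SUPPORTED_EXTENSIONS)


print(passes_suffix_check("table.png"))    # True
print(passes_suffix_check(presigned_url))  # False
```

The second call returns False because the string ends with the signature value, not with '.png'.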

The request is for an option to tell ExtractTable that the URL is presigned, and a second option to specify the delimiter that marks the end of the filename and the beginning of the signature (for S3, the ? that starts the query string).
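
One possible shape for the two options, as a rough sketch rather than a proposal for the actual API: the function name `has_supported_extension` and the parameters `presigned` and `delimiter` are assumptions made up for illustration, and the URLs are placeholders.

```python
SUPPORTED_EXTENSIONS = ('.pdf', '.jpeg', '.jpg', '.png')


def has_supported_extension(filepath: str,
                            presigned: bool = False,
                            delimiter: str = "?") -> bool:
    """Check the extension, optionally ignoring a presigned suffix.

    `presigned` and `delimiter` are hypothetical options: when `presigned`
    is set, everything from the first `delimiter` onward is dropped before
    the extension is checked.
    """
    if presigned:
        # Keep only the part before the first delimiter.
        filepath = filepath.split(delimiter, 1)[0]
    return filepath.lower().endswith(SUPPORTED_EXTENSIONS)


# Placeholder URL in the style of an S3 presigned link.
url = ("https://example-bucket.s3.amazonaws.com/table.png"
       "?X-Amz-Expires=3600&X-Amz-Signature=abc123")

print(has_supported_extension(url))                  # False
print(has_supported_extension(url, presigned=True))  # True
```

Defaulting the delimiter to ? would cover S3-style URLs, while the explicit option leaves room for services that separate the signature differently.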

marktgraham, Feb 07 '23 12:02