giftless
Make storage adapter API return types consistent across Azure and GCP
It looks like the streaming API under Gunicorn on Heroku does not support the `tell()` method. At the moment there is a workaround where I return the size of the stream, which is not fully compatible with the return type expected of `put` in the provider implementation. What's a good way to abstract over different providers and WSGI servers? Should we revisit the API?

Note: the code below is broken.
```python
# google_cloud.py
def put(self, prefix: str, oid: str, data_stream: BinaryIO) -> int:
    bucket = self.storage_client.get_bucket(self.bucket_name)
    blob = bucket.blob(self._get_blob_path(prefix, oid))
    blob.upload_from_string(data_stream.read())
    return data_stream.get_size()  # workaround: get_size() is not part of BinaryIO
```
@hannelita I'm not sure I totally understand this ...
@shevron is this clear to you? If so, could you comment on what you think 😄
@hannelita are we talking about the GCP implementation of the streaming API or the built-in local storage one?
Specifically about the code that doesn't work (I see it is GCP code): what is expected to be returned there is the number of uploaded bytes or, put otherwise, the size of the blob after all data has been uploaded. For this, you can just do:

```python
return blob.size
```

I am not a GCP expert, but I think this will work; or perhaps you may need to call `blob.reload()` to get fresh data from the server.
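For illustration, a minimal sketch of where that would slot into `put`, assuming the `google-cloud-storage` client (whether the `reload()` call is actually needed is an open question):

```python
blob.upload_from_string(data_stream.read())
blob.reload()     # possibly needed: refresh blob metadata from the server
return blob.size  # size in bytes of the blob after the upload
```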
If this is not what you meant, please clarify. Specifically, "What's a good way to abstract over different providers and WSGI servers? Should we revisit the API?" is not clear to me. I find this API to be just the right level of abstraction: get a file object, return the number of bytes uploaded as an int.
If the problem is that Gunicorn / Heroku doesn't give you the size in bytes of `data_stream`, we should look into that: just as we can solve the GCP case by asking GCP how many bytes we just uploaded, we can probably find ways to ensure this works on Heroku / Gunicorn with other backends too. If the problem persists, we can decide to change the API, or decide that we don't want to support Heroku.
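One backend-agnostic option (a sketch only; `CountingReader` is a hypothetical helper, not something in the codebase) would be to count the bytes ourselves as the backend reads the stream, so we never depend on `tell()` or a size attribute:

```python
from typing import BinaryIO

class CountingReader:
    """Hypothetical wrapper that counts bytes as they are read from a stream."""

    def __init__(self, stream: BinaryIO):
        self._stream = stream
        self.bytes_read = 0

    def read(self, size: int = -1) -> bytes:
        chunk = self._stream.read(size)
        self.bytes_read += len(chunk)
        return chunk

# Usage sketch inside a backend's put(), assuming the uploader only
# calls read() on the file object it is given:
# counter = CountingReader(data_stream)
# blob.upload_from_file(counter)
# return counter.bytes_read
```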
As a side note, to the best of my understanding (again, not a GCP / Heroku expert) you should not be using `blob.upload_from_string(data_stream.read())` but `blob.upload_from_file(data_stream)`, so that we don't have to read the entire file into memory before uploading.
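Putting the two fixes together, a hedged sketch of how `put` might end up looking (untested; same assumptions about the google-cloud-storage client as above):

```python
# google_cloud.py -- sketch, not a drop-in patch
def put(self, prefix: str, oid: str, data_stream: BinaryIO) -> int:
    bucket = self.storage_client.get_bucket(self.bucket_name)
    blob = bucket.blob(self._get_blob_path(prefix, oid))
    blob.upload_from_file(data_stream)  # streams; no full read() into memory
    blob.reload()                       # refresh metadata from the server, if needed
    return blob.size
```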
Also, we should not be using `download_as_string()` in `get`, and the `# type: ignore` comment there is a good indication that this returns an unexpected type. The `get` methods of storage backends are expected to return an `Iterable[bytes]` (which could be anything that, when iterated over, yields bytes: a list of bytes objects, an open file in binary mode, a stream, etc.).
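As an illustration of that contract, here is a hedged sketch of a chunked `get` for the GCP backend. It assumes `blob.open("rb")`, which exists in recent google-cloud-storage releases, and the 64 KiB chunk size is an arbitrary choice, not something from the codebase:

```python
from typing import Iterable

def get(self, prefix: str, oid: str) -> Iterable[bytes]:
    bucket = self.storage_client.get_bucket(self.bucket_name)
    blob = bucket.blob(self._get_blob_path(prefix, oid))
    # Stream the blob in chunks instead of download_as_string(),
    # so we never hold the whole object in memory.
    with blob.open("rb") as stream:
        while True:
            chunk = stream.read(64 * 1024)
            if not chunk:
                break
            yield chunk
```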