fake-gcs-server

404 when doing a resumable upload POST

Open • BigJerBD opened this issue 3 years ago • 9 comments

Image version: latest (v1.30.2)

I'm using Apache Beam to do resumable uploads into a fake GCS bucket (for testing purposes), but I get this error:

"GET  /storage/v1/b/data?alt=json HTTP/1.1\" 200 112" 
"POST /resumable/upload/storage/v1/b/data/o?alt=json&name=aac%2Ftest1%2Fbeam-temp-data-820e4a4e464311ecac030242ac150002%2F18266b8a-3b30-4bf3-bda5-af203113e46d.data.csv&uploadType=resumable HTTP/1.1\" 404 59"

I also confirmed that the path test1 exists:

"GET /storage/v1/b/data/o?maxResults=1&projection=noAcl&prefix=aac%2Ftest1%2F2021103019551635623713%2F&delimiter=%2F&prettyPrint=false HTTP/1.1\" 200 533"

It works with the real GCS service, so I was wondering whether the POST request has a version-compatibility problem or whether this endpoint simply isn't supported yet.

Thanks!

BigJerBD • Nov 15 '21 19:11
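To check whether the 404 in the log above comes from fake-gcs-server itself rather than from Beam, the upload-initiation POST can be replayed by hand. The sketch below uses only the Python standard library and assumes the emulator serves plain HTTP on `localhost:4443` (as in the snippets later in this thread) with a bucket named `data`; the object name is made up. It tries the `/resumable/upload` prefix seen in the failing log line and, for comparison, the `/upload` prefix documented for the GCS JSON API. Which prefixes fake-gcs-server routes is an assumption to verify, not something confirmed in this thread.

```python
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional

# Assumed emulator address and bucket name; adjust to your setup.
EMULATOR = "http://localhost:4443"
BUCKET = "data"


def initiation_url(prefix: str, bucket: str, name: str) -> str:
    """Build a resumable-upload initiation URL under the given path prefix."""
    path = f"{prefix}/storage/v1/b/{urllib.parse.quote(bucket, safe='')}/o"
    # urlencode percent-encodes "/" in the object name as %2F, like the log.
    query = urllib.parse.urlencode({"uploadType": "resumable", "name": name})
    return f"{EMULATOR}{path}?{query}"


def probe(url: str) -> Optional[int]:
    """POST empty JSON metadata; return the HTTP status, or None if unreachable."""
    req = urllib.request.Request(
        url,
        data=b"{}",
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses land here
        return err.code
    except OSError:  # connection refused, timeout, DNS failure, ...
        return None


if __name__ == "__main__":
    # "/resumable/upload" is the prefix from the failing log line;
    # "/upload" is the prefix documented for the GCS JSON API.
    for prefix in ("/resumable/upload", "/upload"):
        url = initiation_url(prefix, BUCKET, "aac/test1/example.csv")
        print(prefix, probe(url))
```

If only the `/resumable/upload` variant returns 404, the problem is path routing in the emulator rather than anything specific to Beam.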

@BigJerBD Hey, would you be able to share a snippet that reproduces the issue? I can look into it this weekend or early next week.

fsouza • Apr 22 '22 12:04

Hi @BigJerBD, I'm also trying to use Apache Beam's FileSystems to upload and download files, but I keep getting this error: HttpError accessing <https://www.googleapis.com/resumable/upload/storage/v1/b/. Apache Beam seems to keep accessing www.googleapis.com no matter how I set the environment variable. Could you please share a snippet showing how you did this? Thanks a lot!

wwwjn • Apr 22 '22 17:04

I'll try to share a snippet reproducing the error I had this weekend.

It's been a while, so I've probably lost it and will have to reproduce it again :sweat_smile:

BigJerBD • Apr 22 '22 20:04

Hi, I monkey-patched Apache Beam to replace www.googleapis.com with fake-gcs-server and got the same 404 error as @BigJerBD. My script, test.py (Apache Beam version: apache-beam==2.36.0), is:

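import gzip

from apache_beam.io.filesystem import CompressionTypes
from apache_beam.io.filesystems import FileSystems


def test_GCS():
    URL = "gs://sample-bucket/test.gz"

    # Write pre-gzipped bytes; UNCOMPRESSED stops Beam from compressing them again
    with FileSystems.create(URL, compression_type=CompressionTypes.UNCOMPRESSED) as f:
        f.write(gzip.compress(b"hello world"))


if __name__ == "__main__":
    # Importing gcsio.py (next to this file) applies the GcsIO monkey-patch
    import gcsio  # noqa: F401
    test_GCS()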

And here is gcsio.py, which monkey-patches Apache Beam:

# Monkey-patch GcsIO.__init__ so Beam's storage client talks to fake-gcs-server
import apache_beam.io.gcp.gcsio
from apache_beam.io.gcp.internal.clients import storage
from apache_beam.internal.gcp import auth
from apache_beam.internal.http_client import get_new_http


def new_init(self, storage_client=None):
    if storage_client is None:
        storage_client = storage.StorageV1(
            # Local fake-gcs-server instead of https://www.googleapis.com/storage/v1/
            url="http://0.0.0.0:4443/storage/v1/",
            credentials=auth.get_service_credentials(),
            get_credentials=False,
            http=get_new_http(),
            response_encoding='utf8',
        )
    self.client = storage_client
    self._rewrite_cb = None
    self.bucket_to_project_number = {}


# Replace the constructor; this must run before any GcsIO instance is created
apache_beam.io.gcp.gcsio.GcsIO.__init__ = new_init

With this setup I got the following error for the resumable upload URL (screenshot not preserved). This is the corresponding output from the fake-gcs-server Docker container:

time="2022-04-23T00:36:40Z" level=info msg="172.17.0.1 - - [23/Apr/2022:00:36:40 +0000] \"GET /storage/v1/b/sample-bucket?alt=json HTTP/1.1\" 200 153"

time="2022-04-23T00:36:40Z" level=info msg="172.17.0.1 - - [23/Apr/2022:00:36:40 +0000] \"POST /resumable/upload/storage/v1/b/sample-bucket/o?alt=json&name=test.gz&uploadType=resumable HTTP/1.1\" 404 59"

Thanks a lot for your help, and I hope this helps!

wwwjn • Apr 23 '22 00:04
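The log shows the upload reaching the emulator under `/resumable/upload/storage/v1/...` even though only the `/storage/v1/` base URL was patched. A plausible explanation, assumed here rather than confirmed in this thread, is that the client keeps only the scheme and host of the base URL and substitutes its own resumable-upload path. The sketch below reproduces that derivation with `urllib.parse`; `RESUMABLE_PATH` and `derive_upload_url` are hypothetical names modeled on the logged request, not Beam or apitools identifiers.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

# Hypothetical template modeled on the logged request path.
RESUMABLE_PATH = "/resumable/upload/storage/v1/b/{bucket}/o"


def derive_upload_url(base_url: str, bucket: str, name: str) -> str:
    """Keep scheme and host from base_url, then swap in the upload path."""
    parts = urlsplit(base_url)
    query = urlencode({"alt": "json", "name": name, "uploadType": "resumable"})
    path = RESUMABLE_PATH.format(bucket=bucket)
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


print(derive_upload_url("http://0.0.0.0:4443/storage/v1/", "sample-bucket", "test.gz"))
# → http://0.0.0.0:4443/resumable/upload/storage/v1/b/sample-bucket/o?alt=json&name=test.gz&uploadType=resumable
```

The result matches the logged POST exactly. Because its prefix is `/resumable/upload` rather than the `/upload` prefix in the GCS JSON API documentation, an emulator that only routes the latter would answer 404, which is consistent with the log.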

@wwwjn Thank you very much for the snippet! That's essentially what I did when I was trying to use fake-gcs-server.

Apache Beam or not, since this also returned a 404, I was wondering whether this feature is implemented in fake-gcs-server at all.

Thanks! :)

BigJerBD • Apr 24 '22 03:04

Hi @fsouza, is there any progress on this bug? Thanks a lot for your help!

wwwjn • May 10 '22 17:05

Hey, I haven't had a chance to look at it yet, but I assume the fix should be simple. I'll check it out in the coming weeks.

fsouza • May 11 '22 01:05

Thanks a lot! If there's anything I can do to help, just let me know!

wwwjn • May 11 '22 02:05

For anyone who, like me, lands here from Google and simply wants to point Apache Beam at a fake-gcs-server URL: there's an issue tracking this at https://github.com/apache/beam/issues/21255

For now, the workaround is still to patch the URL in the test. This worked for me:

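from unittest import mock

import apache_beam.io.gcp.internal.clients.storage


# Redirect Beam's storage client to the emulator for the duration of the test
@mock.patch.object(apache_beam.io.gcp.internal.clients.storage.StorageV1, "BASE_URL",
                   "http://localhost:4443/storage/v1/")
def test_gcs_source():
    pass  # test implementation here should now call the emulator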

where http://localhost:4443 is the URL of your fake-gcs-server instance.

martinbjeldbak • Apr 15 '24 08:04
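If the emulator address differs between machines (for example in CI), the patched `BASE_URL` can be read from the environment instead of hardcoded. The sketch below assumes a `STORAGE_EMULATOR_HOST` variable, the name Google's Cloud Storage client libraries use for emulators. Nothing in this thread confirms that Beam 2.36 reads it, so the helper reads it explicitly and falls back to `http://localhost:4443`. `emulator_base_url` and `patch_storage_base_url` are hypothetical helper names. Passing the target to `mock.patch` as a dotted string defers importing Beam until the patch actually starts.

```python
import os
from unittest import mock

# Dotted path of the class attribute patched in the snippet above.
STORAGE_BASE_URL_TARGET = (
    "apache_beam.io.gcp.internal.clients.storage.StorageV1.BASE_URL"
)


def emulator_base_url(default: str = "http://localhost:4443") -> str:
    """Return the emulator's JSON API base URL, e.g. http://host:4443/storage/v1/."""
    # STORAGE_EMULATOR_HOST is an assumed convention, read here explicitly.
    host = os.environ.get("STORAGE_EMULATOR_HOST", default).rstrip("/")
    if "://" not in host:
        host = "http://" + host  # accept bare "host:port" values
    return host + "/storage/v1/"


def patch_storage_base_url():
    """Hypothetical helper: a mock.patch redirecting Beam's storage client."""
    # The target string is resolved (and Beam imported) only when the patch starts.
    return mock.patch(STORAGE_BASE_URL_TARGET, emulator_base_url())
```

Use it like the decorator above, for example `@patch_storage_base_url()` on the test function, or as a `with patch_storage_base_url():` block around the code under test.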