Error on load object from gcs emulator on insert job API

Open LeoCBS opened this issue 8 months ago • 3 comments

What happened?

The emulator tried to load an object from the GCS emulator using the wrong URL (note the duplicated /storage/v1/ prefix in the request path):

log from fsouza/fake-gcs-server emulator

time=2025-04-27T11:46:19.843Z level=INFO msg="172.20.0.4 - - [27/Apr/2025:11:46:19 +0000] \"GET /storage/v1//storage/v1/b/md-handler-999/o/sample.parquet?alt=media&prettyPrint=false&projection=full HTTP/1.1\" 404 10\n

GCS storage get object API:

https://cloud.google.com/storage/docs/json_api/v1/objects/get#http-request

GET https://storage.googleapis.com/storage/v1/b/bucket/o/object
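The doubled prefix in the log above is what you would get if an endpoint that already ends in /storage/v1/ is handed to a client that prepends the same prefix again. Below is a minimal Go sketch of that failure mode and one way to guard against it. The helper names (joinURL, normalizeEndpoint) and the host gcs:4443 are hypothetical, chosen for illustration only; this is not the emulator's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// apiPrefix is the path prefix of the GCS JSON API, as in the documented
// request GET https://storage.googleapis.com/storage/v1/b/bucket/o/object.
const apiPrefix = "/storage/v1/"

// joinURL (hypothetical helper) mimics a client that always prepends the
// API prefix to the endpoint it was given. Bucket and object names are
// assumed to need no URL escaping in this sketch.
func joinURL(endpoint, bucket, object string) string {
	return endpoint + apiPrefix + "b/" + bucket + "/o/" + object
}

// normalizeEndpoint (hypothetical helper) strips trailing slashes and a
// trailing /storage/v1 so that the prefix is only added once.
func normalizeEndpoint(endpoint string) string {
	e := strings.TrimRight(endpoint, "/")
	return strings.TrimSuffix(e, "/storage/v1")
}

func main() {
	// Endpoint that already carries the API prefix: the path is doubled.
	bad := "http://gcs:4443/storage/v1/"
	fmt.Println(joinURL(bad, "md-handler-999", "sample.parquet"))

	// Normalizing first yields the documented path shape.
	fmt.Println(joinURL(normalizeEndpoint(bad), "md-handler-999", "sample.parquet"))
}
```

Whether the duplication comes from the emulator or from the way the endpoint is configured is exactly what this issue is about; the sketch only shows how such a path can arise.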

What did you expect to happen?

The object should load correctly from the GCS emulator when I create an insert job.

How can we reproduce it (as minimally and precisely as possible)?

Set the STORAGE_EMULATOR_HOST environment variable and submit an insert (load) job that reads an object from the GCS emulator.

Anything else we need to know?

If this issue makes sense, I can submit a PR to fix it.

LeoCBS avatar Apr 27 '25 11:04 LeoCBS

I ran into the same issue and fixed it in a PR a few months ago in a backwards-compatible way, but unfortunately the maintainer doesn't seem to be active any more 😿

daaain avatar Apr 29 '25 15:04 daaain

Setting the fsouza/fake-gcs-server parameter -external-url (for example, -external-url=http://localhost:4443) could fix that.

Here is an example docker-compose.yaml I have used:

services:
    bigquery_emulator:
        image: ghcr.io/goccy/bigquery-emulator
        container_name: bigquery-emulator
        command: 
            - --project=test
        platform: linux/x86_64
        environment:
            - STORAGE_EMULATOR_HOST=http://cloudstorage_emulator:4443
    cloudstorage_emulator:
        image: fsouza/fake-gcs-server
        container_name: cloudstorage-emulator
        command: 
            - -scheme=http
            - -backend=memory
            - -external-url=http://cloudstorage_emulator:4443

Hope it helps, please let me know if you face any issues.

snikolakis avatar May 02 '25 11:05 snikolakis

@snikolakis @daaain nice suggestions, but according to Google's documentation, we only need to set STORAGE_EMULATOR_HOST and don't need to change the client URL.

I have already tested this and submitted a PR with the change:

https://github.com/goccy/bigquery-emulator/pull/405

https://cloud.google.com/go/docs/reference/cloud.google.com/go/storage/latest

LeoCBS avatar May 04 '25 12:05 LeoCBS