bigquery-emulator
Loading a CSV from emulated GCS fails
Bug Report: Running a load job from an emulated GCS with the Java client fails with an error
Description
I am using fake-gcs as the emulated GCS that the load job reads from.
Steps to Reproduce
- clone this repo https://github.com/mcgizzle/bq-emulator-repro
- follow instructions in the README
Expected Behavior
CSV data is loaded into BQ
Actual Behavior
Error is raised:
Exception in thread "main" com.google.cloud.bigquery.BigQueryException: failed to import from gcs: failed to get gcs object reader for bucket/object.csv: storage: object doesn't exist
In the GCS logs I can see that the emulator seems to be making a bad call:
fake-gcs_1 | time="2023-07-19T08:41:47Z" level=info msg="172.20.0.2 - - [19/Jul/2023:08:41:47 +0000] \"GET /bucket/object.csv HTTP/1.1\" 404 10"
Note that the request path is missing the /storage/v1 prefix.
I can see that the object does indeed exist:
λ curl http://localhost:4443/storage/v1/b/bucket/o/object.csv | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 471 100 471 0 0 62758 0 --:--:-- --:--:-- --:--:-- 229k
{
"kind": "storage#object",
"name": "object.csv",
"id": "bucket/object.csv",
"bucket": "bucket",
"size": "12",
"contentType": "text/csv; charset=utf-8",
"crc32c": "2UejRA==",
"acl": [
{
"bucket": "bucket",
"entity": "projectOwner-test-project",
"object": "object.csv",
"projectTeam": {},
"role": "OWNER"
}
],
"md5Hash": "t1iCl7bqaXR343oqnSH+eg==",
"etag": "\"t1iCl7bqaXR343oqnSH+eg==\"",
"timeCreated": "2023-07-19T08:41:46.824183Z",
"updated": "2023-07-19T08:41:46.824217Z",
"generation": "1689756106824243"
}
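To make the mismatch concrete, here is a minimal Java sketch that builds both path shapes for the same object: the JSON API path that the curl call above uses, and the path-style request that shows up in the fake-gcs log. The class and method names are hypothetical and only illustrate the two URL layouts; this is not code from bigquery-emulator or fake-gcs.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
public class GcsPathShapes {

    // JSON API object path, the shape used by the curl call above.
    // The object name is percent-encoded so that "/" in a name stays one path segment.
    public static String jsonApiPath(String bucket, String object) {
        return "/storage/v1/b/" + encode(bucket) + "/o/" + encode(object);
    }

    // Path-style request, the shape logged by fake-gcs when the load job runs.
    public static String pathStyle(String bucket, String object) {
        return "/" + bucket + "/" + object;
    }

    // URLEncoder produces form encoding ("+" for spaces), so map "+" to "%20" for use in a path.
    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        System.out.println(jsonApiPath("bucket", "object.csv")); // /storage/v1/b/bucket/o/object.csv
        System.out.println(pathStyle("bucket", "object.csv"));   // /bucket/object.csv
    }
}
```

The first path returns the object in the curl call above, while the second is the one that gets the 404 in the log.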
Environment Details
- Operating System: macOS
- Java version: 11
- Docker version: 4.14.1 (91661)
Minimal Reproducible Example
See above
Thank you for this tool BTW 🙏
Is there any update on this issue? I'm trying to use both fake-gcs-server and bigquery-emulator to mock their respective Google Operator calls on Airflow for local development, and I'm hitting the same issue. I also tried changing STORAGE_EMULATOR_HOST to both localhost:4443/storage/v1/ and localhost:4443/download/v1/ and got the same result that @mcgizzle reported.
You may be able to avoid this issue by setting a publicHost for the emulator.
For example, I confirmed that changing the settings as follows avoids the error: https://github.com/mcgizzle/bq-emulator-repro/compare/master...totem3:bq-emulator-repro:master?expand=1
❯ docker compose up
[+] Building 0.0s (0/0) docker:desktop-linux
[+] Running 3/0
✔ Container bq-emulator-repro-fake-gcs-1 Created 0.0s
✔ Container bq-emulator-repro-fake-bq-1 Created 0.0s
✔ Container bq-emulator-repro-app-1 Created 0.0s
Attaching to bq-emulator-repro-app-1, bq-emulator-repro-fake-bq-1, bq-emulator-repro-fake-gcs-1
bq-emulator-repro-fake-gcs-1 | time=2023-11-23T14:36:38.066Z level=INFO msg="server started at http://0.0.0.0:4443"
bq-emulator-repro-fake-bq-1 | [bigquery-emulator] REST server listening at 0.0.0.0:9050
bq-emulator-repro-fake-bq-1 | [bigquery-emulator] gRPC server listening at 0.0.0.0:9060
bq-emulator-repro-fake-gcs-1 | time=2023-11-23T14:36:41.646Z level=INFO msg="192.168.112.3 - - [23/Nov/2023:14:36:41 +0000] \"GET /bucket/object.csv HTTP/1.1\" 200 0\n"
bq-emulator-repro-app-1 | Waiting for job to complete...
bq-emulator-repro-app-1 exited with code 0