bigquery-emulator

Loading a CSV from emulated GCS fails

Open mcgizzle opened this issue 1 year ago • 2 comments

Bug Report: Running a load job from an emulated GCS bucket with the Java client fails

Description

I am using fake-gcs to emulate GCS alongside bigquery-emulator.

Steps to Reproduce

  1. Clone this repo: https://github.com/mcgizzle/bq-emulator-repro
  2. Follow the instructions in the README

Expected Behavior

CSV data is loaded into BQ

Actual Behavior

An error is raised:

Exception in thread "main" com.google.cloud.bigquery.BigQueryException: failed to import from gcs: failed to get gcs object reader for bucket/object.csv: storage: object doesn't exist

The fake-gcs logs suggest that the emulator is making a bad request:

fake-gcs_1  | time="2023-07-19T08:41:47Z" level=info msg="172.20.0.2 - - [19/Jul/2023:08:41:47 +0000] \"GET /bucket/object.csv HTTP/1.1\" 404 10"

Note that the request path is missing the /storage/v1 prefix.

I can see that the object does indeed exist:

λ curl http://localhost:4443/storage/v1/b/bucket/o/object.csv | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   471  100   471    0     0  62758      0 --:--:-- --:--:-- --:--:--  229k
{
  "kind": "storage#object",
  "name": "object.csv",
  "id": "bucket/object.csv",
  "bucket": "bucket",
  "size": "12",
  "contentType": "text/csv; charset=utf-8",
  "crc32c": "2UejRA==",
  "acl": [
    {
      "bucket": "bucket",
      "entity": "projectOwner-test-project",
      "object": "object.csv",
      "projectTeam": {},
      "role": "OWNER"
    }
  ],
  "md5Hash": "t1iCl7bqaXR343oqnSH+eg==",
  "etag": "\"t1iCl7bqaXR343oqnSH+eg==\"",
  "timeCreated": "2023-07-19T08:41:46.824183Z",
  "updated": "2023-07-19T08:41:46.824217Z",
  "generation": "1689756106824243"
}
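The two request shapes in this report differ only in their path: the curl check uses the JSON API form (`/storage/v1/b/<bucket>/o/<object>`), while the fake-gcs access log shows a path-style request (`/<bucket>/<object>`). The sketch below builds both for the same object so they can be compared or pasted into curl. The function names are hypothetical helpers, not part of any client library, and the base URL is the `localhost:4443` address taken from the curl command above.

```python
from urllib.parse import quote


def json_api_object_url(base: str, bucket: str, obj: str) -> str:
    """Build the JSON API form used by the curl check in this report."""
    # In this form the object name is a single path segment,
    # so any "/" inside it is percent-encoded as well.
    return (
        f"{base.rstrip('/')}/storage/v1/b/"
        f"{quote(bucket, safe='')}/o/{quote(obj, safe='')}"
    )


def path_style_object_url(base: str, bucket: str, obj: str) -> str:
    """Build the path-style form seen in the fake-gcs access log."""
    # Here "/" inside the object name is left as a path separator.
    return f"{base.rstrip('/')}/{quote(bucket, safe='')}/{quote(obj)}"


if __name__ == "__main__":
    base = "http://localhost:4443"  # assumption: host and port from the curl call
    print(json_api_object_url(base, "bucket", "object.csv"))
    print(path_style_object_url(base, "bucket", "object.csv"))
```

Requesting both URLs against the running fake-gcs container is one way to check which form the server answers in a given configuration.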

Environment Details

  • Operating System: macOS
  • Java version: 11
  • Docker version: 4.14.1 (91661)

Minimal Reproducible Example

See above

Thank you for this tool BTW 🙏

mcgizzle, Jul 19 '23 08:07

Is there any update on this issue? I'm trying to use both fake-gcs-server and bigquery-emulator to mock their respective Google operator calls in Airflow for local development, and I'm hitting the same issue. I also tried changing STORAGE_EMULATOR_HOST to both localhost:4443/storage/v1/ and localhost:4443/download/v1/, and got the same result that @mcgizzle reported.

jvaesteves, Nov 21 '23 15:11

You may be able to avoid this issue by setting a publicHost for the emulator.

For example, I confirmed that changing the settings as follows avoids the error: https://github.com/mcgizzle/bq-emulator-repro/compare/master...totem3:bq-emulator-repro:master?expand=1

❯ docker compose up
[+] Building 0.0s (0/0)  docker:desktop-linux
[+] Running 3/0
 ✔ Container bq-emulator-repro-fake-gcs-1  Created  0.0s
 ✔ Container bq-emulator-repro-fake-bq-1   Created  0.0s
 ✔ Container bq-emulator-repro-app-1       Created  0.0s
Attaching to bq-emulator-repro-app-1, bq-emulator-repro-fake-bq-1, bq-emulator-repro-fake-gcs-1
bq-emulator-repro-fake-gcs-1  | time=2023-11-23T14:36:38.066Z level=INFO msg="server started at http://0.0.0.0:4443"
bq-emulator-repro-fake-bq-1   | [bigquery-emulator] REST server listening at 0.0.0.0:9050
bq-emulator-repro-fake-bq-1   | [bigquery-emulator] gRPC server listening at 0.0.0.0:9060
bq-emulator-repro-fake-gcs-1  | time=2023-11-23T14:36:41.646Z level=INFO msg="192.168.112.3 - - [23/Nov/2023:14:36:41 +0000] \"GET /bucket/object.csv HTTP/1.1\" 200 0\n"
bq-emulator-repro-app-1       | Waiting for job to complete...
bq-emulator-repro-app-1 exited with code 0
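To compare the failing and passing runs, the fake-gcs access-log entries quoted in this thread can be reduced to method, path, and status. This is only a sketch: `parse_access_entry` and `AccessEntry` are hypothetical names, and the regular expression assumes the Apache-style entry seen in the two log lines above, not a documented fake-gcs log format.

```python
import re
from typing import NamedTuple, Optional

# Matches the request part of an entry, e.g.
#   \"GET /bucket/object.csv HTTP/1.1\" 404 10
# The backslash before each quote is optional so unescaped logs also match.
_ENTRY = re.compile(
    r'\\?"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+\\?" '
    r'(?P<status>\d{3}) (?P<size>\d+)'
)


class AccessEntry(NamedTuple):
    method: str
    path: str
    status: int
    size: int


def parse_access_entry(line: str) -> Optional[AccessEntry]:
    """Return the request fields from one log line, or None if absent."""
    m = _ENTRY.search(line)
    if m is None:
        return None
    return AccessEntry(m["method"], m["path"], int(m["status"]), int(m["size"]))


if __name__ == "__main__":
    # Log lines copied from this thread (original report, then the fixed run).
    before = r'fake-gcs_1  | time="2023-07-19T08:41:47Z" level=info msg="172.20.0.2 - - [19/Jul/2023:08:41:47 +0000] \"GET /bucket/object.csv HTTP/1.1\" 404 10"'
    after = r'bq-emulator-repro-fake-gcs-1  | time=2023-11-23T14:36:41.646Z level=INFO msg="192.168.112.3 - - [23/Nov/2023:14:36:41 +0000] \"GET /bucket/object.csv HTTP/1.1\" 200 0\n"'
    for line in (before, after):
        print(parse_access_entry(line))
```

In both runs the request path is the same `/bucket/object.csv`; only the status differs (404 in the original report, 200 after the publicHost change).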

totem3, Nov 23 '23 14:11