Use Emulator with PySpark

Open lvijnck opened this issue 1 year ago • 1 comments

What happened?

Hi all,

I'm trying to read from the emulator using PySpark (no Scala), but I can't figure out how to set up anonymous credentials.

Any ideas?

Reading the dataframe as follows:

    # Load dataset
    return session.read.format("bigquery") \
        .option("parentProject", "test") \
        .option("table", "test.test") \
        .option("proxyAddress", "0.0.0.0:9060") \
        .load().show()

This gives the following error:

POST https://oauth2.googleapis.com/token
{
  "error": "invalid_grant",
  "error_description": "Bad Request"
}

lvijnck avatar Jan 29 '24 14:01 lvijnck

Hi there

I am not familiar with PySpark or the spark-bigquery-connector, but the bigquery-emulator does not check permissions or provide any authentication features, so this looks like a client-side problem rather than an emulator issue. Judging from the spark-bigquery-connector README and the error message, the connector requires some form of valid access token. With the Java SDK, I suppose NoCredentials is what you would typically use to skip authentication, but the connector's configuration interface does not seem to expose that.

Separately, you have set proxyAddress. According to the README and the following PR, that option is intended for connecting to BigQuery through a forward proxy such as Squid, so pointing it at the bigquery-emulator looks incorrect. (I haven't used it myself, so I might not be completely accurate.)

If you want to point the connector at the emulator, perhaps look at bigQueryHttpEndpoint or bigQueryStorageGrpcEndpoint instead.
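To make that suggestion concrete, here is a rough, untested sketch of how the two endpoint options might be passed from PySpark. Several things in it are assumptions rather than facts from this thread: the host and ports (localhost, 9050 for HTTP, 9060 for gRPC), the exact value format each option expects, and the idea that handing the connector a static placeholder token through gcpAccessToken keeps it from calling oauth2.googleapis.com. The helper names emulator_read_options and read_table are hypothetical.

```python
from typing import Dict, Optional


def emulator_read_options(
    project: str = "test",
    host: str = "localhost",        # assumption: emulator runs locally
    http_port: int = 9050,          # assumption: emulator REST port
    grpc_port: int = 9060,          # assumption: emulator Storage API (gRPC) port
    access_token: Optional[str] = None,
) -> Dict[str, str]:
    """Build connector options that target the emulator instead of Google."""
    options = {
        "parentProject": project,
        # Endpoint overrides replace proxyAddress from the original snippet.
        # The value formats below are an assumption; check the connector README.
        "bigQueryHttpEndpoint": f"http://{host}:{http_port}",
        "bigQueryStorageGrpcEndpoint": f"{host}:{grpc_port}",
    }
    if access_token is not None:
        # Assumption (unverified): a static token avoids the OAuth token
        # request; the emulator itself does not validate credentials.
        options["gcpAccessToken"] = access_token
    return options


def read_table(session, table: str, options: Dict[str, str]):
    """Read `table` through the bigquery data source with the given options."""
    return (
        session.read.format("bigquery")
        .options(**options)
        .option("table", table)
        .load()
    )
```

With a running SparkSession this would be called as read_table(session, "test.test", emulator_read_options(access_token="placeholder")). Keeping the options in a plain dict makes it easy to drop them in tests that run against the emulator and omit them against real BigQuery.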

totem3 avatar Jan 30 '24 01:01 totem3