BigQuery integration tests

Open max-sixty opened this issue 1 year ago • 2 comments

It would be great to have BQ integration tests, given how great BigQuery is and some of the issues we're facing, such as the https://github.com/prql/prql/issues/852 saga.

Some questions to consider:

  • I guess we need an auth token to run actual queries? I'm fine if there's some cost associated with the queries in CI; it would be great if it's easy for people to set up locally, probably using their own account. IIUC BQ has a decent amount of free quota per account.
  • What's the best way of using BQ from Rust? It's very easy from Python. We could have the test suite in Python if needed, but it would be better to keep everything in a single crate.
  • Would we load the same data that we use for the DuckDB integration tests? Or use BigQuery public data?

I used to work with @tswast on some BigQuery open-source work, and he built a lot of infra like this for the Python data ecosystem. If he has any insight here, that would be awesome (but no stress if you don't see this or don't have the capacity to respond; thank you!)

max-sixty avatar Jul 27 '22 18:07 max-sixty

> I guess we need an auth token to do actual queries? I'm fine if there's some cost associated with the queries in CI; it would be great if it's easy to set up for people locally, probably using their own account. IIUC BQ has a decent amount of free quota per account.

Yes, you'll need a GCP account. Thankfully, there is a free tier, so you might even be able to get away with just using that without creating a billing account. https://cloud.google.com/bigquery/docs/sandbox

Not sure what service you are using for CI/CD, but you might be able to avoid creating a key file by using Workload Identity Federation. See this project for GCP auth on GitHub Actions: https://github.com/google-github-actions/auth#setup

For local auth, you should be able to use gcloud auth application-default login (https://cloud.google.com/sdk/gcloud/reference/auth/application-default/login), though I'm not sure how well it's supported. Alternatively, you could try to replicate some of the logic from https://github.com/pydata/pydata-google-auth for working with user credentials.
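To make the local-auth idea concrete, here is a minimal sketch of how a test suite might check for credentials before trying to reach BigQuery, so contributors without a GCP account get a skip rather than a failure. It only covers two cases: a key file named by GOOGLE_APPLICATION_CREDENTIALS, and the file that gcloud auth application-default login writes by default. The function name find_adc_file is hypothetical, and the sketch assumes the default gcloud config directory (it ignores overrides such as CLOUDSDK_CONFIG and credentials supplied by a metadata server).

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def find_adc_file(env: Mapping[str, str] = os.environ,
                  home: Optional[Path] = None) -> Optional[Path]:
    """Return the credentials file a client would likely pick up, or None.

    Hypothetical helper: it only inspects the filesystem and never
    validates the credentials themselves.
    """
    # 1. An explicitly configured key file takes precedence.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        path = Path(explicit)
        return path if path.is_file() else None

    # 2. Otherwise look in the default location used by
    #    `gcloud auth application-default login`.
    if sys.platform == "win32":
        base = Path(env.get("APPDATA", "")) / "gcloud"
    else:
        base = (home or Path.home()) / ".config" / "gcloud"
    path = base / "application_default_credentials.json"
    return path if path.is_file() else None
```

A test runner could call find_adc_file() once at startup and skip the BigQuery tests when it returns None.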

> What's the best way of using BQ from Rust? It's very easy from Python. We could have the test suite in Python if needed, but it would be better to keep in a single crate.

I recommend checking out https://github.com/mozilla-services/google-cloud-rust, which Google and Mozilla built in partnership. Unfortunately, it's unlikely that an autogenerated client for BigQuery will be particularly useful for running queries on its own (if it's possible at all, since the core API is REST, not gRPC), but there is an open issue at https://github.com/mozilla-services/google-cloud-rust/issues/25 that could be contributed to.

tswast avatar Jul 27 '22 19:07 tswast

Awesome, thanks a lot @tswast! I'll check those out. Sounds like starting off with Python integration tests may be an easier initial path for the moment.
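As a rough illustration of what a Python integration test could look like, the sketch below runs an already-compiled SQL string against BigQuery with the google-cloud-bigquery package and compares the rows to an expected result. The helper names (make_client, fetch_rows, check_query) are hypothetical, it assumes credentials are already available to the client, and it leaves out the PRQL-to-SQL compilation step entirely.

```python
from typing import Any, Dict, List


def make_client() -> Any:
    """Build a BigQuery client from the ambient credentials."""
    # Imported lazily so this module still loads when the
    # google-cloud-bigquery package is not installed.
    from google.cloud import bigquery
    return bigquery.Client()


def fetch_rows(client: Any, sql: str) -> List[Dict[str, Any]]:
    """Run `sql` and return the result rows as plain dicts."""
    job = client.query(sql)
    # result() blocks until the job finishes and yields the rows.
    return [dict(row.items()) for row in job.result()]


def check_query(client: Any, sql: str,
                expected: List[Dict[str, Any]]) -> None:
    """Fail if the query result differs from `expected` (order-sensitive)."""
    actual = fetch_rows(client, sql)
    assert actual == expected, f"expected {expected!r}, got {actual!r}"
```

With credentials in place, a smoke test might be as small as check_query(make_client(), "SELECT 1 AS x", [{"x": 1}]). Passing the client in as an argument keeps the helpers usable with a stub client when no network access is available.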

max-sixty avatar Jul 27 '22 19:07 max-sixty