python-bigquery-pandas
Resetting the Context does not work as expected
First of all thank you for this great library and all your work on various tools that help tremendously. 🙇
I would like to report behavior that I am not sure is a bug, but it is at least not what I would expect from reading the code, tests, and documentation.
Context
I have two Google service accounts: secret_sa, which has access to some secret tables that not everyone can access by default, and default_sa, which has access to all the standard datasets and tables in our team.
The use case is the following:
- first I would like to read_gbq from the "secret" project/location via the secret_sa
- then I would like to upload a pandas df into a default project/location by using default_sa
From reading the documentation and some past discussions in the issues, I understand that pandas_gbq sets a global context object, which is then reused to improve performance. In my case this means that after the first read_gbq call the global context is set to the secret project, and I need a way to reset it before the second step.
Expected
From reading e.g. the tests for setting and resetting the global context, I expected to be able to use a similar reset trick: set pandas_gbq.context to None, so that the 2nd call would then use GOOGLE_APPLICATION_CREDENTIALS to pick up the default_sa.
It may be worth mentioning that for the first call I have a function that loads the secret_sa, gets a query, and sets/deletes both the pandas_gbq.context and pandas_gbq.Context objects.
However, this does not work as I expected, and the 2nd call still tries to use the secret_sa context/credentials/project.
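One possible explanation, which I have not confirmed against the pandas_gbq source, is plain Python name binding: if the package only re-exports an object that an inner module owns, assigning None to the package-level name does not change what the inner code sees. The toy below reproduces that effect without pandas_gbq. Context, _inner_context, pkg, and current_credentials are all hypothetical stand-ins, not the library's actual code.

```python
from types import SimpleNamespace


class Context:
    """Toy cache, loosely modelled on the idea of a global context."""

    def __init__(self):
        self.credentials = None


# Plays the role of the global object owned by an inner module (assumption).
_inner_context = Context()


def current_credentials():
    # Reads the inner global, not whatever the package attribute points to.
    return _inner_context.credentials


# Hypothetical package namespace that re-exports the same object.
pkg = SimpleNamespace(context=_inner_context, current_credentials=current_credentials)

# The first call caches the "secret" credentials on the shared object.
pkg.context.credentials = "secret_sa"

# Rebinding the package-level name does not touch the inner object.
pkg.context = None
after_rebind = pkg.current_credentials()

# Clearing the attribute on the shared object is what the inner code sees.
_inner_context.credentials = None
after_clear = pkg.current_credentials()

print(after_rebind, after_clear)
```

In this toy, after_rebind is still "secret_sa" and only after_clear is None, which matches the behavior I am seeing.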
For now I have found a workaround, which is to pass reauth = True to the {pandas_gbq | pandas}.to_gbq calls further down in the script.
Could you please help me figure out the correct way to do this? I would be happy to turn this into a nice short example and contribute it somewhere in the docs, if you think that would be helpful.
Many thanks in advance for all your work! Please let me know if I can provide more info or examples.
Environment details
- OS type and version: (I believe it's not that important here)
- Python version: Python 3.10.6
- pip version: pip 22.2.1 from /usr/local/lib/python3.10/site-packages/pip (python 3.10)
- pandas-gbq version: Version: 0.17.8
Code example
import os
import json
import pandas as pd
import pandas_gbq as pdbq
from google.cloud import bigquery
from google.oauth2 import service_account
# authenticate and read stuff from the "secret" project
cred_json_secret = json.loads(os.getenv("SECRET-PROJ-CREDS"))
credentials_secret = service_account.Credentials.from_service_account_info(cred_json_secret)
results_secret = pdbq.read_gbq("SELECT FROM SECRET PROJ", credentials = credentials_secret)
# at this point pandas_gbq has set the global context object
# to the "secret" one
# now i want to unset/delete all this context
pdbq.Context = None
pdbq.context = None
pdbq.project = None
# or also i tried
# del pdbq.Context
# del pdbq.context
# del pdbq.project
# at this point i make a small dataframe and try to upload it
# by letting pandas_gbq figure out the default credentials
# from the GOOGLE_APPLICATION_CREDENTIALS in env.var
# but this does not work
df_upload_with_default_creds = pd.DataFrame(range(3), columns = ["col_A"])
pdbq.to_gbq(
    dataframe = df_upload_with_default_creds,
    destination_table = 'default-dataset.table',
    project_id = "default-project",
    if_exists = "replace"
)
# > GenericGBQException: Reason: 403 POST https://bigquery.googleapis.com/bigquery/v2/projects/the-project/datasets/the-dataset/tables?prettyPrint=false:
# Access Denied: Dataset the-project:the-dataset: Permission bigquery.tables.create denied on dataset the-project:the-dataset (or it may not exist).
# however this works
pdbq.to_gbq(
    dataframe = df_upload_with_default_creds,
    destination_table = 'default-dataset.table',
    project_id = "default-project",
    if_exists = "replace",
    # ! need this !
    reauth = True
)
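A possible alternative to reauth = True, sketched under the assumption that the cached values live on settable credentials and project attributes of pandas_gbq.context (I have not verified that this is the intended reset mechanism): clear the attributes on the existing context object instead of rebinding the name. The helper reset_cached_auth and the fake_gbq stand-in are hypothetical and only there so the sketch runs without BigQuery access.

```python
from types import SimpleNamespace


def reset_cached_auth(gbq_module):
    """Clear the credentials and project cached on ``gbq_module.context``.

    Assumes ``context`` exposes settable ``credentials`` and ``project``
    attributes; this is an assumption, not confirmed library behavior.
    """
    gbq_module.context.credentials = None
    gbq_module.context.project = None


# Stand-in for the real module (hypothetical), pre-filled as if the
# first read_gbq call had cached the "secret" service account.
fake_gbq = SimpleNamespace(
    context=SimpleNamespace(credentials="secret_sa", project="secret-project")
)

reset_cached_auth(fake_gbq)
print(fake_gbq.context.credentials, fake_gbq.context.project)
```

With the real library this would be called as reset_cached_auth(pdbq) right before the to_gbq call. Another option that avoids the global context altogether might be to pass the default credentials explicitly through the credentials argument of to_gbq.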