python-bigquery-pandas
read_gbq results in lingering system thread after function call
Environment details
- OS type and version: Linux CentOS 7
- Python version: 3.10
- pandas-gbq version: 0.19.0
Steps to reproduce
- Start debugger session
- Run read_gbq function
- Look at Call Stack after function execution and notice extra running system thread.
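The same check can be done without a debugger by comparing the live threads before and after the call. This is only a minimal sketch using the standard library; the helper name lingering_threads is made up for illustration and is not part of pandas-gbq or pandas.

```python
import threading


def lingering_threads(func, *args, **kwargs):
    """Call func and report threads it left behind.

    Returns a tuple (result, extra), where extra is the list of threads
    that were not alive before the call but are still alive after it.
    The name of this helper is hypothetical.
    """
    # threading.enumerate() lists only threads that are currently alive.
    before = set(threading.enumerate())
    result = func(*args, **kwargs)
    extra = [t for t in threading.enumerate() if t not in before]
    return result, extra
```

For the report above, it would be called as lingering_threads(pd.read_gbq, query_str, ...) with the same keyword arguments as in the code example, and any extra thread should show up in the returned list.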
Code example
import pandas as pd

# query_str, bq_wrap, profile, creds and use_bqstorage_api are defined elsewhere
ret_df = pd.read_gbq(
    query_str,
    project_id=bq_wrap.bq_billing_project,  # Billing project
    configuration={'query': {'defaultDataset': {"datasetId": profile.bq_dataset, "projectId": bq_wrap.bq_project}}},
    credentials=creds,
    use_bqstorage_api=use_bqstorage_api,
    progress_bar_type='tqdm')
Stack trace

Having the same issue! It's also leading to memory leaks for me. These extra threads seem to be holding references to the data that read_gbq returns, which prevents the garbage collector from freeing it.
Thanks for reporting the issue! I am able to reproduce it, but it seems to only happen when tqdm is used.
It seems tqdm starts a new thread when a tqdm object is created, but that thread is not stopped when the tqdm object is closed.
import tqdm

# Create a list of numbers
numbers = list(range(3))

# Create a progress bar
pbar = tqdm.tqdm(numbers)

# Iterate over the list of numbers
for number in pbar:
    # Do something with the number
    pass

# Close the progress bar
pbar.close()

# There is a "Thread-7" at this breakpoint
breakpoint()
exit(0)
This is caused by tqdm starting a new thread of class TMonitor when a new tqdm.tqdm object is created. A workaround is to set tqdm.tqdm.monitor_interval = 0 before using tqdm, for example right after the library is imported. Overall, though, I think this is a bug in tqdm. I opened an issue with tqdm, so I will close this one. Still, please leave a comment or open a new issue if you have any questions :)
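For anyone who wants to try the workaround, here is a minimal sketch. It assumes tqdm is installed and that no progress bar has been created earlier in the process, since a monitor thread that is already running is not affected by changing the interval afterwards. The variable names are only for illustration.

```python
import threading

import tqdm

# Disable the monitor before any progress bar is created.
# With the interval set to 0, tqdm should not start a TMonitor thread.
tqdm.tqdm.monitor_interval = 0

threads_before = set(threading.enumerate())

pbar = tqdm.tqdm(range(3))
for _ in pbar:
    pass
pbar.close()

# Threads that appeared while the progress bar existed and are still alive.
lingering = [t for t in threading.enumerate() if t not in threads_before]
print(lingering)
```

With pandas, the same assignment would go before the first read_gbq call that uses progress_bar_type='tqdm'. I have only checked this against the plain tqdm example above, so please treat it as a starting point.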