read_gbq results in lingering system thread after function call

Open jlynchMicron opened this issue 2 years ago • 1 comments

Environment details

  • OS type and version: Linux CentOS 7
  • Python version: 3.10
  • pandas-gbq version: 0.19.0

Steps to reproduce

  1. Start debugger session
  2. Run read_gbq function
  3. Look at Call Stack after function execution and notice extra running system thread.
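The same check can be done without a debugger by comparing `threading.enumerate()` before and after the call. Below is a minimal sketch using only the standard library. `threads_left_behind` and `fake_query` are hypothetical names introduced for illustration; `fake_query` is a stand-in that starts a background thread so the example runs without BigQuery credentials.

```python
import threading


def threads_left_behind(func, *args, **kwargs):
    """Call func and return (result, names of threads started during the call
    that are still alive after it returns)."""
    before = set(threading.enumerate())
    result = func(*args, **kwargs)
    leftover = [t for t in threading.enumerate() if t not in before]
    return result, sorted(t.name for t in leftover)


def fake_query():
    """Hypothetical stand-in for read_gbq: starts a background thread, then returns."""
    stop = threading.Event()
    # Daemon thread that blocks until the event is set, which never happens here.
    worker = threading.Thread(target=stop.wait, name="fake-monitor", daemon=True)
    worker.start()
    return "dataframe placeholder"


result, leftover = threads_left_behind(fake_query)
print(leftover)  # ['fake-monitor']
```

To check the real call, pass a function that wraps `pd.read_gbq(...)` in place of `fake_query`; a non-empty list would indicate a thread that outlived the call.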

Code example

import pandas as pd

# query_str, bq_wrap, profile, creds and use_bqstorage_api are defined elsewhere in the reporter's code.
ret_df = pd.read_gbq(
    query_str,
    project_id=bq_wrap.bq_billing_project,  # Billing project
    configuration={"query": {"defaultDataset": {
        "datasetId": profile.bq_dataset, "projectId": bq_wrap.bq_project,
    }}},
    credentials=creds,
    use_bqstorage_api=use_bqstorage_api,
    progress_bar_type="tqdm",
)

Stack trace

[image: screenshot attached to the original issue]

jlynchMicron avatar Feb 17 '23 00:02 jlynchMicron

Having the same issue! It's also leading to memory leaks for me. These extra threads seem to hold references to the data that read_gbq returns, which prevents the garbage collector from freeing it.

jkelly80 avatar Jun 26 '23 22:06 jkelly80

Thanks for reporting the issue! I am able to reproduce it, but it seems to only happen when tqdm is used.

Linchin avatar Apr 09 '24 20:04 Linchin

It seems tqdm starts a new thread when a tqdm object is created, but that thread is not stopped when the tqdm object is closed.

import tqdm

# Create a list of numbers
numbers = list(range(3))

# Create a progress bar
pbar = tqdm.tqdm(numbers)

# Iterate over the list of numbers
for number in pbar:
    # Do something with the number
    pass

# Close the progress bar
pbar.close()

# There is a "Thread-7" at this breakpoint
breakpoint()

exit(0)

Linchin avatar Apr 09 '24 23:04 Linchin

This is caused by tqdm creating a new thread with class TMonitor while creating a new tqdm.tqdm object. A way to patch this is to set tqdm.tqdm.monitor_interval = 0 before using it - for example just after the library is imported. But overall I think it's a bug with tqdm. I opened an issue at tqdm, so I will close this one. Still, please leave a comment or open a new issue if you have any questions :)
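For reference, here is a minimal sketch of that workaround, assuming tqdm is installed. `tqdm_monitor_threads` and `iterate_without_monitor` are hypothetical helper names made up for this example. Monitor threads are identified by their class name, `TMonitor`, as mentioned above.

```python
import threading


def tqdm_monitor_threads():
    """Return the names of live threads whose class is named TMonitor."""
    return [t.name for t in threading.enumerate() if type(t).__name__ == "TMonitor"]


def iterate_without_monitor(items):
    """Disable tqdm's monitor thread, run a progress bar over items,
    and return the names of any monitor threads still alive afterwards."""
    import tqdm  # imported here so tqdm_monitor_threads() works without tqdm

    # Workaround from this thread: an interval of 0 disables the monitor thread.
    # Set it before the first tqdm.tqdm object is created.
    tqdm.tqdm.monitor_interval = 0

    pbar = tqdm.tqdm(items)
    for _ in pbar:
        pass
    pbar.close()
    return tqdm_monitor_threads()
```

If the workaround behaves as described, `iterate_without_monitor(range(3))` should return an empty list. When using read_gbq, the same assignment would go right after the imports, before the first call with progress_bar_type='tqdm'.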

Linchin avatar Apr 10 '24 21:04 Linchin