google-cloud-python

Multiple-stream example needs clarification

Open • baggio002 opened this issue 4 years ago • 4 comments

When reading a single stream as in this example https://github.com/googleapis/python-bigquery-storage/blob/6254bf2a588e69e2175df1c67edb514655d93e9d/samples/to_dataframe/main_test.py#L130-L139, the example can be confusing.

The example states that we can read from multiple streams to get data faster. It does not say that multiple streams may be created automatically when there is a lot of data, in which case you need to read from all of them to get every row.

Thanks!

baggio002, Jan 19 '21
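A minimal sketch of reading every stream, assuming the `google-cloud-bigquery-storage` client (`BigQueryReadClient`) used by the linked sample: `create_read_session` returns a `ReadSession` whose `streams` list can hold several streams (the server may create fewer than `max_stream_count`, and picks a number itself when it is 0 or unset), and `read_rows(stream.name)` returns a reader whose `rows(session)` yields one mapping per row. `read_all_rows` is a hypothetical helper, not part of the library; it takes the client and session as parameters so the loop itself does not depend on a live connection.

```python
from typing import Any, Dict, List


def read_all_rows(client: Any, session: Any) -> List[Dict[str, Any]]:
    """Read every stream in a read session and return all rows.

    Hypothetical helper. `client` is assumed to behave like
    google.cloud.bigquery_storage.BigQueryReadClient and `session` like the
    ReadSession returned by client.create_read_session(...).
    """
    rows: List[Dict[str, Any]] = []
    # The server decides how many streams to create (at most max_stream_count),
    # so loop over all of them instead of reading only session.streams[0].
    for stream in session.streams:
        reader = client.read_rows(stream.name)
        # rows() yields one mapping (column name -> value) per table row.
        rows.extend(reader.rows(session))
    return rows
```

With a real client you would call `read_all_rows(client, session)` after `session = client.create_read_session(...)`. To get a DataFrame instead, the same loop could call `reader.to_dataframe(session)` for each stream and combine the per-stream frames with `pandas.concat`.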

Thanks for the report. Yes, we can improve these comments.

Note: I recently changed this sample to request a maximum of 1 stream so that it does not miss rows assigned to other streams. https://github.com/googleapis/python-bigquery-storage/pull/114

tswast, Jan 22 '21

Converting this to a feature request for a code sample that handles multiple streams, since the original issue has been fixed, per Tim. Thanks!

meredithslota, Jun 13 '23

@tswast So basically, we can loop over every stream, each of which contains a piece of the dataframe? Are these pieces disjoint? To take advantage of multiple streams, do I need a library such as multiprocessing? Thanks.

francescomandruvs, May 16 '24
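A sketch of reading the streams concurrently with a thread pool, assuming a `BigQueryReadClient`-like `client` (with `read_rows(name)` returning a reader that has `rows(session)`) and a `ReadSession`-like `session` (with a `streams` list). The BigQuery Storage Read API documentation describes the streams of one session as returning disjoint sets of rows, so concatenating the per-stream results should not duplicate rows. Threads are used on the assumption that most of the time is spent waiting on the network and that one client can be shared across threads; `read_streams_in_parallel` is a hypothetical helper, not a library function.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List, Optional


def read_streams_in_parallel(
    client: Any, session: Any, max_workers: Optional[int] = None
) -> List[Dict[str, Any]]:
    """Read all streams of a read session concurrently (hypothetical helper)."""

    def read_one(stream_name: str) -> List[Dict[str, Any]]:
        # Drain one stream completely; each stream holds its own subset of rows.
        return list(client.read_rows(stream_name).rows(session))

    names = [stream.name for stream in session.streams]
    if not names:
        return []
    # Default to one worker thread per stream.
    with ThreadPoolExecutor(max_workers=max_workers or len(names)) as pool:
        # map() returns results in the same order as `names`.
        per_stream = list(pool.map(read_one, names))
    # Streams are disjoint, so flattening the results yields each row once.
    return [row for rows in per_stream for row in rows]
```

`multiprocessing` may help when decoding rows is CPU-bound, but gRPC channels are generally not safe to share across forked processes, so each worker process would need to create its own client and read the stream names it is given.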

This issue was transferred from python-bigquery-storage to google-cloud-python as part of the work for https://github.com/googleapis/google-cloud-python/issues/10991.

parthea, Aug 22 '25