Tim Sweña (Swast)

Results: 302 comments by Tim Sweña (Swast)

Re: `FAILED tests/system/small/test_pandas.py::test_get_dummies_dataframe[kwargs2] - AssertionError: DataFrame.iloc[:, 11] (column name="time_col.11:14:34.701606") are different`

Looks like the time scalar is losing microsecond precision:

```
COALESCE(`t0`.`time_col` = time(11, 14, 34), FALSE) AS `col_15`,
COALESCE(`t0`.`time_col`...
```
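The failing assertion above can be reproduced without BigQuery at all: constructing a `datetime.time` from only hour, minute, and second silently drops the microsecond component, which is exactly the loss visible in the generated `time(11, 14, 34)` literal. A minimal sketch:

```python
from datetime import time

# Full-precision value from the test data: 11:14:34.701606.
full = time(11, 14, 34, 701606)

# Rebuilding the scalar from (hour, minute, second) only, as the generated
# SQL literal time(11, 14, 34) does, zeroes out the microseconds.
truncated = time(full.hour, full.minute, full.second)

print(full.isoformat())       # 11:14:34.701606
print(truncated.isoformat())  # 11:14:34
print(full == truncated)      # False -> the equality inside COALESCE(..., FALSE) misses rows
```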

Marking as `do not merge` because, to fix Python 3.9 support, we'll need to vendor ibis 9.x into `third_party`.

Sweet. I had forgotten https://github.com/googleapis/python-bigquery/blob/c9068e4191dbe3632fe399a0b777e8bc54a183a6/google/cloud/bigquery/dbapi/_helpers.py#L468-L470 Thanks for investigating! It does seem like we should change this in the BQ Storage client, but this will be good to keep in mind...

This has to be done via the `page_size` parameter on [`QueryJob.result`](https://cloud.google.com/python/docs/reference/bigquery/latest/google.cloud.bigquery.job.QueryJob#google_cloud_bigquery_job_QueryJob_result) or [`query_and_wait`](https://cloud.google.com/python/docs/reference/bigquery/latest/google.cloud.bigquery.client.Client#google_cloud_bigquery_client_Client_query_and_wait). As far as I can tell we're doing this correctly in Ibis:

https://github.com/ibis-project/ibis/blob/9f565a9ad98a089fcb25959a88136a6e7bc1c506/ibis/backends/bigquery/__init__.py#L802
https://github.com/ibis-project/ibis/blob/9f565a9ad98a089fcb25959a88136a6e7bc1c506/ibis/backends/bigquery/__init__.py#L680

A few things...
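For a rough sense of what the parameter controls: `page_size` caps how many rows come back per API page, so a result set splits into roughly `ceil(total_rows / page_size)` pages. A hedged sketch (`expected_pages` is a hypothetical helper for illustration, not part of the library; the commented calls show where `page_size` is actually passed):

```python
import math

# With google-cloud-bigquery, page size is set at fetch time, e.g.:
#   rows = client.query_and_wait(sql, page_size=10_000)
#   rows = query_job.result(page_size=10_000)
# The helper below only illustrates the resulting page count.

def expected_pages(total_rows: int, page_size: int) -> int:
    """Pages a result of `total_rows` rows splits into at `page_size` rows/page."""
    return math.ceil(total_rows / page_size)

print(expected_pages(25_000, 10_000))  # 3
```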

Would it make sense for Ibis to do some client-side grouping to respect this parameter? I've viewed page size / chunk size as more of a tuning parameter, so it...

Source for the fact that we only need to worry about individual message-to-Arrow/DataFrame conversion:

https://github.com/googleapis/python-bigquery/blob/1246da86b78b03ca1aa2c45ec71649e294cfb2f1/google/cloud/bigquery/_pandas_helpers.py#L598
https://github.com/googleapis/python-bigquery/blob/f8d4aaa335a0eef915e73596fc9b43b11d11be9f/google/cloud/bigquery/_pandas_helpers.py#L752
https://github.com/googleapis/python-bigquery/blob/f8d4aaa335a0eef915e73596fc9b43b11d11be9f/google/cloud/bigquery/_pandas_helpers.py#L581

Turns out there are some use cases for `ReadRowsStream.to_arrow`: https://github.com/vaexio/vaex/blob/f1335d20a6f0a52259c368fc0a7fef3cd4919f8f/packages/vaex-contrib/vaex/contrib/io/gbq.py#L112

Thanks for the report. Yes, we can improve these comments. Note: I did recently change this sample to request a maximum of 1 stream so that the sample does not...