[BigQuery] test_string[split] is failing

Open • tswast opened this issue 5 years ago • 1 comment

Failing test

https://github.com/ibis-project/ibis/blob/master/ibis/tests/all/test_string.py#L230-L234

Test output

$ pytest ibis/tests/all/test_string.py::test_string[BigQuery-split]
======================================= test session starts ========================================
platform darwin -- Python 3.7.8, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /Users/swast/src/ibis, inifile: setup.cfg
plugins: forked-1.2.0, mock-3.1.1, cov-2.10.0, xdist-1.34.0
collected 1 item                                                                                   

ibis/tests/all/test_string.py F                                                              [100%]

============================================= FAILURES =============================================
___________________________________ test_string[BigQuery-split] ____________________________________

backend = <ibis.tests.backends.BigQuery object at 0x7fb9598ee950>
alltypes = BigQueryTable[table]
  name: swast-scratch.testing.functional_alltypes
  schema:
    index : int64
    Unnamed_0 : int...4
    date_string_col : string
    string_col : string
    timestamp_col : timestamp
    year : int64
    month : int64
df =       index  Unnamed_0    id  bool_col  ...  string_col           timestamp_col  year  month
0       300        300  1...
7299   7296       7296  3956      True  ...           6 2010-01-31 05:06:13.650  2010      1

[7300 rows x 15 columns]
result_func = <function <lambda> at 0x7fb9598f2f80>
expected_func = <function <lambda> at 0x7fb9598f3050>

    @pytest.mark.parametrize(
        ('result_func', 'expected_func'),
        [
...
          param(
                lambda t: t.date_string_col.split('/'),
                lambda t: t.date_string_col.str.split('/'),
                id='split',
            ),
            param(
                lambda t: ibis.literal('-').join(['a', t.string_col, 'c']),
                lambda t: 'a-' + t.string_col + '-c',
                id='join',
            ),
        ],
    )
    @pytest.mark.xfail_unsupported
    def test_string(backend, alltypes, df, result_func, expected_func):
        expr = result_func(alltypes)
        result = expr.execute()
    
        expected = backend.default_series_rename(expected_func(df))
>       backend.assert_series_equal(result, expected)

ibis/tests/all/test_string.py:248: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
ibis/tests/backends.py:146: in assert_series_equal
    left = left.sort_values().reset_index(drop=True)
../../miniconda3/envs/ibis-dev/lib/python3.7/site-packages/pandas/core/series.py:3167: in sort_values
    argsorted = _try_kind_sort(arr[good])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

arr = array([array(['06', '01', '09'], dtype=object),
       array(['06', '02', '09'], dtype=object),
       array(['06', '0...object),
       array(['01', '30', '10'], dtype=object),
       array(['01', '31', '10'], dtype=object)], dtype=object)

    def _try_kind_sort(arr):
        # easier to ask forgiveness than permission
        try:
            # if kind==mergesort, it can fail for object dtype
>           return arr.argsort(kind=kind)
E           ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

../../miniconda3/envs/ibis-dev/lib/python3.7/site-packages/pandas/core/series.py:3153: ValueError
========================================= warnings summary =========================================
ibis/tests/all/test_string.py::test_string[BigQuery-split]
  /Users/swast/src/ibis/ibis/bigquery/client.py:545: PendingDeprecationWarning: Client.dataset is deprecated and will be removed in a future version. Use a string like 'my_project.my_dataset' or a cloud.google.bigquery.DatasetReference object, instead.
    table_ref = self.client.dataset(dataset, project=project).table(name)

ibis/tests/all/test_string.py::test_string[BigQuery-split]
  /Users/swast/src/ibis/ibis/bigquery/client.py:432: PendingDeprecationWarning: Client.dataset is deprecated and will be removed in a future version. Use a string like 'my_project.my_dataset' or a cloud.google.bigquery.DatasetReference object, instead.
    dataset_ref = self.client.dataset(dataset, project=project)

-- Docs: https://docs.pytest.org/en/latest/warnings.html
===================================== short test summary info ======================================
FAILED ibis/tests/all/test_string.py::test_string[BigQuery-split] - ValueError: The truth value o...
================================== 1 failed, 2 warnings in 4.55s ===================================

tswast, Sep 10 '20 15:09
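The ValueError at the bottom of the traceback can be reproduced with NumPy alone. Each cell of the BigQuery result is a NumPy array. Comparing two arrays with < is element-wise, so the sort inside sort_values gets back an array of booleans when it needs a single bool. The sample values below come from the arr dump in the traceback, and the variable names are only illustrative.

```python
import numpy as np

# Two cells shaped like the BigQuery `split` result: 1-D object arrays of strings.
a = np.array(["06", "01", "09"], dtype=object)
b = np.array(["01", "31", "10"], dtype=object)

# "<" on two arrays is element-wise, so it returns an array of booleans.
elementwise = a < b

# A sort needs a single truth value, and an array of 3 booleans cannot provide one.
message = ""
try:
    bool(elementwise)
    raised = False
except ValueError as exc:
    raised = True
    message = str(exc)

# Python lists (what pandas' str.split returns) compare lexicographically instead.
lists_compare = ["06", "01", "09"] < ["01", "31", "10"]

print(elementwise.tolist(), raised, lists_compare)
```

This is why the expected side, built from lists, would sort without complaint. The result side, built from NumPy arrays, fails inside argsort.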

Found in ibis-project/ibis#2353

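As with ibis-project/ibis#2370, I suspect this failure comes from the google-cloud-bigquery library using Arrow as an intermediate format. The result column comes back holding NumPy arrays (see arr in the traceback) instead of the Python lists that pandas' str.split produces, and the test helper's sort_values cannot order NumPy arrays.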

tswast, Sep 10 '20 15:09
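Below is a sketch, under assumptions, of one way a test harness could tolerate this: convert array cells to plain lists before sorting and comparing. make_array_series and normalize_array_cells are hypothetical helpers and are not part of ibis. make_array_series only imitates the shape of the BigQuery result seen in the traceback, an object Series whose cells are NumPy arrays. Whether that shape really comes from the Arrow conversion is the hypothesis in the comment, and this code does not verify it.

```python
import numpy as np
import pandas as pd


def make_array_series(rows):
    """Hypothetical helper: build an object Series whose cells are NumPy arrays,
    imitating the shape of the BigQuery `split` result in the traceback."""
    cells = np.empty(len(rows), dtype=object)
    for i, row in enumerate(rows):
        cells[i] = np.array(row, dtype=object)  # store each array as a single cell
    return pd.Series(cells)


def normalize_array_cells(series):
    """Hypothetical helper: turn NumPy-array cells into Python lists so they
    compare lexicographically, like the lists produced by pandas' str.split."""
    return series.map(lambda v: v.tolist() if isinstance(v, np.ndarray) else v)


rows = [["06", "01", "09"], ["06", "02", "09"], ["01", "31", "10"]]
result = make_array_series(rows)  # stand-in for expr.execute()
expected = pd.Series(["06/01/09", "06/02/09", "01/31/10"]).str.split("/")

# Sorting the raw result fails the same way the test helper does.
try:
    result.sort_values()
    raw_sort_failed = False
except (ValueError, TypeError):
    raw_sort_failed = True

# After normalization both sides hold lists, so they can be sorted and compared.
left = normalize_array_cells(result).sort_values().reset_index(drop=True)
right = expected.sort_values().reset_index(drop=True)
assert left.tolist() == right.tolist()
```

The normalization is applied only to the result side because str.split already yields lists. Converting to lists rather than to tuples or strings keeps the comparison identical to what the pandas-based expected values contain.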