Add Python 3.11 to GHA

Open gliptak opened this issue 1 year ago • 17 comments

gliptak avatar Feb 02 '24 18:02 gliptak

@gliptak rebased onto dev so that all the branches go into the same branch prior to deployment to main

taylorfturner avatar Feb 02 '24 19:02 taylorfturner

@taylorfturner consider setting dev as default UI branch

python-snappy currently has no Python 3.11 support https://github.com/andrix/python-snappy/pull/129

possible replacement is https://github.com/milesgranger/cramjam/tree/master/cramjam-python

gliptak avatar Feb 02 '24 19:02 gliptak

#1091

gliptak avatar Feb 02 '24 20:02 gliptak

will rebase after #1091 merged

gliptak avatar Feb 02 '24 21:02 gliptak

https://github.com/capitalone/synthetic-data/pull/346

gliptak avatar Feb 26 '24 15:02 gliptak

@gliptak rebase onto dev and I'll approve

taylorfturner avatar Mar 07 '24 12:03 taylorfturner

@taylorfturner this might already be dev based

https://github.com/capitalone/DataProfiler/pull/1091 would have to be merged first

gliptak avatar Mar 07 '24 18:03 gliptak

@gliptak #1091 merged ... rebase this and we'll take a look. Thanks for the contribution! 🎉

taylorfturner avatar Mar 14 '24 13:03 taylorfturner

dask packaging changed? https://pypi.org/project/dask/#history

https://github.com/capitalone/DataProfiler/actions/runs/8282560184/job/22663633408?pr=1090

____ ERROR collecting dataprofiler/tests/validators/test_base_validators.py ____
/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/dask/dataframe/__init__.py:22: in _dask_expr_enabled
    import dask_expr  # noqa: F401
E   ModuleNotFoundError: No module named 'dask_expr'

During handling of the above exception, another exception occurred:
dataprofiler/tests/validators/test_base_validators.py:4: in <module>
    from dask import dataframe as dd
/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/dask/dataframe/__init__.py:87: in <module>
    if _dask_expr_enabled():
/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/dask/dataframe/__init__.py:24: in _dask_expr_enabled
    raise ValueError("Must install dask-expr to activate query planning.")
E   ValueError: Must install dask-expr to activate query planning.
=============================== warnings summary ===============================
dataprofiler/tests/profilers/test_histogram_utils.py:35
  /home/runner/work/DataProfiler/DataProfiler/dataprofiler/tests/profilers/test_histogram_utils.py:35: PytestCollectionWarning: cannot collect test class 'TestColumn' because it has a __init__ constructor (from: dataprofiler/tests/profilers/test_histogram_utils.py)
    class TestColumn(NumericStatsMixin):

dataprofiler/tests/profilers/test_numeric_stats_mixin_profile.py:21
  /home/runner/work/DataProfiler/DataProfiler/dataprofiler/tests/profilers/test_numeric_stats_mixin_profile.py:21: PytestCollectionWarning: cannot collect test class 'TestColumn' because it has a __init__ constructor (from: dataprofiler/tests/profilers/test_numeric_stats_mixin_profile.py)
    class TestColumn(NumericStatsMixin):

dataprofiler/tests/profilers/test_numeric_stats_mixin_profile.py:34
  /home/runner/work/DataProfiler/DataProfiler/dataprofiler/tests/profilers/test_numeric_stats_mixin_profile.py:34: PytestCollectionWarning: cannot collect test class 'TestColumnWProps' because it has a __init__ constructor (from: dataprofiler/tests/profilers/test_numeric_stats_mixin_profile.py)
    class TestColumnWProps(TestColumn):

gliptak avatar Mar 14 '24 14:03 gliptak

@gliptak yeah, I just started seeing this yesterday due to the package change by dask on the 12th. Haven't had the bandwidth to research why. I'd imagine a simple version pin that excludes this release would be a temporary fix to unblock us

taylorfturner avatar Mar 14 '24 14:03 taylorfturner

seeing on #1115 too from @carlsonp

taylorfturner avatar Mar 14 '24 15:03 taylorfturner

https://github.com/capitalone/DataProfiler/actions/runs/8282934358/job/22664948501

2024-03-14T15:03:11.8884631Z WARNING: dask 2024.3.0 does not provide the extra 'dask-expr'

https://github.com/dask/dask-expr/issues/968 https://github.com/dask/dask/issues/10917 https://docs.dask.org/en/stable/changelog.html#v2024-3-0
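
For anyone reproducing this locally, a minimal stdlib-only sketch for checking whether the module named in the traceback is installed before importing `dask.dataframe`. It assumes the only problem is the missing `dask_expr` module reported above; `missing_modules` is a hypothetical helper name, not part of dask or DataProfiler.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # dask_expr is the module named in the ModuleNotFoundError above.
    print(missing_modules(["dask", "dask_expr"]))
```

An empty list would mean both modules are importable in the current environment; any name printed is one the install step did not provide.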

gliptak avatar Mar 14 '24 15:03 gliptak

@taylorfturner corrected dask modules install

now there is a Keras(?) error https://github.com/capitalone/DataProfiler/actions/runs/8283138450/job/22665615645?pr=1090

gliptak avatar Mar 14 '24 15:03 gliptak

I'll have to take a look at this later -- it's failing on the 3.11 check, which makes me think something in the dependencies doesn't like that version of Python. I've seen a couple of things from back in January about TF and 3.11 incompatibility ... though it does look like Keras supports 3.11

taylorfturner avatar Mar 14 '24 16:03 taylorfturner

@taylorfturner please guide on build errors (present for all Python versions)

https://github.com/capitalone/DataProfiler/actions/runs/8331331583/job/22797982039?pr=1090

dataprofiler/tests/profilers/test_profile_builder.py ..............F.... [ 82%]
...........................F.F.................F........................ [ 88%]
........F.............................FF                                 [ 91%]

gliptak avatar Mar 18 '24 18:03 gliptak

will do @gliptak -- have a conference this week so I will do my best to get to it, but it might be more like early next week before I can attend to this. Thanks!

taylorfturner avatar Mar 18 '24 18:03 taylorfturner

Actually, I'm seeing similar errors on #1119, so this doesn't appear to be a 3.11 issue specifically, @gliptak. @abajpai15 is taking a look at this issue today

taylorfturner avatar Mar 22 '24 17:03 taylorfturner

Some of these errors might get fixed with the upgrade to keras 3.0 in #1138

JGSweets avatar May 13 '24 15:05 JGSweets

will rebase after #1138 merged

gliptak avatar May 14 '24 15:05 gliptak

@gliptak #1138 is merged into dev.

Want to rebase and see if this works now? Thanks!

taylorfturner avatar Jun 07 '24 14:06 taylorfturner

definitely will need a rebase @gliptak

taylorfturner avatar Jun 07 '24 14:06 taylorfturner

https://github.com/dask/dask/issues/11038

The advised solution is to upgrade to Dask version 2024.4.1.

@taylorfturner should I proceed with the Dask bump as described above?

https://github.com/capitalone/DataProfiler/actions/runs/9421219740/job/25954837467?pr=1090

=========================== short test summary info ============================
ERROR dataprofiler/tests/validators/test_base_validators.py - TypeError: descriptor '__call__' for 'type' objects doesn't apply to a 'property' object
!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!

gliptak avatar Jun 07 '24 20:06 gliptak

https://github.com/capitalone/DataProfiler/actions/runs/9438503984/job/25995558872?pr=1090

___________________ TestEvaluateAccuracy.test_save_conf_mat ____________________
self = <dataprofiler.tests.labelers.test_labeler_utils.TestEvaluateAccuracy testMethod=test_save_conf_mat>
mock_dataframe = <MagicMock name='DataFrame' id='140514062390544'>
mock_report = <MagicMock name='classification_report' id='140514060067088'>
    @mock.patch("dataprofiler.labelers.labeler_utils.classification_report")
    @mock.patch("pandas.DataFrame")
    def test_save_conf_mat(self, mock_dataframe, mock_report):
    
        # ideally mock out the actual contents written to file, but
        # would be difficult to get this completely worked out.
        expected_conf_mat = np.array(
            [
                [1, 0, 1],
                [1, 0, 0],
                [0, 1, 2],
            ]
        )
        expected_row_col_names = dict(
            columns=["pred:PAD", "pred:UNKNOWN", "pred:OTHER"],
            index=["true:PAD", "true:UNKNOWN", "true:OTHER"],
        )
>       mock_instance_df = mock.Mock(spec=pd.DataFrame)()
dataprofiler/tests/labelers/test_labeler_utils.py:255: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/unittest/mock.py:1106: in __init__
    _safe_super(CallableMixin, self).__init__(
/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/unittest/mock.py:457: in __init__
    self._mock_add_spec(spec, spec_set, _spec_as_instance, _eat_self)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = <[AttributeError('_mock_methods') raised in repr()] Mock object at 0x7fcc6aa22410>
spec = <MagicMock name='DataFrame' id='140514062390544'>, spec_set = None
_spec_as_instance = False, _eat_self = False
    def _mock_add_spec(self, spec, spec_set, _spec_as_instance=False,
                       _eat_self=False):
        if _is_instance_mock(spec):
>           raise InvalidSpecError(f'Cannot spec a Mock object. [object={spec!r}]')
E           unittest.mock.InvalidSpecError: Cannot spec a Mock object. [object=<MagicMock name='DataFrame' id='140514062390544'>]
/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/unittest/mock.py:508: InvalidSpecError

gliptak avatar Jun 09 '24 19:06 gliptak

the outstanding test failure above corresponds to https://github.com/python/cpython/issues/87644

https://github.com/capitalone/DataProfiler/blob/a4486940ce556a42bb804d188d2015e047dfc3c1/dataprofiler/tests/labelers/test_labeler_utils.py#L255

gliptak avatar Jun 10 '24 16:06 gliptak

I see -- you are welcome to propose a fix for this as part of this PR (instead of a separate PR). If you get something operational, we can include this in the 0.12.0 release; otherwise, I will need to deploy without. Thanks, @gliptak!

taylorfturner avatar Jun 10 '24 20:06 taylorfturner

@taylorfturner I rewrote the test and it ran green locally. please review

also let me know if separate bump PRs would work better (and guide on how you would like to split)

gliptak avatar Jun 10 '24 21:06 gliptak

Thanks, @gliptak !

  • keep this PR, but limit it to adding 3.11 and the test update
  • remove the cramjam / snappy(?) changes

taylorfturner avatar Jun 11 '24 12:06 taylorfturner

python-snappy>=0.7.1 bump is required for Python 3.11

gliptak avatar Jun 11 '24 13:06 gliptak