DataProfiler
Add Python 3.11 to GHA
@gliptak rebased onto dev so that all the branches go into the same branch prior to deployment to main
@taylorfturner consider setting dev as the default UI branch
python-snappy has no Python 3.11 support currently https://github.com/andrix/python-snappy/pull/129
possible replacement is https://github.com/milesgranger/cramjam/tree/master/cramjam-python
#1091
will rebase after #1091 merged
https://github.com/capitalone/synthetic-data/pull/346
@gliptak rebase onto dev and I'll approve
@taylorfturner this might already be dev based
https://github.com/capitalone/DataProfiler/pull/1091 would have to be merged first
@gliptak #1091 merged ... rebase this and we'll take a look. Thanks for the contribution! 🎉
dask packaging changed? https://pypi.org/project/dask/#history
https://github.com/capitalone/DataProfiler/actions/runs/8282560184/job/22663633408?pr=1090
____ ERROR collecting dataprofiler/tests/validators/test_base_validators.py ____
/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/dask/dataframe/__init__.py:22: in _dask_expr_enabled
import dask_expr # noqa: F401
E ModuleNotFoundError: No module named 'dask_expr'
During handling of the above exception, another exception occurred:
dataprofiler/tests/validators/test_base_validators.py:4: in <module>
from dask import dataframe as dd
/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/dask/dataframe/__init__.py:87: in <module>
if _dask_expr_enabled():
/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/dask/dataframe/__init__.py:24: in _dask_expr_enabled
raise ValueError("Must install dask-expr to activate query planning.")
E ValueError: Must install dask-expr to activate query planning.
=============================== warnings summary ===============================
dataprofiler/tests/profilers/test_histogram_utils.py:35
/home/runner/work/DataProfiler/DataProfiler/dataprofiler/tests/profilers/test_histogram_utils.py:35: PytestCollectionWarning: cannot collect test class 'TestColumn' because it has a __init__ constructor (from: dataprofiler/tests/profilers/test_histogram_utils.py)
class TestColumn(NumericStatsMixin):
dataprofiler/tests/profilers/test_numeric_stats_mixin_profile.py:21
/home/runner/work/DataProfiler/DataProfiler/dataprofiler/tests/profilers/test_numeric_stats_mixin_profile.py:21: PytestCollectionWarning: cannot collect test class 'TestColumn' because it has a __init__ constructor (from: dataprofiler/tests/profilers/test_numeric_stats_mixin_profile.py)
class TestColumn(NumericStatsMixin):
dataprofiler/tests/profilers/test_numeric_stats_mixin_profile.py:34
/home/runner/work/DataProfiler/DataProfiler/dataprofiler/tests/profilers/test_numeric_stats_mixin_profile.py:34: PytestCollectionWarning: cannot collect test class 'TestColumnWProps' because it has a __init__ constructor (from: dataprofiler/tests/profilers/test_numeric_stats_mixin_profile.py)
class TestColumnWProps(TestColumn):
@gliptak yeah, I just started seeing this yesterday due to the package change by dask on the 12th. Haven't had the bandwidth to research why. I'd imagine a simple version constraint that disallows this release would be a temporary fix to unblock.
seeing on #1115 too from @carlsonp
https://github.com/capitalone/DataProfiler/actions/runs/8282934358/job/22664948501
2024-03-14T15:03:11.8884631Z WARNING: dask 2024.3.0 does not provide the extra 'dask-expr'
https://github.com/dask/dask-expr/issues/968 https://github.com/dask/dask/issues/10917 https://docs.dask.org/en/stable/changelog.html#v2024-3-0
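The traceback above shows `from dask import dataframe` failing at collection time because `dask_expr` is not installed. A minimal sketch of checking for that up front, using only the standard library, is below. The helper names `module_available` and `dask_dataframe_importable` are hypothetical and not part of DataProfiler or dask, and the check assumes the behaviour seen in this log for dask 2024.3.0, where importing `dask.dataframe` requires `dask_expr`.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def dask_dataframe_importable() -> bool:
    """Hypothetical pre-check for the failure in the log above.

    Assumption: with the affected dask release, `dask.dataframe` needs
    the separate `dask_expr` module to be installed.
    """
    return module_available("dask") and module_available("dask_expr")


if __name__ == "__main__":
    if not dask_dataframe_importable():
        print("dask or dask_expr is missing; dask.dataframe may fail to import")
```

A check like this only reports the missing module; the actual remedy discussed in the thread is correcting which dask modules get installed.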
@taylorfturner corrected dask modules install
now there is a Keras(?) error https://github.com/capitalone/DataProfiler/actions/runs/8283138450/job/22665615645?pr=1090
I'll have to take a look at this later. It's failing on the 3.11 check, which makes me think there is something specific to that version of Python and the dependencies / library that it doesn't like. I've seen a couple of things from back in January about TF and 3.11 incompatibility, though it does look like 3.11 is supported by Keras.
@taylorfturner please guide on build errors (present for all Python versions)
https://github.com/capitalone/DataProfiler/actions/runs/8331331583/job/22797982039?pr=1090
dataprofiler/tests/profilers/test_profile_builder.py ..............F.... [ 82%]
...........................F.F.................F........................ [ 88%]
........F.............................FF [ 91%]
will do @gliptak -- have a conference this week so I will do my best to get to it, but it might be more like early next week before I can attend to this. Thanks!
Actually seeing similar errors on #1119, so this doesn't appear to be a 3.11 issue specifically, @gliptak. @abajpai15 is taking a look at this issue today.
Some of these errors might get fixed with the upgrade to keras 3.0 in #1138
will rebase after #1138 merged
@gliptak #1138 is merged into dev. Want to rebase and see if this works now? Thanks!
definitely will need a rebase @gliptak
https://github.com/dask/dask/issues/11038
The advised solution is to upgrade to Dask version 2024.4.1.
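To illustrate the advice above, here is a small sketch that checks whether a given Dask version string is older than 2024.4.1. It assumes plain numeric calendar versions such as `2024.3.0` (no release-candidate or local suffixes), and the helper names `parse_calver`, `needs_upgrade`, and `installed_version` are hypothetical, not part of dask or DataProfiler.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_calver(version: str) -> Tuple[int, ...]:
    """Parse a purely numeric dotted version like '2024.4.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def needs_upgrade(installed: str, minimum: str = "2024.4.1") -> bool:
    """Return True if `installed` is older than `minimum`."""
    return parse_calver(installed) < parse_calver(minimum)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    current = installed_version("dask")
    if current is not None and needs_upgrade(current):
        print(f"dask {current} is older than 2024.4.1")
```

For real projects a full version parser is more robust than this tuple comparison; the sketch only covers the simple numeric case.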
@taylorfturner am I to proceed with Dask bump as per above?
https://github.com/capitalone/DataProfiler/actions/runs/9421219740/job/25954837467?pr=1090
=========================== short test summary info ============================
ERROR dataprofiler/tests/validators/test_base_validators.py - TypeError: descriptor '__call__' for 'type' objects doesn't apply to a 'property' object
!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!
https://github.com/capitalone/DataProfiler/actions/runs/9438503984/job/25995558872?pr=1090
___________________ TestEvaluateAccuracy.test_save_conf_mat ____________________
self = <dataprofiler.tests.labelers.test_labeler_utils.TestEvaluateAccuracy testMethod=test_save_conf_mat>
mock_dataframe = <MagicMock name='DataFrame' id='140514062390544'>
mock_report = <MagicMock name='classification_report' id='140514060067088'>
    @mock.patch("dataprofiler.labelers.labeler_utils.classification_report")
    @mock.patch("pandas.DataFrame")
    def test_save_conf_mat(self, mock_dataframe, mock_report):
        # ideally mock out the actual contents written to file, but
        # would be difficult to get this completely worked out.
        expected_conf_mat = np.array(
            [
                [1, 0, 1],
                [1, 0, 0],
                [0, 1, 2],
            ]
        )
        expected_row_col_names = dict(
            columns=["pred:PAD", "pred:UNKNOWN", "pred:OTHER"],
            index=["true:PAD", "true:UNKNOWN", "true:OTHER"],
        )
>       mock_instance_df = mock.Mock(spec=pd.DataFrame)()
dataprofiler/tests/labelers/test_labeler_utils.py:255:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/unittest/mock.py:1106: in __init__
_safe_super(CallableMixin, self).__init__(
/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/unittest/mock.py:457: in __init__
self._mock_add_spec(spec, spec_set, _spec_as_instance, _eat_self)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <[AttributeError('_mock_methods') raised in repr()] Mock object at 0x7fcc6aa22410>
spec = <MagicMock name='DataFrame' id='140514062390544'>, spec_set = None
_spec_as_instance = False, _eat_self = False
    def _mock_add_spec(self, spec, spec_set, _spec_as_instance=False,
                       _eat_self=False):
        if _is_instance_mock(spec):
>           raise InvalidSpecError(f'Cannot spec a Mock object. [object={spec!r}]')
E unittest.mock.InvalidSpecError: Cannot spec a Mock object. [object=<MagicMock name='DataFrame' id='140514062390544'>]
/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/unittest/mock.py:508: InvalidSpecError
this is the above outstanding test fail https://github.com/python/cpython/issues/87644
https://github.com/capitalone/DataProfiler/blob/a4486940ce556a42bb804d188d2015e047dfc3c1/dataprofiler/tests/labelers/test_labeler_utils.py#L255
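The failure above can be reproduced without pandas. In the sketch below, `Frame` and `fake_pd` are hypothetical stand-ins for `pandas.DataFrame` and the `pandas` module. It shows the failing pattern (using an already patched class as a `spec`) and one possible workaround (keeping a reference to the real class before patching). This is an illustration only, not necessarily how the test in this PR was rewritten.

```python
import types
from unittest import mock


class Frame:
    """Hypothetical stand-in for pandas.DataFrame."""

    def to_csv(self, path):
        raise NotImplementedError


# Hypothetical stand-in for the pandas module namespace.
fake_pd = types.SimpleNamespace(DataFrame=Frame)

# Keep a reference to the real class before anything is patched.
RealFrame = fake_pd.DataFrame


def spec_patched_class():
    """Failing pattern: the spec is the MagicMock that replaced the class."""
    with mock.patch.object(fake_pd, "DataFrame"):
        try:
            mock.Mock(spec=fake_pd.DataFrame)
        except Exception as exc:  # InvalidSpecError on Python 3.11+
            return type(exc).__name__
    return None


def spec_real_class():
    """Possible workaround: spec against the real class captured earlier."""
    with mock.patch.object(fake_pd, "DataFrame"):
        return mock.Mock(spec=RealFrame)


if __name__ == "__main__":
    print(spec_patched_class())
    print(hasattr(spec_real_class(), "to_csv"))
```

On Python versions before 3.11 the first function returns `None`, because specifying a mock as a `spec` was not rejected there, which matches the error only appearing in the 3.11 job.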
I see -- you are welcome to propose a fix for this as part of this PR (instead of a separate PR). If you get something operational, we can include this in the 0.12.0 release; otherwise, I will need to deploy without. Thanks, @gliptak!
@taylorfturner I rewrote the test and it ran green locally. Please review.
Also let me know if separate bump PRs would work better (and how you would like to split them).
Thanks, @gliptak !
- keep this PR but only do 3.11 add and the test update
- remove the cramjam / snappy(?)
python-snappy>=0.7.1 bump is required for Python 3.11