narwhals icon indicating copy to clipboard operation
narwhals copied to clipboard

test: allow for tests to be run without Polars / PyArrow/ pandas installed

Open MarcoGorelli opened this issue 1 year ago • 9 comments

It would be nice to be able to do say pytest --constructors=pandas and not need Polars installed

To do this, we should replace import polars with polars = pytest.importorskip('polars') only in the places where it's truly necessary to import Polars

MarcoGorelli avatar Jan 05 '25 14:01 MarcoGorelli

Yes, please. I wanted to package narwhals for Gentoo, but can't do that because the test suite needs polars, and polars simply cannot be packaged because they require a nightly Rust compiler.

Unfortunately, this isn't going to be trivial. From a quick look I've done in the morning, polars are used directly in a lot of fixtures — I guess using something akin to pytest-lazy-fixtures may be helpful.

mgorny avatar Jan 29 '25 11:01 mgorny

Hi @mgorny !

How would you be looking to run the test suite in Gentoo? If you were able to run the full test suite for say pandas and duckdb, would that be enough? If so, I think it would be quite easy for us to allow

pytest --constructors=pandas,duckdb

without needing polars installed

MarcoGorelli avatar Jan 29 '25 11:01 MarcoGorelli

Hey @mgorny, thanks for the feedback! These are very useful to understand what to prioritize! I think #1843 was close to achieving it. I will spend more time into it or feel free to pick it up.

@MarcoGorelli I think the issue is that there are some top level imports of polars in the test suite, which would still require installation

FBruzzesi avatar Jan 29 '25 11:01 FBruzzesi

I think just using polars = pytest.importorskip('polars') in the (relatively few) tests which specifically use Polars should already be pretty close, and would be a relatively small lift?

MarcoGorelli avatar Jan 29 '25 11:01 MarcoGorelli

Ideally, we'd want to skip duckdb too, since we don't have it packaged either. pandas and pyarrow are fine for us. And yes, passing --constructors= would be good.

mgorny avatar Jan 29 '25 11:01 mgorny

Cool, I made a start on this in the importorskip-polars branch: https://github.com/narwhals-dev/narwhals/tree/importorskip-polars

Commit diff: https://github.com/narwhals-dev/narwhals/commit/32b31755b98c83538b65b175b64442544a531354

Seems to work fine:

$ python -c 'import polars'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'polars'
$ pytest tests/group_by_test.py --constructors=pandas
================================================= test session starts =================================================
platform linux -- Python 3.12.6, pytest-8.3.4, pluggy-1.5.0
Using --randomly-seed=176162433
rootdir: /home/marcogorelli/polars-api-compat-dev
configfile: pyproject.toml
plugins: hypothesis-6.123.11, cov-6.0.0, env-1.1.5, randomly-3.16.0
collected 33 items                                                                                                    

tests/group_by_test.py ....................s............                                                        [100%]

============================================ 32 passed

There's almost 30 test files left where the polars import needs moving inside the test

If we could then do the same for DuckDB, and verify that pytest --constructors=pandas,pyarrow runs without needing Polars nor DuckDB installed, that should be all you need. It wouldn't hit 100% coverage of course, but I think that's OK, most tests would run, and I think enough to give confidence that packaging happened correctly

MarcoGorelli avatar Jan 29 '25 12:01 MarcoGorelli

I can spend some time on this today. Do you have some preference on how to handle parametrization?

https://github.com/narwhals-dev/narwhals/blob/f644931076307b150c1f1333c48835e4ffe9342c/tests/expr_and_series/cast_test.py#L262

I can't immediately think of a clean way of doing this. But I'll handle the more trivial cases at least.

mgorny avatar Jan 30 '25 11:01 mgorny

I think it's fine to remove the parametrisation and loop over dtype in the test there, and skip it if polars isn't installed

MarcoGorelli avatar Jan 30 '25 15:01 MarcoGorelli

I think it's fine to remove the parametrisation and loop over dtype in the test there, and skip it if polars isn't installed

Oh, that's one approach I didn't think of! I suppose I'm "parametrize" everything kind of person :-).

mgorny avatar Jan 30 '25 15:01 mgorny

I have another implementation (almost) ready which would allow to run polars, pyarrow and pandas independently of the other two dependencies. The branch is: tests/run-with-no-deps. I have mixed feelings about it, so I am happy to wait for thoughts and feedbacks

FBruzzesi avatar Aug 03 '25 15:08 FBruzzesi

I have mixed feelings about it, so I am happy to wait for thoughts and feedbacks

Something I'm noticing about a lot of the tests that have changed, is that they aren't using that much of the imported package's api

Example

For pandas, tests/dtypes_test.py references pd 10 times, using:

  • pd.Series
  • pd.to_datetime
  • pd.DataFrame
  • pd.CategoricalDtype

I wonder if we should just collect the most common one's into fixtures that look like:

from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    import pandas as pd


@pytest.fixture(scope="session")
def pd_dataframe() -> type[pd.DataFrame]:
    pytest.importorskip("pandas")
    import pandas as pd

    return pd.DataFrame

Then if you reference the fixture, the skipping doesn't need to be repeated each time in all the dependent tests

dangotbanned avatar Aug 03 '25 18:08 dangotbanned