modin
FEAT-#4605: Adding small query compiler
What do these changes do?
- [x] first commit message and PR title follow format outlined here
NOTE: If you edit the PR title to match this format, you need to add another commit (even if it's empty) or amend your last commit for the CI job that checks the PR title to pick up the new PR title.
- [ ] passes `flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py`
- [ ] passes `black --check modin/ asv_bench/benchmarks scripts/doc_checker.py`
- [ ] signed commit with `git commit -s`
- [ ] Resolves #4605
- [ ] tests added and passing
- [ ] module layout described at `docs/development/architecture.rst` is up-to-date
Great start on solving this problem! Is it possible to avoid so many of the test changes?
Most of the test changes disable a few checks that can't be supported without partitions, or that depend on IO such as pd.read_csv(), which the current changes don't yet support. Is there something specific that should be avoided?
> Is there something specific that should be avoided?
Nothing specific, I was just trying to understand context. Thanks!
@arunjose696 please rebase on main
With the introduction of the small query compiler, we need to test the interoperability between DataFrames using different query compilers. For example, performing a binary operation between a DataFrame with the small query compiler and another with the Pandas query compiler. (Note: This feature is not yet included in this PR.)
This will require modifying existing tests or adding new ones. The current tests in the modin/modin/tests/pandas/dataframe folder cover the following scenarios where two DataFrames interact:
1) Derived DataFrames: In tests where the second DataFrame is created or derived from the first, e.g. test_join_empty, we need to refactor the tests so that the second DataFrame is created separately from the first and with MODIN_NATIVE_DATAFRAME_MODE set.
2) Lambda Functions: In tests where the other DataFrame is created within a lambda function, e.g. test___divmod__, we need to refactor the tests to either create the second DataFrame in the test definition itself or provide an additional wrapper for the lambda functions, so that the DataFrame is created with a different query compiler.
3) Separate DataFrames: In tests where two separate DataFrames are used, e.g. test_where, we need to refactor the tests to flip MODIN_NATIVE_DATAFRAME_MODE between None and Native_pandas when creating the first and second DataFrames. This ensures that both the left and right operands are tested with different query compilers for interoperability. The same flipping would also be required in cases 1 and 2 once the DataFrames are separated.
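To make the "flipping" idea concrete, here is a minimal sketch of a helper that creates the left and right operands under independently chosen modes. It assumes the mode is switched through the MODIN_NATIVE_DATAFRAME_MODE environment variable and is picked up at DataFrame creation time; that may not hold, and Modin's own config object may be the right switch instead. The names `dataframe_mode`, `create_operands`, and `MODE_PAIRS` are hypothetical, and the mode spellings are taken from the comment above.

```python
import itertools
import os
from contextlib import contextmanager

# Variable name and mode values as written in the discussion above; whether
# Modin accepts exactly these spellings is an assumption.
ENV_VAR = "MODIN_NATIVE_DATAFRAME_MODE"
MODES = [None, "Native_pandas"]

# Every (left_mode, right_mode) combination, including the mixed pairs that
# exercise interoperability between query compilers.
MODE_PAIRS = list(itertools.product(MODES, repeat=2))


@contextmanager
def dataframe_mode(mode):
    """Temporarily set (or unset, for None) the mode variable, then restore it."""
    previous = os.environ.get(ENV_VAR)
    try:
        if mode is None:
            os.environ.pop(ENV_VAR, None)
        else:
            os.environ[ENV_VAR] = mode
        yield
    finally:
        if previous is None:
            os.environ.pop(ENV_VAR, None)
        else:
            os.environ[ENV_VAR] = previous


def create_operands(make_left, make_right, left_mode, right_mode):
    """Build each operand separately, each under its own mode (hypothetical helper).

    make_left and make_right are zero-argument callables, so neither operand
    is derived from the other (scenario 1) or hidden in a lambda (scenario 2).
    """
    with dataframe_mode(left_mode):
        left = make_left()
    with dataframe_mode(right_mode):
        right = make_right()
    return left, right
```

A test could then be parametrized over `MODE_PAIRS` (for example with `pytest.mark.parametrize`) and call `create_operands` with two factory callables, which would cover scenarios 1 to 3 with one pattern instead of per-test refactoring.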
Upon reviewing the modin/modin/tests/pandas/dataframe folder, I found approximately 100 tests involving scenarios where two DataFrames interact. These tests may need refactoring, or copying to a different directory and updating to specifically test interoperability.
@YarShev @anmyachev @devin-petersohn, could you please provide suggestions on how to approach testing the interoperability?