pyjanitor icon indicating copy to clipboard operation
pyjanitor copied to clipboard

non-equi join rewrite numba

Open samukweku opened this issue 1 year ago • 6 comments

PR Description

Please describe the changes proposed in the pull request:

  • reimplementation of the algorithm that powers the non-equi join in numba
  • more info in the docs regarding range joins, and where numba can be helpful
  • improved performance for numba for range joins and multiple non equi joins when getting first/last match

This PR improves conditional_join.

PR Checklist

Please ensure that you have done the following:

  1. [x] PR in from a fork off your branch. Do not PR from <your_username>:dev, but rather from <your_username>:<feature-branch_name>.
  1. [x] If you're not on the contributors list, add yourself to AUTHORS.md.
  1. [x] Add a line to CHANGELOG.md under the latest version header (i.e. the one that is "on deck") describing the contribution.
    • Do use some discretion here; if there are multiple PRs that are related, keep them in a single line.

Automatic checks

There will be automatic checks run on the PR. These include:

  • Building a preview of the docs on Netlify
  • Automatically linting the code
  • Making sure the code is documented
  • Making sure that all tests are passed
  • Making sure that code coverage doesn't go down.

Relevant Reviewers

Please tag maintainers to review.

  • @ericmjl

samukweku avatar Feb 27 '24 01:02 samukweku

🚀 Deployed on https://deploy-preview-1341--pyjanitor.netlify.app

ericmjl avatar Feb 27 '24 01:02 ericmjl

Codecov Report

Attention: Patch coverage is 96.44269% with 9 lines in your changes are missing coverage. Please review.

Project coverage is 95.87%. Comparing base (62c57c6) to head (d7aafb9). Report is 10 commits behind head on dev.

:exclamation: Current head d7aafb9 differs from pull request most recent head 4fca4c0

Please upload reports for the commit 4fca4c0 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #1341      +/-   ##
==========================================
+ Coverage   94.48%   95.87%   +1.39%     
==========================================
  Files          80       80              
  Lines        4367     4197     -170     
==========================================
- Hits         4126     4024     -102     
+ Misses        241      173      -68     

codecov[bot] avatar Feb 27 '24 03:02 codecov[bot]

@samukweku is this one ready for review? On GitHub it looks like it's still being marked as a draft.

ericmjl avatar Mar 24 '24 12:03 ericmjl

@ericmjl not yet, still got some optimisations that I am hopeful to pull off

samukweku avatar Mar 25 '24 01:03 samukweku

@ericmjl not familiar with the jitter function. can i get your help with the failing test function. thanks

samukweku avatar Apr 18 '24 00:04 samukweku

@ericmjl not familiar with the jitter function. can i get your help with the failing test function. thanks

Not a problem, @samukweku. I encountered a similar issue when testing locally so I decided to make the test a bit more lenient.

ericmjl avatar Apr 19 '24 02:04 ericmjl

@samukweku, apologies for the long turnaround time here. I think we should merge this in, as (a) the PR has been open for a long time, and (b) I've already committed to the branch. Once the CI checks are done, I'll do a merge!

ericmjl avatar May 19 '24 03:05 ericmjl

@samukweku I have merged! Thank you for taking care of pyjanitor, I really appreciate it!

ericmjl avatar Jun 02 '24 20:06 ericmjl