skrub
Prepping tables for machine learning
Adds CodeQL, a tool for finding vulnerabilities and mistakes in the code. GitHub recommended it, so let's see whether it is useful for us.
Fixes #246. Also refactors some tests for the `SuperVectorizer` to make the code cleaner.
Resolves #226.
Passing an already "clean" dataset to the `SuperVectorizer` (which is common in generic pipelines) raises a `RuntimeError` with the message `No transformers could be generated!`. The doc mentions ``` Raises...
This PR aims at improving the overall quality of the code and doc. It has several purposes:
- Correct typos
- Reword unclear sentences
- Minor updates to the doc...
I noticed that a test of the `GapEncoder` does not use the `init1` parameter:
```py
@pytest.mark.parametrize("init1, analyzer1, analyzer2", [
    ('k-means++', 'char', 'word'),
    ('random', 'char', 'word'),
    ('k-means', 'char', 'word'),
])
def test_analyzer(init1,...
```
This PR adds the new `FuzzyJoin` class that allows joining tables with dirty columns. Co-authored-by: @LeoGrin
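To illustrate what joining on dirty columns means, here is a minimal sketch using only the standard library's `difflib`. It is not the `FuzzyJoin` implementation from this PR; the function name `fuzzy_join`, its parameters, and the sample rows are hypothetical and chosen only for illustration.

```python
from difflib import get_close_matches


def fuzzy_join(left, right, key, cutoff=0.6):
    """Left-join two lists of dicts on `key`, tolerating small spelling differences.

    Hypothetical helper for illustration only: each left row is paired with the
    right row whose key is the closest string match above `cutoff`.
    """
    right_by_key = {row[key]: row for row in right}
    candidates = list(right_by_key)
    joined = []
    for row in left:
        # Best approximate match for this row's key, or an empty list if none
        match = get_close_matches(row[key], candidates, n=1, cutoff=cutoff)
        merged = dict(row)
        if match:
            for name, value in right_by_key[match[0]].items():
                # Keep the matched key under a separate name so both spellings survive
                merged[key + "_right" if name == key else name] = value
        joined.append(merged)
    return joined


left = [{"country": "Frnace", "gdp": 1}, {"country": "Germany ", "gdp": 2}]
right = [
    {"country": "France", "capital": "Paris"},
    {"country": "Germany", "capital": "Berlin"},
]
result = fuzzy_join(left, right, "country")
```

Rows with no match above the cutoff are kept unchanged, which mirrors a left join; a real implementation would likely use a vectorized string similarity instead of a per-row lookup.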
In light of what we discussed about improving the user experience: we can add example 3 as the last part of example 2, rather than as a separate example. Looking...