
Prepping tables for machine learning

Results: 296 skrub issues

Adds CodeQL, a tool for finding vulnerabilities and mistakes in the code. GitHub recommended it, so let's see whether it proves useful for us.

CI / Build

Fixes #246. Also refactors some of the `SuperVectorizer` tests to make the code cleaner.

enhancement

When an already "clean" dataset is passed to the `SuperVectorizer` (which is common in generic pipelines), it raises a `RuntimeError` with the message `No transformers could be generated!`. The doc mentions ``` Raises...

bug

This PR aims to improve the overall quality of the code and documentation. It has several purposes:
- Correct typos
- Reword unclear sentences
- Minor updates to the doc...

I noticed that a test of the `GapEncoder` does not use the `init1` parameter:

```py
@pytest.mark.parametrize("init1, analyzer1, analyzer2", [
    ('k-means++', 'char', 'word'),
    ('random', 'char', 'word'),
    ('k-means', 'char', 'word'),
])
def test_analyzer(init1,...
```

CI / Build

This PR adds the new `FuzzyJoin` class that allows joining tables with dirty columns. Co-authored-by: @LeoGrin
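Conceptually, joining tables with dirty columns means matching each key in one table to the *most similar* key in the other, instead of requiring exact equality. The sketch below illustrates that idea in plain Python, using `difflib.SequenceMatcher` as the string-similarity measure. The `fuzzy_join` helper, its `threshold` parameter and the list-of-dicts row format are hypothetical, chosen for illustration only; this is not the `FuzzyJoin` implementation from this PR, which may use a different similarity measure and API.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] between two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_join(left, right, left_key, right_key, threshold=0.6):
    """Hypothetical left join of two lists of dict rows on their closest keys.

    Each left row is matched to the right row whose key is most similar.
    The match is kept only if its similarity reaches `threshold`;
    otherwise the left row is returned without any right-hand columns.
    """
    joined = []
    for lrow in left:
        best_row, best_score = None, 0.0
        for rrow in right:
            score = similarity(lrow[left_key], rrow[right_key])
            if score > best_score:
                best_row, best_score = rrow, score
        merged = dict(lrow)
        if best_row is not None and best_score >= threshold:
            # Copy the matched row's columns, except its (dirty) join key.
            merged.update({k: v for k, v in best_row.items() if k != right_key})
        joined.append(merged)
    return joined


countries = [{"country": "France"}, {"country": "Germny"}, {"country": "Atlantis"}]
capitals = [{"name": "france", "capital": "Paris"},
            {"name": "Germany", "capital": "Berlin"}]
result = fuzzy_join(countries, capitals, "country", "name")
```

Here the misspelled "Germny" still matches "Germany", while "Atlantis" has no sufficiently similar key and is left unmatched. A real implementation would avoid this quadratic loop, for example by vectorizing the keys and using a nearest-neighbor search.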

In light of what we discussed about improving user experience, we could make example 3 the last part of example 2 rather than a separate example. Looking...

Documentation