skrub icon indicating copy to clipboard operation
skrub copied to clipboard

Broken `parametrize`

Open LilianBoulard opened this issue 3 years ago • 0 comments

I noticed that a test of the GapEncoder does not use the init1 parameter

@pytest.mark.parametrize("init1, analyzer1, analyzer2",[
    ('k-means++', 'char', 'word'),
    ('random', 'char', 'word'),
    ('k-means', 'char', 'word')
])
def test_analyzer(init1, analyzer1, analyzer2):
    """" Test if the output is different when the analyzer is 'word' or 'char'.
        If it is, no error ir raised. 
    """
    add_words = False
    n_samples = 70
    X_txt = fetch_20newsgroups(subset='train')['data'][:n_samples]
    X = np.array([X_txt, X_txt]).T
    n_components = 10
    # Test first analyzer output:
    encoder = GapEncoder(
        n_components=n_components, init='k-means++',
        analyzer=analyzer1, add_words=add_words,
        random_state=42, rescale_W=True)
    encoder.fit(X)
    y = encoder.transform(X)
    
    # Test the other analyzer output:
    encoder = GapEncoder(
        n_components=n_components, init='k-means++',
        analyzer=analyzer2, add_words=add_words,
        random_state=42)
    encoder.fit(X)
    y2 = encoder.transform(X)
    
    # Test inequality btw analyzer word and char ouput:
    np.testing.assert_raises(AssertionError, np.testing.assert_array_equal, y, y2)

which makes the parametrize kind of useless as the two other parameters analyzer1 & analyzer2 are the same for all three calls.

IMO, we should hard-code char and word in the test, and parametrize init1 (which we should rename init).

LilianBoulard avatar Aug 01 '22 14:08 LilianBoulard