skops load model in Card class

#96 added a function in the Card class to load the model if provided a file path. sorry if the implementation looks naive

Oct 05 '22 14:10 p-mishra1

Right now, the model is eagerly loaded during init. Recently, we made a move to prevent this (see https://github.com/skops-dev/skops/pull/163). However, lazy loading could be costly because it would be done for each call to repr. Maybe we can store the actual model in a different attribute (say, _model) when it's called for the first time?

But we need to make sure we invalidate that whenever the _model changes.

Oct 06 '22 09:10 adrinjalali

But we need to make sure we invalidate that whenever the _model changes.

This could be done with a setter but maybe we can come up with an easier solution?

Oct 06 '22 11:10 BenjaminBossan

We need to have a test for this. If you need help implementing the test, let us know.

Yes sure, there will be same test function but instead of model instance we have to pass a tmp model file (one for skops and one for pkl).

Right now, the model is eagerly loaded during __init__. Recently, we made a move to prevent this (see #163). However, lazy loading could be costly because it would be done for each call to __repr__. Maybe we can store the actual model in a different attribute (say, _model) when it's called for the first time?

On this as the same argument is supporting both model and model_path so in representation and in _generate_card we will be doing model loading. else we can do it in representation on 1st call and set _model to model instance but in that case both model and '_model' will have the instance when user will provide model instance to Card class.

Oct 06 '22 12:10 p-mishra1

@p-mishra1 Just to make sure, what is the state of this PR? Are you waiting for input from our side?

Oct 11 '22 08:10 BenjaminBossan

Yes @BenjaminBossan i changed the functionality to load skops file as well as pkl file and also mentioned some of the things about thr design choice. Waiting for input from you guys about that.

Oct 11 '22 15:10 p-mishra1

We need to have a test for this. If you need help implementing the test, let us know.

Yes sure, there will be same test function but instead of model instance we have to pass a tmp model file (one for skops and one for pkl).

So here we would like to see unit tests for the new feature being added, similar to what we have here. Would you please work on that? We can help if needed.

On this as the same argument is supporting both model and model_path so in representation and in _generate_card we will be doing model loading. else we can do it in representation on 1st call and set _model to model instance but in that case both model and '_model' will have the instance when user will provide model instance to Card class.

So I prefer the second solution. When the model is first accessed, if it's a path, it is loaded, if it's already a model, then we're good. It should be stored on a private attribute like _model. Then the next time, the model is already loaded and we don't need to load it again. When the user changes the model attribute, we need to invalidate _model, since it could now be a different model.

Oct 12 '22 08:10 BenjaminBossan

thanks @BenjaminBossan will update the changes

Oct 25 '22 06:10 p-mishra1

thanks @BenjaminBossan for the code snippet and detailed explanation i have added required changes, i also added a deleter for model. have a look.

Oct 25 '22 17:10 p-mishra1

Formatting

Could you revert the formatting changes you made? It looks like some tool you use (your IDE maybe?) reformatted the files you worked on to use 79 or 80 characters per line. However, we use black with the default settings, which comes down to ~88 characters per line. If you undo these changes, the code will be much easier to review because the formatting changes distract from the real changes.

The best way to achieve this would be if you could first undo your formatting changes, then add black via a pre-commit hook. This step is explained in our contribution pipeline but the gist is that you should do something like:
pip install pre-commit
pre-commit install
from within your Python environment. After doing this, each time when you commit, the code will be checked for black compliance.

I used the contribution setup before raising the PR but i think my local black setup formatted the code before the being formatted by pre-commit.

Testing

If I understand correctly, you have duplicated a number of tests. Examples are test_plot_model_false_from_path and test_add_from_path. The change you introduced was to have the card initialized from a file instead of passing the model directly, the rest is identical.

IMO, it is not necessary to duplicate all those tests, since the actual thing they're testing is not related to how the model was passed. Instead, it is sufficient to have tests that show that passing a model or a str or a Path do the same thing, and that modifying the .model attribute works (as in the last test class (which btw. should also test deleting, which you correctly added)). The duplicated tests can then be removed.

Another suggestion I have, which is not as clear cut, is to move _load_model out to be a standalone function and not a method on Card. Then add a few tests just for this function. Once we know that this function works well, testing the Card class will be much easier because we know that the model loading part is covered.

okay, I will remove the duplicate tests and make standalone _load_model function and write tests for funtion as well as passing path.

Test fixtures

I think there is a misunderstanding regarding the use of pytest fixtures. Let's take this example:
@pytest.mark.parametrize("suffix", [".pkl", ".pickle", ".skops"])
def test_save_model_card_from_path(destination_path, model_card_from_path):
    model_card_from_path.save(Path(destination_path) / "README.md")
    assert (Path(destination_path) / "README.md").exists()
Here you use the fixture suffix but your test doesn't make use of it. This means that the exact same test will be run 3 times, with no additional value. The correct way to use fixtures is to create the decorator (as you did), then add the fixture name as an argument to the function, and then to use that argument in the function (e.g. in test_plot_model_false_from_path you use the fixture correctly).

Fixture parameterisation here the pytest fixture model_card_from_path has a param named suffix and pytest is implicitly passing the values. have a look at the above stackoverflow link, so the pytest behaviour is correct(i checked the test locally and verfied the output and behaviour before the PR ).

Oct 26 '22 14:10 p-mishra1

here the pytest fixture model_card_from_path has a param named suffix and pytest is implicitly passing the values. have a look at the above stackoverflow link, so the pytest behaviour is correct(i checked the test locally and verfied the output and behaviour before the PR ).

Ah I see, I didn't know about this new feature. It's kinda cool but also a bit magic. I wonder if it's really preferable over using request.param to parametrize a fixture.

Oct 26 '22 14:10 BenjaminBossan

Oh nice, finally pytest supporting this. Been waiting for it for ages. I personally find it more intuitive than using request.param. But everywhere this pattern is used, I'd leave a comment saying it's a parameter passed to the fixture for future developers to figure it out easily.

Oct 28 '22 10:10 adrinjalali

Created a fresh PR in #205 for this.

Oct 28 '22 15:10 p-mishra1