skops
skops copied to clipboard
load model in Card class
#96 added a function in the Card class to load the model if provided a file path. sorry if the implementation looks naive
Right now, the model is eagerly loaded during init. Recently, we made a move to prevent this (see https://github.com/skops-dev/skops/pull/163). However, lazy loading could be costly because it would be done for each call to repr. Maybe we can store the actual model in a different attribute (say, _model) when it's called for the first time?
But we need to make sure we invalidate that whenever the _model changes.
But we need to make sure we invalidate that whenever the
_modelchanges.
This could be done with a setter but maybe we can come up with an easier solution?
- We need to have a test for this. If you need help implementing the test, let us know.
Yes sure, there will be same test function but instead of model instance we have to pass a tmp model file (one for skops and one for pkl).
- Right now, the model is eagerly loaded during
__init__. Recently, we made a move to prevent this (see #163). However, lazy loading could be costly because it would be done for each call to__repr__. Maybe we can store the actual model in a different attribute (say,_model) when it's called for the first time?
On this as the same argument is supporting both model and model_path so in representation and in _generate_card we will be doing model loading.
else we can do it in representation on 1st call and set _model to model instance but in that case both model and '_model' will have the instance when user will provide model instance to Card class.
@p-mishra1 Just to make sure, what is the state of this PR? Are you waiting for input from our side?
Yes @BenjaminBossan i changed the functionality to load skops file as well as pkl file and also mentioned some of the things about thr design choice. Waiting for input from you guys about that.
- We need to have a test for this. If you need help implementing the test, let us know.
Yes sure, there will be same test function but instead of model instance we have to pass a tmp model file (one for skops and one for pkl).
So here we would like to see unit tests for the new feature being added, similar to what we have here. Would you please work on that? We can help if needed.
On this as the same argument is supporting both model and model_path so in representation and in _generate_card we will be doing model loading. else we can do it in representation on 1st call and set
_modelto model instance but in that case bothmodeland '_model' will have the instance when user will provide model instance toCardclass.
So I prefer the second solution. When the model is first accessed, if it's a path, it is loaded, if it's already a model, then we're good. It should be stored on a private attribute like _model. Then the next time, the model is already loaded and we don't need to load it again. When the user changes the model attribute, we need to invalidate _model, since it could now be a different model.
thanks @BenjaminBossan will update the changes
thanks @BenjaminBossan for the code snippet and detailed explanation i have added required changes, i also added a deleter for model. have a look.
Formatting
Could you revert the formatting changes you made? It looks like some tool you use (your IDE maybe?) reformatted the files you worked on to use 79 or 80 characters per line. However, we use
blackwith the default settings, which comes down to ~88 characters per line. If you undo these changes, the code will be much easier to review because the formatting changes distract from the real changes.The best way to achieve this would be if you could first undo your formatting changes, then add
blackvia a pre-commit hook. This step is explained in our contribution pipeline but the gist is that you should do something like:pip install pre-commit pre-commit installfrom within your Python environment. After doing this, each time when you commit, the code will be checked for
blackcompliance.
I used the contribution setup before raising the PR but i think my local black setup formatted the code before the being formatted by pre-commit.
Testing
If I understand correctly, you have duplicated a number of tests. Examples are
test_plot_model_false_from_pathandtest_add_from_path. The change you introduced was to have the card initialized from a file instead of passing the model directly, the rest is identical.IMO, it is not necessary to duplicate all those tests, since the actual thing they're testing is not related to how the model was passed. Instead, it is sufficient to have tests that show that passing a model or a
stror aPathdo the same thing, and that modifying the.modelattribute works (as in the last test class (which btw. should also test deleting, which you correctly added)). The duplicated tests can then be removed.Another suggestion I have, which is not as clear cut, is to move
_load_modelout to be a standalone function and not a method onCard. Then add a few tests just for this function. Once we know that this function works well, testing theCardclass will be much easier because we know that the model loading part is covered.
okay, I will remove the duplicate tests and make standalone _load_model function and write tests for funtion as well as passing path.
Test fixtures
I think there is a misunderstanding regarding the use of pytest fixtures. Let's take this example:
@pytest.mark.parametrize("suffix", [".pkl", ".pickle", ".skops"]) def test_save_model_card_from_path(destination_path, model_card_from_path): model_card_from_path.save(Path(destination_path) / "README.md") assert (Path(destination_path) / "README.md").exists()Here you use the fixture
suffixbut your test doesn't make use of it. This means that the exact same test will be run 3 times, with no additional value. The correct way to use fixtures is to create the decorator (as you did), then add the fixture name as an argument to the function, and then to use that argument in the function (e.g. intest_plot_model_false_from_pathyou use the fixture correctly).
Fixture parameterisation
here the pytest fixture model_card_from_path has a param named suffix and pytest is implicitly passing the values. have a look at the above stackoverflow link, so the pytest behaviour is correct(i checked the test locally and verfied the output and behaviour before the PR ).
here the pytest fixture
model_card_from_pathhas a param namedsuffixand pytest is implicitly passing the values. have a look at the above stackoverflow link, so the pytest behaviour is correct(i checked the test locally and verfied the output and behaviour before the PR ).
Ah I see, I didn't know about this new feature. It's kinda cool but also a bit magic. I wonder if it's really preferable over using request.param to parametrize a fixture.
Oh nice, finally pytest supporting this. Been waiting for it for ages. I personally find it more intuitive than using request.param. But everywhere this pattern is used, I'd leave a comment saying it's a parameter passed to the fixture for future developers to figure it out easily.
Created a fresh PR in #205 for this.