skops
skops copied to clipboard
RFC Include generating file(s) in the repo
While thinking about https://github.com/skops-dev/skops/issues/110 I realized it'd be much easier to fix existing issues in repos if the file generating the model and the modelcard was included in the repo.
The question is, do we want to include the scripts generating them in the repo, and if yes, how.
cc @skops-dev/maintainers , @julien-c, @osanseviero , @lysandrejik
In general, I'm in favor of this, because it makes it easy to replicate everything. I would not go as far as to force this but it should be encouraged.
Maybe we can think of a standard. E.g. for models, there could be a train.py script that generates the model artifact, config, and model card when called (without arguments). The requirements file should be sufficient to run that script.
I agree that it shouldn't be a requirement, but I'm not sure of the API to let this happen.
but I'm not sure of the API to let this happen.
You mean an API from the skops side to provide this functionality? I don't think it's really possible. My thought was rather to lead by example.
Yeah, I was thinking something like having an API which would take __file__ and save it to the model folder, but I'm not sure how clean it is.
Hmm, wouldn't that result in users potentially uploading files that they may not have intended to be uploaded?
I think it makes sense and we should have it, just like they do with Space repositories (app.py file). I also agree with @BenjaminBossan that it might cause users to potentially upload files that they don't want in the repository. Also they might be putting a couple of API keys in their scripts or notebooks if it's generated by a notebook.
We could take the file during init and by default not have it inside. Maybe also add a warning for users to check if there's any variable that shouldn't be public.
So instead of adding it automatically, we could have a method like hub_utils.add_file which can also accept __file__ as input value.
So instead of adding it automatically, we could have a method like
hub_utils.add_filewhich can also accept__file__as input value.
I'll take a look at that
I think we can close this one, and try to add generating files to existing repos, we should clean them up at some point anyway.