skops icon indicating copy to clipboard operation
skops copied to clipboard

RFC Include generating file(s) in the repo

Open adrinjalali opened this issue 3 years ago • 8 comments

While thinking about https://github.com/skops-dev/skops/issues/110 I realized it'd be much easier to fix existing issues in repos if the file generating the model and the modelcard was included in the repo.

The question is, do we want to include the scripts generating them in the repo, and if yes, how.

cc @skops-dev/maintainers , @julien-c, @osanseviero , @lysandrejik

adrinjalali avatar Aug 23 '22 09:08 adrinjalali

In general, I'm in favor of this, because it makes it easy to replicate everything. I would not go as far as to force this but it should be encouraged.

Maybe we can think of a standard. E.g. for models, there could be a train.py script that generates the model artifact, config, and model card when called (without arguments). The requirements file should be sufficient to run that script.

BenjaminBossan avatar Aug 23 '22 11:08 BenjaminBossan

I agree that it shouldn't be a requirement, but I'm not sure of the API to let this happen.

adrinjalali avatar Aug 23 '22 12:08 adrinjalali

but I'm not sure of the API to let this happen.

You mean an API from the skops side to provide this functionality? I don't think it's really possible. My thought was rather to lead by example.

BenjaminBossan avatar Aug 23 '22 12:08 BenjaminBossan

Yeah, I was thinking something like having an API which would take __file__ and save it to the model folder, but I'm not sure how clean it is.

adrinjalali avatar Aug 23 '22 12:08 adrinjalali

Hmm, wouldn't that result in users potentially uploading files that they may not have intended to be uploaded?

BenjaminBossan avatar Aug 23 '22 13:08 BenjaminBossan

I think it makes sense and we should have it, just like they do with Space repositories (app.py file). I also agree with @BenjaminBossan that it might cause users to potentially upload files that they don't want in the repository. Also they might be putting a couple of API keys in their scripts or notebooks if it's generated by a notebook. We could take the file during init and by default not have it inside. Maybe also add a warning for users to check if there's any variable that shouldn't be public.

merveenoyan avatar Aug 25 '22 17:08 merveenoyan

So instead of adding it automatically, we could have a method like hub_utils.add_file which can also accept __file__ as input value.

adrinjalali avatar Aug 30 '22 11:08 adrinjalali

So instead of adding it automatically, we could have a method like hub_utils.add_file which can also accept __file__ as input value.

I'll take a look at that

BenjaminBossan avatar Aug 31 '22 10:08 BenjaminBossan

I think we can close this one, and try to add generating files to existing repos, we should clean them up at some point anyway.

adrinjalali avatar Sep 06 '22 15:09 adrinjalali