skops icon indicating copy to clipboard operation
skops copied to clipboard

Secure persistence: add extensibility

Open BenjaminBossan opened this issue 2 years ago • 6 comments

A mechanism should be implemented that allows library authors to add secure persistence to their library, and for users to opt those libraries in.

BenjaminBossan avatar Sep 16 '22 16:09 BenjaminBossan

Thinking more about this, what do you think of supporting only __getstate__/__setstate__, and allowing certain types as we do now, like lists, dicts, function, etc.

This would mean we don't need to do much on our side, and they only need to implement a sensible __getstate__/__setstate__ pair.

If we manage to move all sklearn models to the same path, then we can completely remove __reduce__.

adrinjalali avatar Sep 27 '22 06:09 adrinjalali

That sounds good, we can start with __getstate__ & __setstate__ and hopefully that's good enough. If it isn't, we can later think about providing more. If we can eventually stop supporting __reduce__, it should really lower our burden as well.

Still, the user should have to opt in explicitly. Do you already have an API in mind?

BenjaminBossan avatar Sep 27 '22 07:09 BenjaminBossan

I'm not sure what you mean by user in this context. I'm thinking the save would automatically work if the object of third party libraries provide these methods, and during load, we'd fail if we don't know the library, and user can say they trust the source kinda thing.

I'm still not sure if we should load simple objects if the user has the corresponding library installed or not, if they have it installed, they trust it?

adrinjalali avatar Sep 27 '22 09:09 adrinjalali

and during load, we'd fail if we don't know the library, and user can say they trust the source kinda thing.

That's what I meant. How would the API for this look like?

I'm still not sure if we should load simple objects if the user has the corresponding library installed or not, if they have it installed, they trust it?

Previously, I would have said that if you installed a malicious library, you're probably already compromised, so there is nothing to be gained from not trusting it at this point. But IIUC, Python is moving in a direction of not allowing arbitrary code to be run during installation (avoiding setup.py), so just installing a malicious package wouldn't mean you're compromised. But for the time being, there is of course still a lot of code that relies on setup.py that needs to be supported and I'm not sure if that's planned to be deprecated. (This is all from memory, maybe I get some details wrong.)

BenjaminBossan avatar Sep 27 '22 11:09 BenjaminBossan

Packages can run arbitrary code through .pth files. We don't need setup.py to run arbitrary code :D

discussions about removing .pth files: https://bugs.python.org/issue33944 rejected PEP about an alternative mechanism: https://peps.python.org/pep-0648/

While saving models, we save everything, it's only while loading that security issues come in play. Therefore I think simply relying on __getstate__/__setstate__ is enough, and there doesn't need to be an extra user facing API for it.

adrinjalali avatar Sep 29 '22 13:09 adrinjalali

Packages can run arbitrary code through .pth files. We don't need setup.py to run arbitrary code :D

Oh man, it never stops, does it? Okay, so we can basically assume that if a malicious package is installed, the user is already compromised.

While saving models, we save everything, it's only while loading that security issues come in play. Therefore I think simply relying on __getstate__/__setstate__ is enough, and there doesn't need to be an extra user facing API for it.

Okay, let's do it this way and hopefully it'll be sufficient.

BenjaminBossan avatar Sep 29 '22 13:09 BenjaminBossan