skops
skops copied to clipboard
Secure persistence: exclude (certain) built in functions
For secure persistence, we need to ensure that certain builtin functions cannot be loaded, such as eval, exec and functions from urllib, os, sys...
I think more that "cannot be loaded", we should only allow everything if the user says they're trusted, except a few categories of methods which we trust.
Yes, "cannot be loaded" unless the user says so. How granular do you want that to be, on a module level or should the user indicate individual functions?
That's something we'd need to figure out while working on the audit part. Since it doesn't make sense to load some functions and not load others, it's probably sufficient to fail, show risky sections, and ask the user to load the model with trusted=True or something.
We can also think of a way for users to allow certain modules/libraries/functions, but I'd say that has a lower priority.
Since it doesn't make sense to load some functions and not load others
Could you please elaborate on that? Do you mean that if a user would allow one builtin_function_or_method, they might as well allow all?
I mean, we're not going to do partial load of a model. So we either fail or pass. So from the user's perspective, I was thinking:
- load(file): passes or fails, if it fails, we tell them why
- load(file, trusted=True): passes and loads
We can however think of doing something like:
- load(file, trusted_libs=...): if it fails, there are other reasons for it
But I'm a bit wary of that to start with, cause one can still do exploits using existing libraries. So at least as a start, I was thinking of brutally failing, and telling users why, and once we know more about potential exploits, we can allow more granular acceptance of modules/functions/etc?
We can however think of doing something like:
* load(file, trusted_libs=...): if it fails, there are other reasons for it
To me, that sounds very useful, I'd be much more inclined to trust a known, widely used library than something I've never heard of.
cause one can still do exploits using existing libraries
Then we should address those exploits. Maybe we should have an issue listing all known exploits?
I agree we should address the exploits, and that's why I started by fixing the one in joblib (merged and released). And I'm not saying we shouldn't support the API to trust particular libraries, I'm must saying we should probably wait to learn more before we do it.
And I'm not saying we shouldn't support the API to trust particular libraries
Sorry, I didn't mean to imply that, I just wanted to give my opinion from a user's perspective.
I agree we should address the exploits
Do we want to document them somewhere? Thinking about it, maybe a non-public place would be better.
Do we want to document them somewhere? Thinking about it, maybe a non-public place would be better.
Some of them don't need to be private, like removing __reduce__ from sklearn estimators. But for others, we can use "Security Advisories" on github: https://docs.github.com/en/code-security/repository-security-advisories/about-github-security-advisories-for-repositories. I haven't used this feature, but it seems it's exactly what it's for.
But for others, we can use "Security Advisories" on github
That seems to be almost overkill for now :D I just want a reminder, so that we don't forget about security issues we found.
We now raise on everything which is not trusted by default, and that includes insecure builtin functions. I think we can close this now.