formulaic icon indicating copy to clipboard operation
formulaic copied to clipboard

Add a "safe" mode that avoids using `eval`?

Open matthewwardrop opened this issue 4 years ago • 3 comments

In some cases, particularly in production code, it is nice to have a safe mode that avoids using eval, at the expense of some functionality. This issue is here so this line of thought doesn't get lost, but also to see how important this is to people.

To be clear, removing eval in the current state would basically cripple Formulaic by preventing any Python transformations (including built-in transformations). There may be a middle ground whereby we introspect the generated Python code AST, and verify that the functions to be executed are sanctioned.

matthewwardrop avatar Jul 20 '20 05:07 matthewwardrop

This is related to an issue on patsy https://github.com/pydata/patsy/issues/156, where @FerusAndBeyond points out that running:

from formulaic import model_matrix
model_matrix('sys.exit()', ...)

will kill the running python service. Is there a way to protect formulaic from code injection?

matthewwardrop avatar Sep 25 '21 21:09 matthewwardrop

At the moment, formulae does not use eval() (see here), but it is still vulnerable to sys.exit() calls if you have sys loaded in the environment where design_matrices() is called.

I do think we could port something similar to Formulaic and attempt to ban people calling functions from certain modules.

EDIT

You could have the following scenarios

  1. sys module is not loaded
model_matrix("sys.exit()")

No problem here (I think?). It is going to raise an error because sys is not going to be found.

  1. sys module is loaded
model_matrix("sys.exit()")

Python service is killed

  1. sys module is loaded but developer prevented it from being used
formulaic.disable_modules(["sys"])
model_matrix("sys.exit()")

Python service is not killed. An error is raised because you can't call anything from the sys module.

Does it make sense?

tomicapretto avatar Sep 27 '21 20:09 tomicapretto

Yeah, that is correct!

It should also be noted model_matrix is a user-convenience function. Software libraries should use Formula(...).get_model_matrix which requires you to explicitly nominate the evaluation context, and it does not add the current evaluation context by default. You can simulate that in model_matrix by passing None or the the evaluation context dictionary via model_matrix(context=...), which then replaces the automatically imputed context.

I don't see avoiding eval here to be that useful, since we're not actually adding any security if we will call the functions represented in the string anyway. I think the bigger thing is getting users familiar with the security risks, and using Formula().get_model_matrix() in library code.

matthewwardrop avatar Sep 28 '21 00:09 matthewwardrop

I'm going to close this for now. With proper context isolation (which libraries can already control), the use of eval seems pretty sensible. Avoiding it as suggested by @tomicapretto basically means manually parsing the syntax tree and calling the function anyway. In both cases, since Formulaic parses the AST anyway, we could deny use of particular functions... but I don't see this being especially valuable. If this ever comes up again we can refresh this issue.

matthewwardrop avatar Dec 20 '23 22:12 matthewwardrop