normandy
normandy copied to clipboard
When validating recipe arguments, "name" may not be included when expected
https://sentry.prod.mozaws.net/operations/normandy-stage/issues/5562863/
KeyError: 'name'
(17 additional frame(s) were not displayed)
...
File "rest_framework/views.py", line 492, in dispatch
response = handler(request, *args, **kwargs)
File "rest_framework/mixins.py", line 84, in partial_update
return self.update(request, *args, **kwargs)
File "rest_framework/mixins.py", line 70, in update
self.perform_update(serializer)
File "rest_framework/mixins.py", line 80, in perform_update
serializer.save()
File "rest_framework/serializers.py", line 209, in save
self.instance = self.update(self.instance, validated_data)
KeyError: 'name'
The recipe that triggered this is https://stage-admin.normandy.nonprod.cloudops.mozgcp.net/api/v3/recipe/741/
Notably, the field recipe.arguments['name'] is !@#*(), which is likely what is triggering this.
I found the root cause of this. Recipe 741 was once a preference experiment, and is now an add-on experiment. The latest version of the recipe is an add-on experiment, but the latest approved version is a preference experiment.
The code that does does the argument.name uniqueness check confuses latest and latest approved. It gets all recipes where the latest revision is the right action type, but then access the names from the "current revision" which is the latest approved if the recipe has been approved.
In order to unblock QA, I've approved a newer version of the recipe that is an opt-out study. This fixes the bug for me on stage. We should also change the code to prevent this in the future.
Is this related to changing action types? Should we disallow this entirely as far down as the model?
Yes, was the problem. Specifically, the problem was changing to a different action type that had incompatible argument schemas. This combined with another problem where some code confuses Recipe.latest_revision and Recipe.approved_revision. Because of this it was seeing the recipe as both types of actions depending on what property it accessed. I think that it got the action type from one revision and the arguments from another revision.
This confusion is a problem that can hurt us in other ways. Disallowing action type changes would fix this case of the problem, but that confusion is still a problem.
@mythmon / @jaredkerim Could you elaborate on what you see the fix for this issue to be? Is it enough to prevent recipes from changing action type, i.e. duplicate of #600? Or do you want someone to do an audit of the code path that led to some code consulting both latest_revision and some consulting approved_revision?
There is definitely a problem related to confusion between latest_revision and approved_revision that this code exposed. In general, the properties on the Recipe class that tranparently pass through to some revision of that recipe probably make things hard to get right. This problem is definitely here. We happened to find symptoms of this problem when recipes changed actions.
In my opinion, we should probably remove the pass through properties on Recipe, and make all the code that access the recipes be more explicit about which revision it is using. That's a third option that you didn't mention. Of the two, I'd like to see an audit of the code paths that got confused.
Restricting recipes from changing action types is reasonable, but when it comes to this particular error, it is papering over an underlying problem.