ideas icon indicating copy to clipboard operation
ideas copied to clipboard

context anti-patterns

Open wardi opened this issue 10 years ago • 4 comments

The context dict that is passed to action functions when called locally or from controllers is often abused. I can see the argument for some kind of object that includes security-sensitive values like the user object or values like model that could let you mock out the real models for tests.

However, we have at least three classes of values that we shouldn't be passing as part of a "context":

  1. using context instead of normal parameters for calls from an action function to other parts of ckan. e.g. the 'prevent_packages_update' passed to group_dict save (which I'd fix this way: https://github.com/wardi/ckan/commit/303ab34354dafd214144ff49117df4403ce9174a )

    There's no reason at all I can see for this sort of use. it's a value used by the immediate function we're calling and only that function. That's what parameters are for. Python even lets you set a default and has conventions for documenting these things.

  2. using context to pass non-sensitive values many levels to where they are actually used. e.g. the revision_id/revision_date values that can be passed to package_show https://github.com/ckan/ckan/blob/master/ckan/controllers/package.py#L356-L372

    This is an actually-useful parameter that should be part of the package_show API! Why is it being hidden in the context object?

    When we use context to pass a value many layers deep it's very hard to reason about the behaviour of code without understanding everything above and below. Let's add revision_date to package_show's data_dict, and use a normal parameter on the way down.

  3. In some places because context is a mutable object it's also used to pass values up to callers. e.g. adding a package object for use by a the REST api controller after package_create: https://github.com/ckan/ckan/blob/master/ckan/logic/action/create.py#L197-201

    This sort of thing has led to a number of bugs myself and others have had to fix The tendency to reuse a context between many call_action calls makes this even worse. I had to help a co-workers debug an issue where some unrelated action calls would work when done in one order but not another that was caused by this.

    This also makes it really hard to reason about what an action call will do. You need to know everything that might have ever been put in a context object by anyone above, below or in code that happened to run before the code you're looking at.

My preference would be to make context an immutable object like a NamedTuple that can only ever have a few expected values: e.g. model, session, user. Because it's immutable you're forced by the language to create a new one any time you need to change something, and it's always safe to re-use from one call to the next.

Such an object could mostly maintain backwards compatibility by defining a __getattr__ method that returns the expected attributes.

wardi avatar May 15 '14 16:05 wardi

I'm +1 and i'm also not really sure why we use it for encapsulating model, session etc (why couldn't those be arguments to logic functions ...)

rufuspollock avatar May 16 '14 16:05 rufuspollock

So I currently think contexts in their current state are pointless.

I'm assuming contexts only contain model, session and user. I get why context exists, it basically comes down to whether

  • you prefer passing parameters for the request/session/user around into functions, so that the parameters define exactly what the function needs to execute. If we do this though, these are 'different' to other parameters which is why they've been seperated into context/data_dict.
  • Or whether you are happy using threadlocals, so that if your logic function, calls another function that calls another, that calls ..., that calls function f, f can easily get the requet/session/user by calling the flask g object/pylons g thing to get the user without having to muddy up all the other previous function callers definitions.

There is no right answer, the guy who wrote flask wrote this a blog post detailing his hate for thread locals before he wrote flask, where the decision was made to use them, I know that David R is not a fan of them

The problem I think we have, is that ckan straddles both sides and we end up with a worse solution. Currently in ckan, we use the scoped_session, which is a threadlocal based setup for sqlalchemy, So unless theres is some additional voodoo(possibly vdm, i haven't looked at that code), then ckan.model.Session and context['session'] are currently always the same. This is incredibly confusing, especially to new ckan developers.

The way we supply user into the context is a pain in the backside, for example when writing tests. When you're writing a test using the factories and you're passing a user in as an argument, but you might have been given a user_dict by another action function in a test, so what do you use? the username or the user_dict? The answer is we have magic which takes a stab at figuring it out so the context['user'] is setup properly.

The problem is even with context, we're still using all the pylons g,c and whatever else everywhere anyway. So we have the problem of threadlocal objects everywhere in the code, but we still have to pass our session/user everywhere, (but not the request, we use pylons threadlocal request for that). Hence why I say contexts in their current state are pointless.

So some of our options could be.

  • make a context namedtuple, but change it's name or properly explain what the hell this thing is in the documentation
  • think about whether we want to include request info into the context, i know some actions have context['for_view'] for some reason or another.
  • get rid of context. This means we'd have to have our own ckan.g threadlocal thing (to move off pylons one and onto flask or whatever) that we can place the username/request in. but eww this is gross
  • get rid of context model, context session. and just let people use ckan.model and model.session, I end up having to import both of these into extensions anyway and people won't have to think about context anymore.
  • get rid of context['user'] by having helper functions get_current_username or something which behind the scenes gets the username from pylons.c.user or whatever(if we ever change to something else)
  • continue to make posts like these but never have the time to change anything anyway.

I'm thinking currently that I like helpers get_current_username as it can be mocked out in tests and users of our code won't need to think of this crap and just call toolkit.get_current_username in their code, and I also like just letting them use model.Session because they won't sit their thinking 'wtf is this context thing' whenever they want to call an action function in their extension, but I haven't really thought any of this through fully.

TL;DR I'm sure I'll change my mind by next month, but generally, we should change context so that it's not baffling to the user. I think by hiding from the user entirely, but we could change it to a namedtuple and documenting what the hell it actually is, or just making them non confusing normal parameters

This post is too long now and i'm sure we won't have figure it out in a year anyway.

joetsoi avatar Feb 16 '15 02:02 joetsoi

Dammit, I was just falling asleep when I thought "model and session should be optional parameters where their default arguments are ckan.model and can.model.session" so the caller doesn't have to specify them each time they use an action function."

So maybe I've changed my mind already

joetsoi avatar Feb 16 '15 03:02 joetsoi

I've just been bitten by this issue writing a paster command, because I forgot I need a new context dict each time I call a function returned from get_action. Specifically I was calling harvest_source_create/harvest_source_update and getting db conflicts because the id in the context was being re-used, rather than the data_dict I'd provided :( (FTR harvest_sources are packages)

Making the context immutable seems like the lowest impact solution (although I suspect it's used a lot in the logic layer when calling other functions). It's also likely to break some extensions that might rely on this (no specific example) - given it might be a breaking change, is it a C3 thing?

rossjones avatar Jun 29 '18 08:06 rossjones