DEP 0017 Content Type Parsing
Hi @carltongibson 👋
I've made a start on writing up a DEP following the discussion on the forum. This is based upon previous discussions both on the mailing list, the forum and the linked PRs.
There's a couple of "TODO"s which need a bit more thought. But thought worth sharing where I have got to.
I hope that I have presented a fair and unbiased proposal of where we are and the concerns raised.
Thanks for picking this up @smithdc1! 🎁 Good timing. Let's chat in Vigo.
Hi All,
Not to sure where exactly to leave/put this feedback but wanted to at least mention some concerns:
First of all I think keeping form_data as the exact same behavior as POST is a really good idea, along with not touching the almost 20-year old original names. I do think having two names for everything could be confusing, but the new names are probably more clear for beginners (who aren't PHP programmers like 20 years ago). However meta could probably use some improvement to be more clear: cgi_variables? cgi_environ? cgi_environment_meta_variables? (Edit: form_data also has the potential to be confusing because of <form method="GET">, but I don't have a good solution to that. Edit 2: <form method="PUT"> maybe isn't allowed? so maybe post_form_data would be more clear if only POST method is allowed?)
Regarding content-negotiation, I have some of concerns:
-
If Django does any json parsing (including a
request.get_json()attr), are there going be denial-of-service security issues? Like does Django need to respect DATA_UPLOAD_MAX_NUMBER_FIELDS? Does Django need to worry about how deeply nested the json can possibly go? Does Django need to worry about cases where field/key names are repeated? (I guess that one was already mentioned on Jun 7th.) -
json and form_data aren't really as interchangeable as much as urlencoded/multipart are. urlencoded and multipart both allow duplicate keys as a way of specifying a list (like multiple checkboxes or select multiple). It's all standardized as part of html, but there's no such thing as
<form enctype="application/json">, so there's no way to auto-translate between the two. (Same with json and xml: they are not quite interchangeable.) -
I'm not sure it's generally a good idea to let the client decide what content-type to use when sending data. I could see some occasional cases to allow content negotiation, but I think in general it might be more secure-by-default to lock a view down to only accepting one content type? Like if my code processing form data from an html form is expecting only string-to-string key value pairs, it should probably stick to the .form_data attribute. Using a .data attribute and opening up the possibility of the client sending in json to that attribute might become a surprise issue. It might not even be dictionary.
-
Similarly, people might set up Web-Application-Firewall rules to do some filtering on json that's getting passed to a view, and a client sending urlencoded or multipart might be able to bypass that filtering. (and vice-versa).
-
Also re denial-of-service, I also think Django shouldn't parse json just because it was sent, unless the view is actually asking for the json. Like, any current view that's currently accepting form data shouldn't start auto-parsing json like it does with GET/POST/FILES/COOKIES/headers, just because Django starts to allow json. Wait until the view actually calls request.json() before parsing it. The view might not actually want it and parsing json is more complicated and slower than any of the other parsing, except maybe multipart, but there's already a bunch of denial-of-service checks for that.
-
I kinda hinted at this before https://code.djangoproject.com/ticket/32646#comment:5, but it seems to me what people actually want is to have an easy way to parse json, but Django is trying to solve that by making a generic framework to handle "any" content type, which I personally think is actually going to lead to a lot of issues and confusion and wishy-washy views. If your view is expecting json then get it from
request.json()(which should check content-type before parsing, like .form_data does). If you're expecting urlencoded/multipart, userequest.form_data. If you're expecting something else, then parse it yourself. It makes the code more clear what's going on and therefore easier to read, and probably more secure. -
In general, if Django does allow this sort of auto-content negotiation, I'd personally recommend not using it by default, unless you know you actually need that feature and know what you're doing. This seems kinda similar to the issue with
request.REQUEST(union of GET and POST) that was removed because of security issues. -
Explicit is better than implicit, Simple is better than complex, Readability counts, etc.
-
I don't think you need to add content-negotiation in order to justify renaming request vars, because I think the temptation will be for people to switch to using content-negotation as part of the rename because it's the shiny new exciting feature, and I don't think that's something that should be encouraged by default for the reasons above.
Anyway that's just a quick rant / two cents about content-negotiation, doesn't need to hold things up, but I thought I'd at least mention these issues for my own sake so they're not swirling around in my head forever.
Thanks, Collin
Thank you for your thoughts -- A couple of responses from me. I'm happy to add these to the DEP if that would be helpful.
If Django does any json parsing (including a request.get_json() attr), are there going be denial-of-service security issues?
I think ensuring compliance with the DATA_UPLOAD_MAX_MEMORY_SIZE setting would be the defense here? This would be consistent with how other requests are currently handled.
Also re denial-of-service, .... Wait until the view actually calls request.json() before parsing it.
I agree that the behavior for .data should be the same as .POST and so on -- i.e. only parse when required.
Something I think would be valuable here is discussion of HTTP verbs beyond just GET and POST, most notably PUT and PATCH. If I understand correctly, content negotiation is likely to most relevant for an API rather than a traditional web page. Along these lines, I think referencing prior art in Django Rest Framework and other third party API libraries would also improve the DEP.
I would just like to add a user perspective that I haven't seen highlighted yet. In discussions around content type parsing, it is often that forms always use POST since you cannot use PUT or PATCH in an HTML form. This was mentioned as the reason why PUT requests do not populate the POST dictionary. While this is the case with standard HTML5 forms, hypermedia libraries like HTMX encourage users to use the most idiomatically appropriate HTTP verb for a request, which sometimes means using PUT or PATCH. Of course, you can still use a POST request, but I would like to be able to use a PUT when I am updating an existing resource, for example.
This DEP should be re-numbered (I suggest 0017) because we've accepted #98 at DEP 0015.
Hey all, this DEP has been presented during forum discussions about better supporting API development with Django as a good first step.
If there is a good way for me to help, I would like to.
Does this DEP have a shepherd yet? @smithdc1 would like like any help authoring the DEP or moving anything along?
Hi @camuthig
Help with pushing this forward would be very much appreciated.
Do you have thoughts on what steps should be taken to progress this?
@smithdc1 if there isn't a shepherd, I think it would be best to find one. I am not capable of filling that role, though.
@carltongibson sent me to this PR, and I know he has been a part of this discussion for a very long time. I'm not sure if he is available to be a shepherd or if he might be able to recommend someone else in the community for the task.
I think first steps are to review the suggestions made already in the PR and either accept them or work through them. After that, I think we would need to see where things stand, asking the reviewers to brush up on the conversation and see if conceptually the DEP can move forward or if there are deeper discussions that need to be had on top of the recommendations made here.
I don't have the ability to accept suggestions. I can review suggestions and provide a 👍 or ask questions as a marker for you? From my review so far most seem like good recommendations supported by multiple reviewers, though, so it is likely they are ready for acceptance already?
I don't have capacity to lead this now.
I have a plan to implement what's needed as a third-party handler subclass and request class, to act as a proof of concept and test bed. I'd then swing back to a DEP once it was demonstrated in practice.
I would have got to that already, except it hasn't managed to become a priority for me. It's not something I've actually needed on the work project (and we're nearly three years in at this point). We will eventually, but I can say when that will be.
I believe @nanuxbe is interested in seeing this progress, but I also know she has several projects in hand. 🤹
... implement what's needed as a third-party handler subclass and request class, to act as a proof of concept and test bed.
If you want to push it forwards, I'd advise doing that. The existing PRs already show the way, there's everything you need there. (It'd be a good learning experience too.)
That's a great recommendation @carltongibson . Thank you for providing the feedback.
@smithdc1 are you open to working on a third-party implementation? I am traveling for a few days but I can start working on something later this week as an experiment with the ideas you have described here.
This DEP should be re-numbered (I suggest 0017) because we've accepted https://github.com/django/deps/pull/98 at DEP 0015.
Yes, I had a hard time figuring out where to make comments about the new project template because I kept ending up here.
I changed the PR title at least.