klein icon indicating copy to clipboard operation
klein copied to clipboard

Simpler request helper functions that accept bytes and unicode args

Open notoriousno opened this issue 8 years ago • 17 comments

This is related to a bug I wrote a while back #108 for Klein and 8905 for Twisted. It would make writing code more pleasant if certain functions of the Request object would accept both bytes and unicode. This would also bridge gaps between writing Py2 and Py3 code. This issue was primarily created because there are inconsistencies with:

  • Headers (param can be bytes or unicode)
  • adding cookies (param can be bytes or unicode)
  • getting cookies (param can only be bytes)
  • getting request.args (param can only be bytes)

It is easy to do something like 'unicode or py2 str key'.encode('utf-8') before passing it to the various getter/setter functions, however the more popular Python web frameworks never force users to do this and Klein should follow suite. I'm well aware that these are upstream Twisted issues, but it's extremely noticeable in Klein. My hope is to deploy a solution here, test it out with the people who use Klein, then merge them upstream.

notoriousno avatar Dec 31 '16 16:12 notoriousno

I agree with you on this. Requiring the user to do "something".encode("utf-8") before handing it off to Klein API's, while legal, is weird for new users, especially when compared to other Python frameworks.

I don't know what strategy you want to use, but I recommend changing the Klein API's so that they are something like:

def someKleinAPI(foo):
    doSomethingWith(foo)

to:

def someKleinAPI(foo, encoding="utf-8"):
    if not isinstance(foo, bytes):
        foo = foo.encode(encoding)
   doSomethingWith(foo)

rodrigc avatar Dec 31 '16 16:12 rodrigc

Yes that's pretty much what I was thinking! I recall seeing something like this somewhere deep in Twisted source in the past. Maybe the function was called convertToBytes(someString). So it may already exist somewhere.

notoriousno avatar Dec 31 '16 21:12 notoriousno

Maybe you are thinking of twisted.python.compat.nativeString()

I don't really like that function, so would suggest that you go with the example I put above. You basically have the right idea.

rodrigc avatar Dec 31 '16 22:12 rodrigc

Made a PR, please review if you (or anyone) have the chance. I'm not very familiar w/ the Twisted coding standards but I think I'm close. I've found some of those functions I referred to earlier, however they're scattered in various places and practically do the same thing:

notoriousno avatar Jan 02 '17 18:01 notoriousno

My original PR got deleted by shear stupidity on my part, but I probably should've discussed the issue a bit more in this ticket to begin with...Everything happens for a reason :) right? So getting cookies has a dedicated function aptly called getCookie which can be easily overridden. On the other hand, Request.args is a simple dict. So to deal with that, here are some options:

  1. Use getter/setter functions in a Request subclass.
  2. Replace Request.args with a dict-like object that will convert the key to bytes.

The first option is self explanatory and rather simple to implement. I'm slightly against this, but I'm not completely opposed to the idea. The second option involves some Python magic. I'm leaning more toward option 2 because it will be seamless and "natural" for end users.

My idea is to create a new dict-like object and then access it via a @property named something like arguments in a Request subclass. The arguments "attribute" will behave virtually the same as the args attribute does, with the exception of having the ability to use bytes or unicode params. Here's what the final syntax in Python 3 would look like:

@route('/some/route')
def some_route(request):
    my_cookie = request.getCookie('cookie')
    form_name = request.arguments['name'][0]
    return 'Hello {0}!'.format(form_name)

Option 2 Pros:

  • The key can be bytes or unicode (obviously)
  • Users simply need to change request.args to request.arguments in current code.
  • arguments will operate like any other dict so users should easily understand it
  • If Twisted were to adopt this, very minimal changes would need to be done to that codebase

Option 2 Cons:

  • Yet another class is introduced into the Klein codebase (albeit users will not have to worry too much about it).

What does the Twisted/Klein devs think? Amazingly, I managed to NOT delete all the awesome comments that @wsanchez made on the original PR. So please take a look there for some more details.

notoriousno avatar Jan 06 '17 04:01 notoriousno

I'm surprised we can't re-open a PR but Git confuses me to no end. I know this is Github, but same.

Anyway, I'm doing that thing where I'm thinking about this while doing other things so it doesn't seem like I'm thinking about it but I am. Because I see there there is a need here, for sure, for some simplicity for the benefit of the developers.

Stopping to consider where my pseudo-thinking is right now, the Arguments class is growing on me, but I'm thinking about whether the dict interface is the best way to get there. I think it's part of the answer, but arguments are more interesting than dicts and I find that while dict is super convenient because syntactic sugar and all that, that it sometimes boxes you into fitting square pegs into rounded rect holes, if you know what I mean.

So nothing solid here, just mulling it over…

wsanchez avatar Jan 06 '17 18:01 wsanchez

arguments are more interesting than dicts and I find that...it sometimes boxes you into fitting square pegs into rounded rect holes.

I can agree but I would ask you to ponder about this...Is there a real need for anything more than a dict-like object? Also note that Twisted has been using a plain-old dict for years without a hitch. Certainly I cannot speak for everyone, but as an end user, I will either get or iterate over form data or query strings. This fact holds for other web frameworks as well.

There's actually a 3rd option that can be taken. We can use Werkzueg's MultiDict (or any one of there predefined containers). I drew inspiration from these data structures initially, but ultimately found them to "do too much" so I took certain aspects that I liked and disregarded the rest. As I said earlier, I want to get, set, iterate and everything else are "nice to haves"

notoriousno avatar Jan 07 '17 01:01 notoriousno

Should the value be returned as unicode or bytes (as it currently does)? While it makes sense to return bytes when doing low level transactions over the wire but it doesn't make much sense in terms of HTTP. All other frameworks return unicode should Klein follow suite?

notoriousno avatar Jan 12 '17 03:01 notoriousno

You are moving in the right direction on these patches, IMHO. One comment I would make is try to consistently spell utf-8 when you are using it as an argument to something, for example use utf-8, not utf8, UTF8, nor UTF-8.

rodrigc avatar Jan 12 '17 04:01 rodrigc

@rodrigc done

notoriousno avatar Jan 13 '17 03:01 notoriousno

Ping! I'm sure everyone thinks this is issue is an inconvenience, so let's do something about it. Also, has this issue been raised in Twisted?

notoriousno avatar Feb 12 '17 15:02 notoriousno

@notoriousno - This is a good idea for Treq (which defines its own interface, informally) but not a good idea for Twisted. IAgent is an interoperability interface which Treq uses extensively for layering; it's low-level and it should probably remain bytes-based for the time being. treq.get et. al., however, should expand to take Unicode and perhaps twisted.python.url.URL objects, since they are mostly "user"-facing.

glyph avatar Feb 14 '17 22:02 glyph

@glyph is it a good idea for klein :) ? For low level wire protocols it's totally reasonable for data to be bytes however HTTP is not low level protocol (in my opinion). A thin layer which does bytes <=> unicode conversion for the sake of end user convenience would be a great thing, especially for a project like Klein.

not a good idea for Twisted

This is a stark contradiction to what is already in Twisted. Many functions related to tx.web.Server.Request already handle both types of input.

notoriousno avatar Feb 15 '17 14:02 notoriousno

This is a stark contradiction to what is already in Twisted. Many functions related to tx.web.Server.Request already handle both types of input.

IAgent itself can't change. The way the compatibility policy is worded, interfaces are effectively frozen forever. You could have a new, more lenient subinterface of course, but that wouldn't help Treq or Agent with its internal composition much, since they already have to deal with encoding at the system boundary.

A high-level IAgent wrapper would of course make sense. But… that's what treq is :).

glyph avatar Feb 16 '17 05:02 glyph

Fair enough. In the same vein as treq, I think there's some value in providing high level wrappers in klein as well. All I'm trying to say is, it would be nice if I didn't have to be concerned with what is happening behind the scenes in twisted-land. With the current state of Request in conjunction with Python 3, this is not possible.

notoriousno avatar Feb 17 '17 05:02 notoriousno

Sorry @notoriousno - I think I lost the plot here and thought I was commenting on a Treq issue rather than a Klein issue :).

glyph avatar Feb 17 '17 05:02 glyph

I'm in agreement that having this issue addressed in Twisted may indeed be a good thing. This might come along with some of the work going on to define a more streamlined Request interface with a better top-level API entrypoint (one that's distinct enough from the existing one that we don't need to worry about compatibility with IRequest).

In other words I am referring to the venerable https://tm.tl/288

glyph avatar Feb 17 '17 05:02 glyph