PyDev.Debugger icon indicating copy to clipboard operation
PyDev.Debugger copied to clipboard

Optimize var_to_xml

Open fabioz opened this issue 7 years ago • 8 comments

It seems most time is used at saxutils.escape and urllib.quote, so, ideally we could have a cythonized version which does both at a single pass.

On the matter: Falcon has a utility which should be faster too:

https://github.com/falconry/falcon/blob/master/falcon/util/uri.py

fabioz avatar Apr 29 '17 13:04 fabioz

As a note for potential implementors, Ideally we'd be able to create a pure python version which does the xml escaping and urllib quote/quote_plus in a single pass, so that we don't need to do all those replaces (thus always creating a new string), given that the quote/quote_plus already iterates char-by-char (and also cythonize it when cython is available).

fabioz avatar Apr 30 '17 10:04 fabioz

I actually think the string concats may be taking long. BTW why are we using quote from urllib. This makes no sense to me.

fitermay avatar May 21 '17 17:05 fitermay

What I'm trying to say is: why are we URL escaping the value, shouldn't we XML escape it instead?

fitermay avatar May 22 '17 18:05 fitermay

The client/server protocol currently requires the urllib escaping (each command is a single line which is url-escaped -- i.e.: can't have new lines).

Now, the contents of each command is composed of a xml, so, the contents of the xml must also be xml-escaped.

This could probably be changed to use a newer protocol (such as messagepack), but this would break backwards compatibility with clients (so, would be considerably more work and the gains may still be marginal to just improving the current implementation).

fabioz avatar May 26 '17 14:05 fabioz

Wouldn't XML escaping disallowed symbols be backward compatible?

fitermay avatar May 26 '17 14:05 fitermay

No (clients are currently required to undo the url escape).

fabioz avatar May 26 '17 14:05 fabioz

I think it would only not be backward compatible if the original string contains something that looks like a URL escaped pattern. Otherwise unescaping would just be a no op,right?

fitermay avatar May 26 '17 14:05 fitermay

Agreed, if new lines are escaped and the user doesn't have any variables which would break the url unescaping it should also work.

fabioz avatar May 26 '17 14:05 fabioz