unfurl
parse_url parser does not handle query parameters with no value
Some websites add keys to the URL query string that have no value, but still affect the way the page is displayed. One (trimmed down) example is the following Facebook URL:
https://www.facebook.com/photo.php?type=3&theater
In this case, "theater" sits on its own and indicates that the photo should be opened in a lightbox. Unfortunately, the parameter is missing entirely from the tree after parsing, as you can see below:

This is a pretty easy fix. Modify line 66 of parse_url.py as below:
- parsed_qs = urllib.parse.parse_qs(node.value)
+ parsed_qs = urllib.parse.parse_qs(node.value, keep_blank_values=True)
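
To show the difference outside of unfurl, here is a minimal standalone sketch using only the standard library. The variable names are mine and it does not touch unfurl's node structure; it just runs the query string from the Facebook URL above through parse_qs with and without keep_blank_values.

```python
from urllib.parse import parse_qs, urlsplit

url = "https://www.facebook.com/photo.php?type=3&theater"
query = urlsplit(url).query  # "type=3&theater"

# Default behaviour: a key with no value is silently dropped.
default_result = parse_qs(query)

# With keep_blank_values=True the key is kept, mapped to an empty string.
fixed_result = parse_qs(query, keep_blank_values=True)

print(default_result)  # {'type': ['3']}
print(fixed_result)    # {'type': ['3'], 'theater': ['']}
```

With the flag set, "theater" comes back with an empty-string value, so it should be available to show up as its own node in the tree.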

I could submit a pull request to fix this now. However, on lines 94 and 106 I see regexes for parsing similar forms (a=b|c=d|e=f and a=b&c=d&e=f). I'd like to make sure this issue is fixed in those as well, but I haven't yet figured out how to build a test case or example that covers them. Any ideas?