Feature request: Convert to unescaped string

Open Aran-Fey opened this issue 2 years ago • 2 comments

When URLs (or parts thereof) are converted to a string, they're always escaped:

>>> from furl import furl
>>> url = furl('foo.bar/fire truck?hello world=#hi there')
>>> str(url)
'foo.bar/fire%20truck?hello+world=#hi%20there'
>>> str(url.path)
'foo.bar/fire%20truck'
>>> str(url.query)
'hello+world='
>>> str(url.fragment)
'hi%20there'

It would be useful if there were a way to obtain unescaped strings:

>>> url.unescaped_str()
'foo.bar/fire truck?hello world=#hi there'
>>> url.path.unescaped_str()
'foo.bar/fire truck'
>>> url.query.unescaped_str()
'hello world='
>>> url.fragment.unescaped_str()
'hi there'

Aran-Fey, Jul 30 '22 07:07

let's zoom out a bit so i understand the exact problem you're trying to solve! that way we can best solve it with furl :)

to start, what are you using these unescaped strings for?

gruns, Aug 04 '22 22:08

Hmm, that's a bit tough to explain. Essentially, my program is a web scraper. You give it a URL as input, and it scrapes that website. You can use the #fragment to narrow down what you want it to scrape. For example, if the URL is example.com#Hello World, it looks for an <h1>Hello World</h1> and only scrapes that section. So I need the text "Hello World", and not "Hello%20World".

To put it more generally: furl is designed to output URLs. You put (unescaped) text in, and you get a valid (escaped) URL as output. But you can't do the opposite, i.e. take a URL as input and parse/destructure it into (unescaped) information.

Aran-Fey, Aug 05 '22 19:08
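
For illustration, the "URL in, unescaped text out" direction described above can be sketched with Python's standard library alone. This is not furl's API and not a proposed implementation of `unescaped_str()`; the helper name `unescaped_parts` is hypothetical, and the sketch assumes the input is an already-escaped URL string such as the one furl produces in the example at the top of the issue.

```python
from urllib.parse import urlsplit, unquote, unquote_plus


def unescaped_parts(url):
    """Hypothetical helper: split an escaped URL and decode each part."""
    parts = urlsplit(url)
    return {
        # Percent-decode the path and fragment ("%20" becomes " ").
        "path": unquote(parts.path),
        # In query strings "+" also stands for a space, so use unquote_plus.
        "query": unquote_plus(parts.query),
        "fragment": unquote(parts.fragment),
    }


parts = unescaped_parts('foo.bar/fire%20truck?hello+world=#hi%20there')
print(parts["path"])
print(parts["query"])
print(parts["fragment"])
```

One caveat with this approach: decoding a whole query string at once loses the distinction between a literal `&` or `=` delimiter and an escaped one inside a key or value, so a scraper that needs individual parameters would probably be better served by `urllib.parse.parse_qsl`.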