Passwords containing special characters are not decoded correctly
Hey there guys, our database password contains a # character and databases doesn't seem to like it 😄
In [25]: url = DatabaseURL("postgresql://username:password%23123@localhost/name")
In [26]: url
Out[26]: DatabaseURL('postgresql://username:********@localhost/name')
In [27]: url.password
Out[27]: 'password%23123'
I believe after urlsplit is called, you may also need to unquote each URL component?
If you do not encode the password, urlsplit fails to split things properly:
In [33]: url = DatabaseURL("postgresql://username:password#123@localhost/name")
In [34]: url.username
In [35]: url.password
In [36]: url.hostname
Out[36]: 'username'
In [37]: url.port
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-37-9d4372beb709> in <module>
----> 1 url.port
~/.virtualenv/vsparc-api-fastapi/lib64/python3.6/site-packages/databases/core.py in port(self)
355 @property
356 def port(self) -> typing.Optional[int]:
--> 357 return self.components.port
358
359 @property
/usr/lib64/python3.6/urllib/parse.py in port(self)
167 port = self._hostinfo[1]
168 if port is not None:
--> 169 port = int(port, 10)
170 if not ( 0 <= port <= 65535):
171 raise ValueError("Port out of range 0-65535")
ValueError: invalid literal for int() with base 10: 'password'
This particular Python issue may be of relevance too.
Any help is greatly appreciated Fotis
I've done lots more digging on this. I'm using databases with a PostgreSQL database (i.e. with the asyncpg backend. The reason this issue gets trickier is that databases passes the dsn directly to the asyncpg library which runs urlparse on it (see this for the related code).
Now the issue is, urlparse essentially results in the same output as urlsplit and also does not unquote the netloc property of the result which includes the username and password. As such, it also always fail to connect when a dsn is supplied with urlencoded username and password.
There seems to be an issue relating to it in the asyncpg repository which I hope to read into further now.
When using asyncpg directly, I would typically not use a dsn and pass individual elements which worked perfectly.
Cheers Fotis
I've done lots more digging on this. I'm using databases with a PostgreSQL database (i.e. with the asyncpg backend. The reason this issue gets trickier is that databases passes the dsn directly to the asyncpg library which runs
urlparseon it (see this for the related code).
I found that you can hack that behaviour the following way, it won't solve your pound symbol in password issue as it's a bug upstream, still you may avoid passing a full DSN
import asyncio
from databases import Database, DatabaseURL
async def main():
dburl = DatabaseURL('postgresql://')
database = Database(dburl, host='localhost', port=5438, user="postgres", password="postgres")
await database.connect()
query = """SELECT * FROM pg_catalog.pg_tables"""
r = await database.execute(query=query)
print(r)
if __name__ == '__main__':
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
@euri10 Great trick, thanks heaps! Though I do believe that databases itself also has a little problem in that it doens't unquote parts of the URL (particularly the netloc) after using urlsplit in the DatabaseURL class. There's a pull request in the asyncpg repo to repair the problem in the library, and it's essentially using unquote to ensure everything is as it should be.
@euri10 @fgimian you can also try url-encoding as asyncpg will decode it back before actual use.
@gvbgduh Thanks so much, yeah I think that the fix on asyncpg to do this was only just merged into master. So this should be safe for the next release or if you're installing from master. The latest release version (without the fix) doesn't decode URLs and therefore won't work. See https://github.com/MagicStack/asyncpg/issues/436 for more info 😄
The only little thing that's a bit inelegant is when interrogating credentials via the databases library, youl'l still get URL encoded ouptut:
e.g.
In [1]: from databases import Database
In [2]: database = Database("postgresql://username:password%23123@localhost/name")
In [3]: database.url.password
Out[3]: 'password%23123'
I'm not quite sure how much work the databases library should do here. But perhaps it should at least decode such components when they are obtained via a DatabaseURL object.
What do you think?
Really appreciate your help! Fotis
I'm not quite sure how much work the databases library should do here. But perhaps it should at least decode such components when they are obtained via a DatabaseURL object. But perhaps it should at least decode such components when they are obtained via a DatabaseURL object.
That seems correct to me, yeah.
We should apply unquoting to all the string-type accessor methods, including url.password .
I've run into this as well, with other characters such as @ in the password. I thought this was a Starlette issue and opened up this issue: https://github.com/encode/starlette/issues/662, but it looks like it's a little deeper.
@, : and / are a bit tricky as it might break things down the road with the asyncpg parser, though, it's getting fixed (https://github.com/MagicStack/asyncpg/issues/419).
So, unquoting + https://github.com/encode/databases/pull/129 should do the trick.
Looks like asyncpg has been updated: https://github.com/MagicStack/asyncpg/commit/89815ea83620f858719a7df77e9192b5026aa8fe
But even after upgrading asyncpg, this issue still shows up here.
One thing that might be useful is support for urlencoding in Starlette.Configuration, somewhere. No idea what it'd look like yet. I know that doesn't solve the issue for databases, but that seems like the right level for this sort of fix.
I put a proof of concept in encode/starlette#695.
The reason for this issue is that password is not urlencoded string.
I'm fix this issue using this way:
database/core.py:DatabaseURL
from urllib.parse import unquote
...
@property
def username(self) -> typing.Optional[str]:
return unquote(self.components.username)
@property
def password(self) -> typing.Optional[str]:
return unquote(self.components.password)
...
For using:
from urllib.parse import quote
user = quote(user) # e.g. user#001
password = quote(password) # e.g. P@ssw0rd#
url = f"{dialect}://{user}:{password}@{host}:{port}/{db}..."
It's possible to "monkey patch" DatabaseURL to allow percent encoding and decoding of specific properties.
import typing
from urllib.parse import unquote
from databases import Database, DatabaseURL
# monkey patch DatabaseURL to allow percent encoded passwords
# https://stackoverflow.com/a/36158137
def _password(self) -> typing.Optional[str]:
return unquote(self.components.password)
setattr(DatabaseURL, 'password', property(_password))
db = Database(...)
the cloud sql DSNs mentioned in https://github.com/MagicStack/asyncpg/issues/419 really break the DatabaseURL parser -- because Database refuses to pass a DSN down to the backend without parsing it, these hostnames will always end up null.
Example:
# hostname is null:
>>> DatabaseURL('postgresql:///?host=/unix/socket').hostname
# replace() doesn't help:
>>> DatabaseURL('postgresql:///?host=/unix/socket').replace(hostname='/unix/socket').hostname
>>>
This is extra-bad because:
- it triggers a ConnectionError in asyncpg which is hard to debug because there's no host
- the
Databaseconstructor is in module scope (per your docs), so this is stopping servers from coming up at all -- really hard to debug in cloud, blocks most error capture systems from working
Could you pass the DSN down to the backend unmodified? I'm not familiar with this project, so there may be good reasons not to do this, but it would make life easier for parsing edge cases like this one.
Great discussion!
I was curious that I didn't come across this issue before, since my passwords usually do contain special characters. Today, however, is my first brush with Databases. I use SQLAlchemy most of the time and noticed it uses regex to discern URL components:
sqlalchemy/lib/sqlalchemy/engine/url.py
def _parse_rfc1738_args(name):
pattern = re.compile(
r"""
(?P<name>[\w\+]+)://
(?:
(?P<username>[^:/]*)
(?::(?P<password>.*))?
@)?
(?:
(?:
\[(?P<ipv6host>[^/]+)\] |
(?P<ipv4host>[^/:]+)
)?
(?::(?P<port>[^/]*))?
)?
(?:/(?P<database>.*))?
""",
re.X,
)
...
I tried a simple database URL in regex101 using SQLAlchemy's pattern and it works, although clearly, this is not a comprehensive test:

I was running into the issue with special characters in the URL and came across this issue. This was helpful.
Also it looks like unquote() is now used in the DatabaseURL class, so I believe the intended way to do pass passwords to the Database object would be to do something along the lines of what @yuan1238y suggested.
To quote them, the following now works without monkeypatching:
from urllib.parse import quote
user = quote(user) # e.g. user#001
password = quote(password) # e.g. P@ssw0rd#
url = f"{dialect}://{user}:{password}@{host}:{port}/{db}..."
It would be nice to have this added to the documentation I think.