databases icon indicating copy to clipboard operation
databases copied to clipboard

Passwords containing special characters are not decoded correctly

Open fgimian opened this issue 6 years ago • 17 comments

Hey there guys, our database password contains a # character and databases doesn't seem to like it 😄

In [25]: url = DatabaseURL("postgresql://username:password%23123@localhost/name")

In [26]: url
Out[26]: DatabaseURL('postgresql://username:********@localhost/name')

In [27]: url.password
Out[27]: 'password%23123'

I believe after urlsplit is called, you may also need to unquote each URL component?

If you do not encode the password, urlsplit fails to split things properly:

In [33]: url = DatabaseURL("postgresql://username:password#123@localhost/name")

In [34]: url.username

In [35]: url.password

In [36]: url.hostname
Out[36]: 'username'

In [37]: url.port
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-37-9d4372beb709> in <module>
----> 1 url.port

~/.virtualenv/vsparc-api-fastapi/lib64/python3.6/site-packages/databases/core.py in port(self)
    355     @property
    356     def port(self) -> typing.Optional[int]:
--> 357         return self.components.port
    358
    359     @property

/usr/lib64/python3.6/urllib/parse.py in port(self)
    167         port = self._hostinfo[1]
    168         if port is not None:
--> 169             port = int(port, 10)
    170             if not ( 0 <= port <= 65535):
    171                 raise ValueError("Port out of range 0-65535")

ValueError: invalid literal for int() with base 10: 'password'

This particular Python issue may be of relevance too.

Any help is greatly appreciated Fotis

fgimian avatar Oct 02 '19 04:10 fgimian

I've done lots more digging on this. I'm using databases with a PostgreSQL database (i.e. with the asyncpg backend. The reason this issue gets trickier is that databases passes the dsn directly to the asyncpg library which runs urlparse on it (see this for the related code).

Now the issue is, urlparse essentially results in the same output as urlsplit and also does not unquote the netloc property of the result which includes the username and password. As such, it also always fail to connect when a dsn is supplied with urlencoded username and password.

There seems to be an issue relating to it in the asyncpg repository which I hope to read into further now.

When using asyncpg directly, I would typically not use a dsn and pass individual elements which worked perfectly.

Cheers Fotis

fgimian avatar Oct 02 '19 23:10 fgimian

I've done lots more digging on this. I'm using databases with a PostgreSQL database (i.e. with the asyncpg backend. The reason this issue gets trickier is that databases passes the dsn directly to the asyncpg library which runs urlparse on it (see this for the related code).

I found that you can hack that behaviour the following way, it won't solve your pound symbol in password issue as it's a bug upstream, still you may avoid passing a full DSN

import asyncio

from databases import Database, DatabaseURL


async def main():
    dburl = DatabaseURL('postgresql://')
    database = Database(dburl, host='localhost', port=5438, user="postgres", password="postgres")

    await database.connect()

    query = """SELECT * FROM pg_catalog.pg_tables"""
    r = await database.execute(query=query)
    print(r)


if __name__ == '__main__':
   loop = asyncio.get_event_loop()
   loop.run_until_complete(main())

euri10 avatar Oct 03 '19 07:10 euri10

@euri10 Great trick, thanks heaps! Though I do believe that databases itself also has a little problem in that it doens't unquote parts of the URL (particularly the netloc) after using urlsplit in the DatabaseURL class. There's a pull request in the asyncpg repo to repair the problem in the library, and it's essentially using unquote to ensure everything is as it should be.

fgimian avatar Oct 03 '19 09:10 fgimian

@euri10 @fgimian you can also try url-encoding as asyncpg will decode it back before actual use.

gvbgduh avatar Oct 03 '19 10:10 gvbgduh

@gvbgduh Thanks so much, yeah I think that the fix on asyncpg to do this was only just merged into master. So this should be safe for the next release or if you're installing from master. The latest release version (without the fix) doesn't decode URLs and therefore won't work. See https://github.com/MagicStack/asyncpg/issues/436 for more info 😄

fgimian avatar Oct 03 '19 23:10 fgimian

The only little thing that's a bit inelegant is when interrogating credentials via the databases library, youl'l still get URL encoded ouptut:

e.g.

In [1]: from databases import Database

In [2]: database = Database("postgresql://username:password%23123@localhost/name")

In [3]: database.url.password
Out[3]: 'password%23123'

I'm not quite sure how much work the databases library should do here. But perhaps it should at least decode such components when they are obtained via a DatabaseURL object.

What do you think?

Really appreciate your help! Fotis

fgimian avatar Oct 03 '19 23:10 fgimian

I'm not quite sure how much work the databases library should do here. But perhaps it should at least decode such components when they are obtained via a DatabaseURL object. But perhaps it should at least decode such components when they are obtained via a DatabaseURL object.

That seems correct to me, yeah. We should apply unquoting to all the string-type accessor methods, including url.password .

lovelydinosaur avatar Oct 08 '19 12:10 lovelydinosaur

I've run into this as well, with other characters such as @ in the password. I thought this was a Starlette issue and opened up this issue: https://github.com/encode/starlette/issues/662, but it looks like it's a little deeper.

taybin avatar Oct 08 '19 20:10 taybin

@, : and / are a bit tricky as it might break things down the road with the asyncpg parser, though, it's getting fixed (https://github.com/MagicStack/asyncpg/issues/419). So, unquoting + https://github.com/encode/databases/pull/129 should do the trick.

gvbgduh avatar Oct 09 '19 14:10 gvbgduh

Looks like asyncpg has been updated: https://github.com/MagicStack/asyncpg/commit/89815ea83620f858719a7df77e9192b5026aa8fe

But even after upgrading asyncpg, this issue still shows up here.

cypai avatar Oct 24 '19 01:10 cypai

One thing that might be useful is support for urlencoding in Starlette.Configuration, somewhere. No idea what it'd look like yet. I know that doesn't solve the issue for databases, but that seems like the right level for this sort of fix.

taybin avatar Oct 28 '19 18:10 taybin

I put a proof of concept in encode/starlette#695.

taybin avatar Oct 28 '19 21:10 taybin

The reason for this issue is that password is not urlencoded string. I'm fix this issue using this way: database/core.py:DatabaseURL

from urllib.parse import unquote
...
    @property
    def username(self) -> typing.Optional[str]:
        return unquote(self.components.username)

    @property
    def password(self) -> typing.Optional[str]:
        return unquote(self.components.password)
...

For using:

from urllib.parse import quote
user = quote(user)  # e.g. user#001
password = quote(password)  # e.g. P@ssw0rd#
url = f"{dialect}://{user}:{password}@{host}:{port}/{db}..."

yuan1238y avatar May 12 '20 04:05 yuan1238y

It's possible to "monkey patch" DatabaseURL to allow percent encoding and decoding of specific properties.

import typing
from urllib.parse import unquote
from databases import Database, DatabaseURL

# monkey patch DatabaseURL to allow percent encoded passwords
# https://stackoverflow.com/a/36158137
def _password(self) -> typing.Optional[str]:
    return unquote(self.components.password)

setattr(DatabaseURL, 'password', property(_password))

db = Database(...)

afkjm avatar Jul 08 '20 10:07 afkjm

the cloud sql DSNs mentioned in https://github.com/MagicStack/asyncpg/issues/419 really break the DatabaseURL parser -- because Database refuses to pass a DSN down to the backend without parsing it, these hostnames will always end up null.

Example:

# hostname is null:
>>> DatabaseURL('postgresql:///?host=/unix/socket').hostname
# replace() doesn't help:
>>> DatabaseURL('postgresql:///?host=/unix/socket').replace(hostname='/unix/socket').hostname
>>> 

This is extra-bad because:

  • it triggers a ConnectionError in asyncpg which is hard to debug because there's no host
  • the Database constructor is in module scope (per your docs), so this is stopping servers from coming up at all -- really hard to debug in cloud, blocks most error capture systems from working

Could you pass the DSN down to the backend unmodified? I'm not familiar with this project, so there may be good reasons not to do this, but it would make life easier for parsing edge cases like this one.

abe-winter avatar Nov 01 '20 06:11 abe-winter

Great discussion!

I was curious that I didn't come across this issue before, since my passwords usually do contain special characters. Today, however, is my first brush with Databases. I use SQLAlchemy most of the time and noticed it uses regex to discern URL components:

sqlalchemy/lib/sqlalchemy/engine/url.py

def _parse_rfc1738_args(name):
    pattern = re.compile(
        r"""
            (?P<name>[\w\+]+)://
            (?:
                (?P<username>[^:/]*)
                (?::(?P<password>.*))?
            @)?
            (?:
                (?:
                    \[(?P<ipv6host>[^/]+)\] |
                    (?P<ipv4host>[^/:]+)
                )?
                (?::(?P<port>[^/]*))?
            )?
            (?:/(?P<database>.*))?
            """,
        re.X,
    )
    ...

I tried a simple database URL in regex101 using SQLAlchemy's pattern and it works, although clearly, this is not a comprehensive test:

image

pmsoltani avatar Feb 16 '21 08:02 pmsoltani

I was running into the issue with special characters in the URL and came across this issue. This was helpful.

Also it looks like unquote() is now used in the DatabaseURL class, so I believe the intended way to do pass passwords to the Database object would be to do something along the lines of what @yuan1238y suggested.

To quote them, the following now works without monkeypatching:

from urllib.parse import quote
user = quote(user)  # e.g. user#001
password = quote(password)  # e.g. P@ssw0rd#
url = f"{dialect}://{user}:{password}@{host}:{port}/{db}..."

It would be nice to have this added to the documentation I think.

mryoho avatar Nov 04 '22 15:11 mryoho