pyiron_base icon indicating copy to clipboard operation
pyiron_base copied to clipboard

underscores in project name kills database performance

Open XzzX opened this issue 10 months ago • 8 comments

pyiron=# SELECT * FROM jobs_cmmc WHERE jobs_cmmc.project LIKE 'test%';
Time: 0.481 ms

pyiron=# SELECT * FROM jobs_cmmc WHERE jobs_cmmc.project LIKE 'te_st%';
Time: 394.972 ms

pyiron=# SELECT * FROM jobs_cmmc WHERE jobs_cmmc.project LIKE 'te\_st%';
Time: 0.607 ms

_ stands for any single character in LIKE statements. https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-LIKE
To counter this we need to escape any unwanted % and _ characters.

XzzX avatar Jan 09 '25 14:01 XzzX

Hi @XzzX,

I looked into this and noticed that there are two possible options to change this behavior, either: https://github.com/pyiron/pyiron_base/blob/main/pyiron_base/database/generic.py#L337 or: https://github.com/pyiron/pyiron_base/blob/main/pyiron_base/database/generic.py#L956

But in both cases it fails in our test suits as we are using SQLite. Maybe you can still give it a try on PostgreSQL.

Best,

Jan

jan-janssen avatar Jan 13 '25 16:01 jan-janssen

If it breaks SQLite we should guard it with the TYPE configuration variable? Or do you want to drop support for SQLite?

XzzX avatar Jan 13 '25 16:01 XzzX

@pmrv is the better person to test it. I just saw this kind of queries in the log for long running queries. But I do not know where the originate from.

XzzX avatar Jan 13 '25 16:01 XzzX

I'd rather not remove sqlite capability. I briefly googled this when @XzzX opened the issue and apparently the best solution would be to use the formatting support of sqlalchemy rather than doing the escaping manually, but that would require us to rewrite our SQL calls to be of the form

conn.execute("SELECT foo FROM bar where project = :name", name=...)

rather than

conn.execute("SELECT foo FROM bar where project = %s" % name)

that we currently do, but it wasn't straightforward in the ten minutes I gave it.

pmrv avatar Jan 13 '25 16:01 pmrv

I'd rather not remove sqlite capability. I briefly googled this when @XzzX opened the issue and apparently the best solution would be to use the formatting support of sqlalchemy rather than doing the escaping manually, but that would require us to rewrite our SQL calls to be of the form

conn.execute("SELECT foo FROM bar where project = :name", name=...)

rather than

conn.execute("SELECT foo FROM bar where project = %s" % name)

that we currently do, but it wasn't straightforward in the ten minutes I gave it.

Alone for security reasons we should do this. Due to RLS not as bad but this is prone to SQL injection.

XzzX avatar Jan 14 '25 08:01 XzzX

Turns out I was wrong, we do construct the queries properly in get_items_dict, but not in an older method that somehow was still around. I've removed it here.

pmrv avatar Jan 15 '25 13:01 pmrv

Are you already using the new version? I still see this kind of slow queries in the database log.

XzzX avatar Feb 06 '25 08:02 XzzX

@XzzX The changes were included in 0.11.0 while the cluster is still running 0.10.10, so we have to wait for an update of the cluster environment. @niklassiemer Can you take a look at this?

jan-janssen avatar Feb 06 '25 08:02 jan-janssen