pyiron_base
pyiron_base copied to clipboard
underscores in project name kills database performance
pyiron=# SELECT * FROM jobs_cmmc WHERE jobs_cmmc.project LIKE 'test%';
Time: 0.481 ms
pyiron=# SELECT * FROM jobs_cmmc WHERE jobs_cmmc.project LIKE 'te_st%';
Time: 394.972 ms
pyiron=# SELECT * FROM jobs_cmmc WHERE jobs_cmmc.project LIKE 'te\_st%';
Time: 0.607 ms
_ stands for any single character in LIKE statements. https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-LIKE
To counter this we need to escape any unwanted % and _ characters.
Hi @XzzX,
I looked into this and noticed that there are two possible options to change this behavior, either: https://github.com/pyiron/pyiron_base/blob/main/pyiron_base/database/generic.py#L337 or: https://github.com/pyiron/pyiron_base/blob/main/pyiron_base/database/generic.py#L956
But in both cases it fails in our test suits as we are using SQLite. Maybe you can still give it a try on PostgreSQL.
Best,
Jan
If it breaks SQLite we should guard it with the TYPE configuration variable? Or do you want to drop support for SQLite?
@pmrv is the better person to test it. I just saw this kind of queries in the log for long running queries. But I do not know where the originate from.
I'd rather not remove sqlite capability. I briefly googled this when @XzzX opened the issue and apparently the best solution would be to use the formatting support of sqlalchemy rather than doing the escaping manually, but that would require us to rewrite our SQL calls to be of the form
conn.execute("SELECT foo FROM bar where project = :name", name=...)
rather than
conn.execute("SELECT foo FROM bar where project = %s" % name)
that we currently do, but it wasn't straightforward in the ten minutes I gave it.
I'd rather not remove sqlite capability. I briefly googled this when @XzzX opened the issue and apparently the best solution would be to use the formatting support of sqlalchemy rather than doing the escaping manually, but that would require us to rewrite our SQL calls to be of the form
conn.execute("SELECT foo FROM bar where project = :name", name=...)rather than
conn.execute("SELECT foo FROM bar where project = %s" % name)that we currently do, but it wasn't straightforward in the ten minutes I gave it.
Alone for security reasons we should do this. Due to RLS not as bad but this is prone to SQL injection.
Turns out I was wrong, we do construct the queries properly in get_items_dict, but not in an older method that somehow was still around. I've removed it here.
Are you already using the new version? I still see this kind of slow queries in the database log.
@XzzX The changes were included in 0.11.0 while the cluster is still running 0.10.10, so we have to wait for an update of the cluster environment. @niklassiemer Can you take a look at this?