python-irodsclient `irods.test.query_test.TestQuery.test_files_query_case_sensitive`'s `between` test fails for mysql

This was first discovered in https://github.com/irods/python-irodsclient/issues/597. See also: https://github.com/irods/python-irodsclient/issues/597#issuecomment-2263583927 and https://github.com/irods/irods/issues/7930

We need to be able to detect the database type being used in the iRODS server and change the assertions based on that.

Aug 01 '24 18:08 alanking

This sub-test(ish) is now being skipped as of PR #601. A TODO linked to this issue has been left next to the code which needs to be fixed:

https://github.com/irods/python-irodsclient/blob/7af55af1f2ca8618ecf22049d4d2b26df2dcf9a2/irods/test/query_test.py#L181-L184

Aug 01 '24 20:08 alanking

Re: https://github.com/irods/python-irodsclient/blob/7af55af1f2ca8618ecf22049d4d2b26df2dcf9a2/irods/test/query_test.py#L181-L184

Are we better able to detect the DB flavor now?
What is the specific issue in the BETWEEN operator when using MySQL? What makes its behavior different to other DB's?

Apr 30 '25 14:04 d-w-moore

Are we better able to detect the DB flavor now?

No. That isn't something clients should care about. The test and/or test harness needs to provide that information. Alternatively, you can try and adjust the assertions so that the test passes regardless of the database.

What is the specific issue in the BETWEEN operator when using MySQL? What makes its behavior different to other DB's?

That is best answered by running an experiment and/or reading the docs.

https://dev.mysql.com/doc/refman/8.4/en/comparison-operators.html#operator_between
https://www.postgresql.org/docs/current/functions-comparison.html

Use of the BETWEEN keyword is optional. BETWEEN can likely be replaced by explicit range checks, leading to more consistent results across database technologies. However, there may be other factors at play. For that reason, reading the docs, examining the values used in the test, running experiments, and considering the database's configuration will help in figuring out the difference.

Apr 30 '25 16:04 korydraughn

Thanks @korydraughn . I did do a little experimentation. MySQL seems to fuzz the case by default, where as eg PostgreSQL does not.

Apr 30 '25 17:04 d-w-moore

It's not clear to me what you mean by "fuzz". Please explain.

Apr 30 '25 17:04 korydraughn

That was inaccurate of me. By "fuzz" I really just meant the MySQL BETWEEN is doing character comparisons/sorting case-insensitively regardless of what case we throw at it with our object names. So MySQL query results might be influenced by its differently-sorted inputs to the BETWEEN operator according to: 'a' < 'B' < 'c' rather than, for example: 'B' < 'a' < 'c' as you might expect from the characters' ASCII codes.

I understand from past experience there are locales in which case insensitive ordering is enforced. Our mysql DB might, in a default installation, have such a locale set for itself; therefore its BETWEEN operator might operate on top of that altered (relative to the other DBs) sort order, rendering results differently.

I kinda repeated myself there, so pardonnez-moi si vous plait. :)

May 12 '25 16:05 d-w-moore

As I stated during previous group discussions, I may end up moving the failing parts of the mysql test to a separate test that only runs when the client unittest is run on the same node as the server and DB. (Such a test harness exists in branch https://github.com/d-w-moore/python-irodsclient/tree/502.m

May 12 '25 16:05 d-w-moore

Makes sense.

That branch targets #502 which is part of the 3.2.0 milestone. Sounds like this issue should be moved into that milestone as well. You agree?

May 12 '25 17:05 korydraughn

i don't think that hurts. I may still try to determine now if my guesses are correct on this one, since I'm most of the way there.

May 13 '25 02:05 d-w-moore