datajoint-python icon indicating copy to clipboard operation
datajoint-python copied to clipboard

0.13.x regression error: backslash in string

Open simon-ball opened this issue 2 years ago • 0 comments

Bug Report

Description

Handling of backslashes inside strings has changed as of 0.13.3 (ans possibly .1 and .2, I have not tested them).

Specifically, matching a single backslash inside a string in a table requires 3 backslashes in a string in Python. This mismatch also means that same string (containing one backslash character) can't be used to restrict the table and find itself again.

Some conversion factor or escaping has been addded that is non-reversible.

Consider a table which contains a varchar column:

# acquisition.py
@schema
class ClusterSessionGroup(dj.Manual):
    definition = """
    group_name:         varchar(64)
    ---
    group_desc='':      varchar(128)    
    """
    contents = [("27285_Elm\j_d2_sequence", "full day of sequence task"),]

A user has assigned a group name of "27285_Elm\j_d2_sequence", note the central backslash. Backslash is the standard escape character in Python, and when printing out, is escaped by itself, i.e.

gn = "27285_Elm\j_d2_sequence"
gn
>>> '27285_Elm\\j_d2_sequence'

Expected: 0.12.x behaviour

With datajoint 0.12.7, I can restrict, fetch, and re-restrict with the original python string, like so:

import datajoint as dj
dj.__version__
>>> '0.12.7'
tbl = acquisition.ClusterSessionGroup & {"group_name":"27285_Elm\j_d2_sequence"}  # single backslash
tbl
>>> total: 1
# output displayed as in-line image below
# 1 total result shown as expected

image

If I fetch the row from the table, I get the escaped version, i.e. two backslashes. This is expected Python behaviour for representing that the backslash is a character and not a special character.

key = tbl.fetch("KEY")
key
>>> [{'group_name': '27285_Elm\\j_d2_sequence'}]  # double backslash, expected Python behaviour

Moreover, the conversion between sql and pythoin (if any is happening at all, is reversible: I can fetch the row, and re-restrict the table by it, and get the row back again.

tbl & key
>>> total: 1
# 1 total result shown, same as above in-line image

Bug: regressed 0.13.x behaviour

With Datajoint 0.13.3, the table cannot be restricted with the "expected" Python string, i.e. with a single backslash, as that returns zero hits, instead of the expected 1 hit:

import datajoint as dj
dj.__version__
>>> '0.13.3'
tbl = acquisition.ClusterSessionGroup & {"group_name":"27285_Elm\j_d2_sequence"}  # single backslash
tbl
>>> total: 0

Instead, it turns out that we need 3 backslashes. Not 2 (i.e. pre-escaped) or 4 (i.e. double-escaped).

tbl = acquisition.ClusterSessionGroup & {"group_name":"27285_Elm\\\j_d2_sequence"} # triple backslash!
tbl
>>> total: 1

Clearly, whatever back-end conversion DJ is doing to convert to/from the correct text format for the database to accept has changed. However, it gets worse than that. This conversion is not reversible. Consider what happens if we use that triple-backslashed string to restrict the table and fetch the row, and then restrict the table by that row again

k = tbl.fetch("KEY")
k
>>> [{'group_name': '27285_Elm\\j_d2_sequence'}]  # double backslash!
tbl & k
>>> total: 0

The contents of the row is no longer a valid match to the contents of the row. This breaks anything that relies on matching a row back to itself in a table (or parent table, e.g. in the make() function.

Reproducibility

  • Tested on Ubuntu 18.04 and Windows 10 21H2 with Python 3.8.0
  • Target database is MySQL 5.7 running on baremetal remote host
  • Datajoint versions in use: 0.12.7 and 0.13.3

Additional Research and Context

5 is right out

simon-ball avatar Mar 03 '22 15:03 simon-ball