iotdb
handling special characters (e.g. '-') in path (V0.13.x) and migration issues
Describe the bug
Since v0.13, IoTDB supports arithmetic operations in queries, so names such as "asdf-asdf" in a path have become more complicated to handle. We started using IoTDB with v0.11.x and, because of some bugs, migrated our data from there via 0.12 to 0.13. It took several attempts until I found a more or less workable way to do so. The biggest issue was the handling of minus signs in names, which many of our devices have.
In our experience, the automatic migration (where a newer version detects older data and upgrades it on its own) did not behave consistently for paths containing minus signs; judging by the logs, it seems to depend on which 0.12 version is used as the intermediate step. We also experienced data loss when migrating data with minus signs from 0.11.2 to 0.12.1 through 0.12.5: for example, the original dataset had 90k timestamps, and after migration to any of 0.12.1, 0.12.2 ... 0.12.5 (all tested) it was reduced to ~4k timestamps. When migrating to 0.12.0 the data was kept, and moving from there to 0.12.1 through 0.12.5 also went fine. But when migrating data with e.g. minus signs to 0.13.x, all data was lost. It seems to me that the migration does not take into account that formerly valid strings (e.g. with a minus sign) must now be wrapped in single quotes in statements.
I also tried the migration with the CSV export and import, but importing columns with '-' in e.g. the device name results in a bunch of exceptions, which several times left the VM unusable. I worked around this by manually fixing the header column, surrounding the device name with single quotes.
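The manual header fix described above could be scripted roughly as follows. This is only a sketch under assumptions: it presumes the exported CSV header is a comma-separated row whose columns are "Time" followed by full paths such as root.sg.dev-1.default.col1, and the names `quote_minus_nodes` and `fix_header_line` are hypothetical helpers, not part of any IoTDB tool.

```python
import csv
import io


def quote_minus_nodes(path: str) -> str:
    """Wrap every path node that contains '-' in single quotes.

    Assumption: nodes are separated by '.' and contain no dots themselves.
    Nodes that already start with a single quote are left untouched.
    """
    nodes = path.split(".")
    return ".".join(
        f"'{node}'" if "-" in node and not node.startswith("'") else node
        for node in nodes
    )


def fix_header_line(header_line: str) -> str:
    """Return the CSV header row with minus-sign nodes quoted."""
    columns = next(csv.reader([header_line]))
    fixed = [quote_minus_nodes(column) for column in columns]
    out = io.StringIO()
    # lineterminator="" so the caller decides how to end the line
    csv.writer(out, lineterminator="").writerow(fixed)
    return out.getvalue()
```

Only the first line of the exported file would be passed through `fix_header_line`; the data rows stay unchanged.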
To Reproduce not easy, but could provide a subset of data if helpful
Expected behavior Migration of all data from any source version to a target version should run and keep all data as it was, including formerly valid characters, and if possible without substeps (such as migrating to 0.12.x as an intermediate step).
The CSV export should be fixed so that the import is stable.
Screenshots could deliver if helpful
Desktop (please complete the following information): OS: Debian 11, openSUSE Leap 15.4, using the public container with the IoTDB node in the versions stated above
Additional context
Maybe also an improvement regarding that topic:
I experienced issues when submitting insert or select queries from the Python interface. Single quotes made the queries work when a path node contains e.g. a minus sign (e.g. 'asdf-asdf'.default.col1), but the query fails when single quotes surround a path node that has no minus sign in it (e.g. 'asdfasdf'.default.col2). It would be better and more consistent if every part of a path that might contain special characters could be surrounded by single quotes, whether or not it actually contains any. In my Python code I used the following workaround (the full path in the example is then root.sg.{corrected_iotdb_string}, which represents the device and the field):
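# current_iotdb_string has the form <device>.<node>.<field>
splitted = current_iotdb_string.split(".")
if "-" in splitted[0]:
    # quote the device name only when it contains a minus sign
    corrected_iotdb_string = f"'{splitted[0]}'.{splitted[1]}.{splitted[2]}"
else:
    corrected_iotdb_string = f"{splitted[0]}.{splitted[1]}.{splitted[2]}"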
And finally: thanks to all the devs in the team for the excellent work. I love using IoTDB in lots of cases, even in production environments.
Hi, thanks for your feedback! About the special characters issue, @lancelly can u PTAL?
Could you provide the subset of data? Data loss after upgrading from 0.11 to 0.12 seems abnormal. Could you also tell us how you upgraded the data from 0.11 to 0.12?
In 0.13, the SQL syntax has been changed. Identifiers that are not enclosed in backquotes can only contain the following characters; otherwise they need to be enclosed in backquotes.
[0-9 a-z A-Z _ : @ # $ { }] (letters, digits, some special characters)
['\u2E80'..'\u9FFF'] (UNICODE Chinese characters)
Also, in 0.13, if a path node name in the SELECT clause consists purely of digits, it needs to be enclosed in backquotes to distinguish it from a constant in the expression. For example, in the statement "select 123 + `123` from root.sg", the former 123 represents a constant, and the latter `123` will be spliced with root.sg, indicating the path root.sg.`123`.
More details can be found at https://iotdb.apache.org/UserGuide/V0.13.x/Reference/Syntax-Conventions.html
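For client code that builds paths as strings, the rules above could be applied with a small helper along these lines. It is a minimal sketch, not an IoTDB API: `quote_node` and `quote_path` are hypothetical names, the character set is copied from the list in this comment, and refusing node names that already contain a backquote is an assumption made here because the escaping rule is not stated in this thread.

```python
import re

# Characters allowed in an unquoted identifier, as listed above.
_PLAIN_NODE = re.compile(r"[0-9A-Za-z_:@#${}\u2E80-\u9FFF]+")
_PURE_NUMBER = re.compile(r"[0-9]+")


def quote_node(node: str, in_select: bool = False) -> str:
    """Return the node, wrapped in backquotes if the 0.13 rules require it."""
    if not node:
        raise ValueError("empty path node")
    if "`" in node:
        # Assumption: escaping of embedded backquotes is out of scope here.
        raise ValueError(f"node already contains a backquote: {node!r}")
    needs_quotes = _PLAIN_NODE.fullmatch(node) is None
    # Pure-number nodes in a SELECT clause would otherwise read as constants.
    if in_select and _PURE_NUMBER.fullmatch(node):
        needs_quotes = True
    return f"`{node}`" if needs_quotes else node


def quote_path(nodes, in_select: bool = False) -> str:
    """Join already-separated nodes into a path, quoting where needed."""
    return ".".join(quote_node(node, in_select) for node in nodes)
```

Taking the nodes as a list rather than splitting a string on "." avoids guessing where a node that itself contains special characters begins and ends. For example, `quote_path(["root", "sg", "asdf-asdf", "default", "col1"])` quotes only the node with the minus sign.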