Many errors are reported when parsing a DBC containing Chinese characters and special characters
When I use canmatrix to load a DBC whose signal names contain Chinese characters and special characters, like this:
_matrix = canmatrix.formats.dbc.load(f, dbcImportEncoding=encoding)
errors like this are reported:
error with line no: 2004
b' SG_ PSDCU_RR\xe4\xb8\xbb\xe8\xbd\xaf\xe4\xbb\xb6\xe7\x89\x88\xe6\x9c\xac\xe5\x8f\xb7$_W : 63|8@0+(1,0)[0|255] "" Vector__XXX\r\n'
An original line in the DBC looks like this:
SG_ 冗余制动降级状态$_W : 23|3@0+(1,0)[0|7] "" Vector__XXX
Then I found that canmatrix uses a regex to match each line in the DBC. For lines starting with 'SG_', it uses the following regex:
pattern = r"^SG_ +(\w+) *: *(\d+)\|(\d+)@(\d+)([\+|\-]) *\(([0-9.+\-eE]+), *([0-9.+\-eE]+)\) *\[([0-9.+\-eE]+)\|([0-9.+\-eE]+)\] +\"(.*)\" +(.*)"
The regex group (\w+) cannot match signal names like these, which mix Chinese characters with special characters such as $, in Python 3.8, so I suggest changing the regex above to:
pattern = r"^SG_ +(\S+) *: *(\d+)\|(\d+)@(\d+)([\+|\-]) *\(([0-9.+\-eE]+), *([0-9.+\-eE]+)\) *\[([0-9.+\-eE]+)\|([0-9.+\-eE]+)\] +\"(.*)\" +(.*)"
This change would handle the cases described in this issue.
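A minimal sketch to reproduce the mismatch outside canmatrix, using only Python's `re` module and the two patterns quoted above. The names `TAIL`, `OLD_PATTERN`, `NEW_PATTERN`, and `plain_line` are illustrative and not canmatrix identifiers, and `plain_line` is a hypothetical variant of the sample line without `$_W`, added only for comparison. The sketch assumes the line has already been decoded to `str`; on a `str` in Python 3, `\w` does match Chinese characters, so here it is the `$` that stops `(\w+)`.

```python
import re

# Everything after the signal-name group, copied from the patterns quoted above.
TAIL = (r" *: *(\d+)\|(\d+)@(\d+)([\+|\-]) *\(([0-9.+\-eE]+), *([0-9.+\-eE]+)\)"
        r" *\[([0-9.+\-eE]+)\|([0-9.+\-eE]+)\] +\"(.*)\" +(.*)")

OLD_PATTERN = re.compile(r"^SG_ +(\w+)" + TAIL)  # current signal-name group
NEW_PATTERN = re.compile(r"^SG_ +(\S+)" + TAIL)  # proposed signal-name group

# Sample line from this issue (already decoded to str).
line = 'SG_ 冗余制动降级状态$_W : 23|3@0+(1,0)[0|7] "" Vector__XXX'
# Hypothetical variant without the "$_W" suffix, for comparison only.
plain_line = 'SG_ 冗余制动降级状态 : 23|3@0+(1,0)[0|7] "" Vector__XXX'

old_match = OLD_PATTERN.match(line)
new_match = NEW_PATTERN.match(line)
old_plain_match = OLD_PATTERN.match(plain_line)

print("old pattern, name with $:   ", old_match)
print("new pattern, name with $:   ", new_match.group(1) if new_match else None)
print("old pattern, name without $:", old_plain_match.group(1) if old_plain_match else None)
```

Note that `\S+` accepts any run of non-whitespace characters, so it is more permissive than `\w+`.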
Please reply, it's very important to me!
Hi @Liluoquan
You have to specify the encoding via "dbcImportEncoding",
maybe something like dbcImportEncoding="utf8".
Hi @Liluoquan
Any success?
Hi @ebroecker
Sorry, it didn't work when I used utf-8, GB2312, or gbk:
_matrix = canmatrix.formats.dbc.load(f, dbcImportEncoding='utf-8')
The error is as follows:
error with line no: 28
b' SG_ \xca\xfd\xd7\xd6\xd6\xa4\xca\xe9\xb4\xe6\xb4\xa2\xb9\xca\xd5\xcf$_W : 20|1@0+(1,0)[0|1] "" Vector__XXX\r\n'
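As a quick check of the encoding side, the raw bytes from the error above can be decoded directly in Python, without canmatrix. This is only a sketch: `try_decode` is a hypothetical helper, not part of canmatrix, and the candidate list is an assumption based on the encodings mentioned in this thread. It suggests that this particular line is not valid UTF-8 but does decode as GBK, so a remaining failure with `gbk` would point at the regex rather than the encoding.

```python
# Raw bytes copied from the error message above.
gbk_line = (b' SG_ \xca\xfd\xd7\xd6\xd6\xa4\xca\xe9\xb4\xe6\xb4\xa2\xb9\xca\xd5\xcf$_W'
            b' : 20|1@0+(1,0)[0|1] "" Vector__XXX\r\n')


def try_decode(raw, encoding):
    """Return the decoded, stripped line, or None if the bytes are invalid for `encoding`."""
    try:
        return raw.decode(encoding).strip()
    except UnicodeDecodeError:
        return None


# Try each candidate encoding and report which ones can decode the line.
results = {enc: try_decode(gbk_line, enc) for enc in ("utf-8", "gbk")}
for enc, text in results.items():
    print(enc, "->", text if text is not None else "cannot decode")
```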
Hi @Liluoquan
I did not read your issue completely the first time, sorry.
You already provided a potential fix. Thanks for it! I'll add your provided fix soon.