fdb decodes error messages with the system encoding, while they are encoded using the server encoding
How to reproduce:
- Run Firebird 2.0 under Windows 10 (default charset is CP1251)
- Run python3 under Linux (default charset is UTF-8) with `fdb==2.0.2`
- Run a procedure that raises an exception whose message contains Cyrillic characters
- See the error `'utf-8' codec can't decode byte 0xf2 in position 0: invalid continuation byte`
The stack trace points to `fbcore.py:607`.
Perhaps the solution is to use the `charset` option passed to the `connect` method here, but I have no idea how to do this.
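The failure can be reproduced without a server. The sketch below uses only the standard library: it encodes a Cyrillic word the way a CP1251 server would send it, then decodes it as UTF-8, which is effectively what fdb does with `sys_encoding` on a UTF-8 Linux client. The sample word and the helper names are illustrative assumptions, not part of fdb.

```python
def simulate_server_message(text: str, server_charset: str = "cp1251") -> bytes:
    """Encode text the way a server using server_charset would send it."""
    return text.encode(server_charset)


def decode_like_client(raw: bytes, client_encoding: str = "utf-8") -> str:
    """Decode raw bytes with the client's encoding (hypothetical stand-in for fdb)."""
    return raw.decode(client_encoding)


raw = simulate_server_message("тест")  # b'\xf2\xe5\xf1\xf2' in CP1251

try:
    decode_like_client(raw)
except UnicodeDecodeError as exc:
    # 'utf-8' codec can't decode byte 0xf2 in position 0: invalid continuation byte
    print(exc)

# Decoding with the server's charset recovers the original text.
print(decode_like_client(raw, "cp1251"))
```

0xF2 is the CP1251 code for "т", but in UTF-8 it starts a four-byte sequence, so the next byte (0xE5) is rejected as an invalid continuation byte.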
I ran into the same problem with the WIN1250 charset set on the connection. I solved it by creating a global variable, overwriting it in the `connect` method, and then using it in the `exception_from_status` method.
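```python
def exception_from_status(error, status, preamble=None):
    # ... (rest of the function unchanged)
    if PYTHON_MAJOR_VER == 3:
        # Decode with the charset stored by connect() instead of sys_encoding
        msglist.append('- ' + (msg.value).decode(GLOBAL_VAR_NAME))
```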
I don't know if this is the best solution, but it works.
I ran into a similar bug and solved it by adding the `"replace"` error handler to the `decode` call.
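The effect of the `"replace"` handler can be checked with the standard library alone. Decoding no longer raises, but every undecodable byte sequence becomes U+FFFD, so a Cyrillic message turns into nothing but replacement characters. The sample bytes are an assumed CP1251 message.

```python
raw = "тест".encode("cp1251")  # bytes as a CP1251 server would send them

# errors="replace" substitutes U+FFFD for each undecodable sequence
# instead of raising UnicodeDecodeError.
lossy = raw.decode("utf-8", errors="replace")

print(repr(lossy))      # only replacement characters remain
print(lossy == "тест")  # False: the original text cannot be recovered
```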
Apologies for bumping an old issue.
We've also had to deal with this problem in 2024.
We have both a Python backend and a Firebird 4 DB running on Linux. The database is encoded using cp1251/WIN1251 for legacy reasons, while the backend speaks UTF-8. All queries with text in WIN1251 are converted to UTF-8 without problems, since we've set the encoding for the database when creating the connection. However, any exceptions containing Cyrillic characters raise decoding errors in Python.
We've held off on changing over to the new Python driver due to an issue with how BLOBs are handled and how that relates to the SQLAlchemy driver for Firebird.
I admit that we haven't tested whether this is actually the case, but judging by the new driver's source code, it also seems to suffer from this issue, since it uses `locale.getpreferredencoding()` to determine how exceptions should be decoded.
The proposed solutions have some problems:
- adding `errors="replace"` to `.decode()` risks losing the information contained in the exception
- setting a global variable for the encoding doesn't work if you connect to databases with different encodings
The solution we've found that works best for our case is to use the same encoding as the database connection, since the database is likely to use that encoding for its exceptions as well.
This means that, in fdb/fbcore.py, we have to:
- add a new parameter, `encoding`, to `exception_from_status` and use it to decode the exception:

```diff
591c591
< def exception_from_status(error, status, preamble=None):
---
> def exception_from_status(error, status, preamble=None, encoding=None):
607c607
< msglist.append('- ' + (msg.value).decode(sys_encoding))
---
> msglist.append("- " + (msg.value).decode(charset_map.get(encoding, encoding) or sys_encoding))
```

- find all the places where `exception_from_status` is called and provide a value for the new parameter
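The decoding rule from the patched line can be tried in isolation. This is a standalone sketch, not fdb code: `CHARSET_MAP` is a hypothetical subset standing in for fdb's `charset_map`, and `decode_status_message` is an illustrative helper. It follows the same expression: translate a known Firebird charset name, pass an unknown name through unchanged, and fall back to the system encoding when no charset is given.

```python
import locale
from typing import Optional

# Hypothetical subset of a Firebird-charset -> Python-codec mapping.
CHARSET_MAP = {
    "UTF8": "utf_8",
    "WIN1250": "cp1250",
    "WIN1251": "cp1251",
}

# Stand-in for fdb's sys_encoding: the client's preferred locale encoding.
SYS_ENCODING = locale.getpreferredencoding()


def decode_status_message(raw: bytes, encoding: Optional[str] = None) -> str:
    """Decode a status message with the connection charset when known.

    Mirrors `charset_map.get(encoding, encoding) or sys_encoding`.
    """
    codec = CHARSET_MAP.get(encoding, encoding) or SYS_ENCODING
    return raw.decode(codec)


raw = "тест".encode("cp1251")
print(decode_status_message(raw, "WIN1251"))  # Firebird name, translated
print(decode_status_message(raw, "cp1251"))   # Python codec name, passed through
```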
We do have a patch file for fixing this issue, which can be applied to `fdb/fbcore.py`. However, I'm reluctant to turn it into a pull request, since we don't have any tests we can provide, and we aren't sure we found every place where this issue occurs.
Well, the core of this problem is that some error messages may be encoded in the server's OS encoding (paths, filenames, etc.). In your case that happens to be the same as the database encoding, so your solution works fine for you, but it fails in other cases. Hence I'm reluctant to adopt this approach. I agree that this should be configurable, ideally at the connection level (both database and server). I'll see what I can do about that, but I'll fix it in firebird-driver first, as that is easier with its separate configuration scheme. Then I'll see if something can be done with FDB.
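One possible shape for such a setting, sketched under the assumption that a connection could carry an ordered list of candidate encodings (for example the database charset and the server's OS charset): try each one strictly, and fall back to a lossy decode only as a last resort. The function name and the candidate list are hypothetical; this is not the fdb or firebird-driver API.

```python
from typing import Sequence


def decode_with_candidates(raw: bytes, candidates: Sequence[str]) -> str:
    """Try each candidate encoding strictly; fall back to a lossy decode."""
    for codec in candidates:
        try:
            return raw.decode(codec)
        except UnicodeDecodeError:
            continue  # bytes are not valid in this encoding, try the next one
    # Nothing matched: keep what can be recovered instead of raising.
    fallback = candidates[0] if candidates else "utf-8"
    return raw.decode(fallback, errors="replace")


# A message in the database charset and a path in the server's OS charset.
print(decode_with_candidates("тест".encode("cp1251"), ["utf-8", "cp1251"]))
print(decode_with_candidates("путь".encode("utf-8"), ["utf-8", "cp1251"]))
```

Order matters here: strict UTF-8 rejects most CP1251 text, while a single-byte codec such as cp1251 accepts almost any byte sequence and would silently produce mojibake from UTF-8 input, so multi-byte encodings should come first.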