tds_fdw
tds_fdw copied to clipboard
ERROR: Unconverted bytes were changed to question marks
Hello,
When I try to use Postgres 9.6 tds_fdw using UTF-8 character set... it's throwing error for a couple of tables...
test=# select * from test_fdw.labels; NOTICE: DB-Library notice: Msg #: 5701, Msg state: 2, Msg: Changed database context to 'test'., Server: WIN2K12-DEV, Process: , Line: 1, Level: 0 NOTICE: DB-Library notice: Msg #: 5703, Msg state: 1, Msg: Changed language setting to us_english., Server: WIN2K12-DEV, Process: , Line: 1, Level: 0 ERROR: DB-Library error: DB #: 2403, DB Msg: Some character(s) could not be converted into client's character set. Unconverted bytes were changed to question marks ('?'), OS #: 0, OS Msg: Success, Level: 4
I verified all the things listed in https://github.com/tds-fdw/tds_fdw/issues/1
Can you help ?
Can you post the table definition please? What have you tried from issue #1?
I have also encountered this.
I have also encountered this.
Interestingly, the issue seems to have resolved when I locally built FreeTDS off of the current master, then built tds-fdw again. OSX Mojave, FreeTDS originally from brew.
Interestingly, the issue seems to have resolved when I locally built FreeTDS off of the current master, then built tds-fdw again. OSX Mojave, FreeTDS originally from brew.
What was the FreeTDS version you downloaded from brew?
I have this issue on CentOS 7 with freetds version 0.95.81 from EPEL and postgresql-10-tds_fdw.x86_64 ver 2.0.0-alpha.3.2.el7 (from repo: https://tds-fdw.github.io/yum/). SQL Server 2008R2 has database with Collation Serbian_Latin_100 which seams to be CP1250. In /etc/freetds.conf I have only:
tds version = 7.3 client charset = UTF-8
I created 2 databases in PostgreSQL:
First "izp" with Encoding:"UTF8" (SHOW client_encoding shows UNICODE), and Character type and Collation:"en_US.UTF-8". For varchar(250) string "Vojvođanska banka" PostgreSQL with shows "Vojvoрanska banka a.d." (using PgAdmin4)
Second "izpmssql" with Encoding:"WIN1250" ( SHOW client_encoding shows WIN1250), and Character type and Collation:"C". For varchar(250) string "Vojvođanska banka" PostgreSQL with shows "VojvoŃ€anska banka a.d.
If I change to client charset = CP1250
both databases throw error #2403
I have also seen that FreeTDS is now version 1.1 (https://fossies.org/diffs/freetds/1.1_vs_1.1.1/ChangeLog-diff.html) and it shows that there was a fix: "Fix some collation encoding detection"
So my question is if versions I use should be able to show data correctly, or should I try to get my hands on newer versions of freetds and/or tds_fdw?
OK, It turns out everything is good. I used first database with UTF8 encoding and I dropped then recreated all foreign tables:
IMPORT FOREIGN SCHEMA izp
EXCEPT (racun, tipKomitenta)
FROM SERVER izpwin2012
INTO mssql
OPTIONS (import_default 'true');
IMPORT FOREIGN SCHEMA izp
LIMIT to (racun, tipKomitenta)
FROM SERVER izpwin2012
INTO mssql
OPTIONS (import_default 'false');
IMPORT FOREIGN SCHEMA sigurnost
FROM SERVER izpwin2012
INTO mssql
OPTIONS (import_default 'true');
It turns out only PgAdmin4 does not see those characters, but Lazarus with ZEOS components shows them properly. So I consider this non-issue.
Sounds like you got it, can you close this issue? @DrLove73
Yes, you can.
I was wrong, this is only a partialy good. letter "đ" is shown properly, but "č" and "ž" are not shown properly :-( I will check other special letters from Serbian tomorrow.
Ok, I conducted a thorough testing of every imaginable variant of data component in Lazarus, but I get same data as from PgAdmin4. I got codes from MSSQL using following SQL:
SET TEXTSIZE 0;
-- Create variables for the character string and for the current
-- position in the string.
DECLARE @position int, @string char(8);
-- Initialize the current position and the string variables.
SET @position = 1;
SET @string = 'ŠĐŽČĆšđžčć';
WHILE @position <= DATALENGTH(@string)
BEGIN
SELECT ASCII(SUBSTRING(@string, @position, 1)) as ASCII, NCHAR(UNICODE(SUBSTRING(@string, @position, 1))) as Letter,
UNICODE(SUBSTRING(@string, @position, 1)) as Unicode
SET @position = @position + 1
END;
GO
Shown by MSSQL (2008R2) with "Serbian Latin"->FreeTDS (0.95.81 from EPEL)->TDS_FDW (2.0.0-alpha.3.2.el7)->PgAdmin4: Shown = Should be = MSSQL ASCII() = MSSQL Unicode():
Љ=Š=138=352
Р=Đ=208=272
Ћ=Ž=142=381
И=Č=200=268
Ж=Ć=198=262
љ=š=154=353
р=đ=240=273
ћ=ž=158=382
и=č=232=269
ж=ć=230=263
It seams that is FreeTDS problem since if I use (on Windows) dblib.dll (freetds v1.00.11) via Lazarus MSSQLConn, same characters are shown. If I try to use (on Windows) any newer sybdb.dll (FreeTDS 1.2.x) as dblib.dll that MSSQLConn demands, I get error:
The USE database statement failed because the database collation Serbian_Latin_100_CI_AI is not recognized by older client drivers. Try upgrading the client operating system or applying a service update to the database client software, or use a different collation. See SQL Server Books Online for more information on changing collations.
Any help or direction would be helpfull. Is there any way tds_fdw could convert them on the fly or will I have to add code to my app to avoid this veritable horror I have in attempt to move away from old implementation?
Can you try updating FreeTDS? In my experience FreeTDS is very picky.
I updated last part of my previous comment to reflect I used FreeTDS 1.2.x (last build) on Windows (dll from AppVeyor https://ci.appveyor.com/project/FreeTDS/freetds/builds/25496715/job/398k4ka0iesqxc1k/artifacts), I get that "The USE database" error.
I will have to see how to compile rpm for EL7 with latest version to install it on my devel laptop.
Ok, IT WORKS! I fired up my rpm rebuilding skills and I managed to update EPEL's src.rpm to version 1.1.6 (replaced tar.bz2, and in spec file changed version number and --with-tdsver="7.3" from --with-tdsver="4.2" ). Since there are no rpm's for CentOS 7, I am publishing rpm and src.rpm files in my old repo: http://centos.plnet.rs/mrepo/plnet-centos7-x86_64/RPMS.plnet-compiled/ and http://centos.plnet.rs/mrepo/plnet-centos7-srpms-srpms/RPMS.plnet-compiled/ BE aware that files are NOT YET SIGNED! I have yet to find my old keys and even see if they are still valid. I will do this at later time. Serbian letters are properly shown in PgAdmin4 and Lazarus via ZEOS components.
Tds_fdw developers: If providing link to binary packages is against some rules, you may remove them, or I will if you say so. I am long-time CentOS contributor (in mailing lists since ~2009) and I am admin of CentOS Facebook group so rpm's are clean of malware, built from unchanged sources.