
treatment of non-ascii characters

Open avengerovsky opened this issue 3 years ago • 6 comments

It looks like yuniql replaces non-ascii characters in my script with '�'. Example: select regexp_replace(asset_name, '[–—]', '-', 'g') as name from xxx becomes select regexp_replace(asset_name, '[��]', '-', 'g') as name from xxx

The same script executed with psql or DBeaver renders the characters correctly.

Thoughts?
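One possible explanation, offered as a guess rather than a diagnosis: the output shows exactly one '�' per dash, which is what you get when a file saved in a single-byte encoding is decoded as UTF-8. The sketch below reproduces that symptom in Python. The Windows-1252 file encoding and the UTF-8 decoding with replacement are both assumptions; nothing here is taken from yuniql's code.

```python
# Hypothetical reproduction of the symptom; not yuniql's actual reading logic.
# \u2013 is the en dash and \u2014 is the em dash from the reported query.
original = "select regexp_replace(asset_name, '[\u2013\u2014]', '-', 'g') as name from xxx"

# Assumption: the script file was saved as Windows-1252,
# where the en dash is byte 0x96 and the em dash is byte 0x97.
raw_bytes = original.encode("cp1252")

# Assumption: the reader decodes the bytes as UTF-8 and substitutes
# U+FFFD (the replacement character) for every invalid byte.
decoded = raw_bytes.decode("utf-8", errors="replace")

# A file that is really UTF-8 survives the same decoding unchanged.
round_trip = original.encode("utf-8").decode("utf-8", errors="replace")

print(decoded)
print(round_trip == original)
```

If this is what is happening, re-saving the script as UTF-8 would be another thing to try, since psql and DBeaver may simply be picking a different default encoding for the file.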

avengerovsky avatar Feb 14 '22 17:02 avengerovsky

Hi @avengerovsky, thanks for reaching out. You're probably right about this. I will add a test to cover it, and a fix will possibly be in the next release. I'm hoping to release this week if time permits.

P.S. ICYMI, please Star our repo. Thanks!

rdagumampan avatar Feb 14 '22 17:02 rdagumampan

@avengerovsky, I tried to reproduce this, and I think that while the text prints incorrectly on the console, yuniql reads the content of the script files correctly. I have a script file with the script below, and I can see the Chinese characters are well preserved when they are inserted into the database.

There is a problem with the console output. From what I could find, it seems to have more to do with the console settings than with the code itself.

SELECT '苹果 (Píngguǒ)' AS Apple UNION ALL
SELECT '微软 (Wēiruǎn)' AS Microsoft UNION ALL
SELECT '三星 (Sānxīng)' AS Samsung
GO

CREATE TABLE [dbo].[test_utf16_table](
[textdata] [nvarchar](MAX) NOT NULL
);
GO

INSERT INTO [dbo].[test_utf16_table] VALUES (N'苹果 (Píngguǒ)')
INSERT INTO [dbo].[test_utf16_table] VALUES (N'微软 (Wēiruǎn)')
INSERT INTO [dbo].[test_utf16_table] VALUES (N'三星 (Sānxīng)')
GO


image

rdagumampan avatar Feb 24 '22 04:02 rdagumampan

The log files also capture the text correctly. @avengerovsky, can you send me a test script file I can use to reproduce the issue? Thanks. image

rdagumampan avatar Feb 24 '22 05:02 rdagumampan

Thank you for looking into this issue. Here is what my log looks like:

image

The original SQL is:

image

I have since found a workaround: using the double-byte numeric representation of these characters (en dash and em dash) works fine.

image

Still I think the problem is there...
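The exact workaround is only visible in the screenshot, so the following is a guess at its general shape rather than a copy of it. PostgreSQL's advanced regular expressions accept `\uXXXX` and `\UXXXXXXXX` character-entry escapes, so a pattern can be written in pure ASCII. The helper name `escape_non_ascii` is hypothetical, and the sketch assumes `standard_conforming_strings` is on so the backslashes reach the regex engine unchanged.

```python
# Hypothetical helper: rewrite non-ASCII characters in a regex pattern as
# \uXXXX (or \UXXXXXXXX above U+FFFF) escapes so the script file stays ASCII.
def escape_non_ascii(pattern: str) -> str:
    parts = []
    for ch in pattern:
        code = ord(ch)
        if code < 128:
            parts.append(ch)                # plain ASCII passes through
        elif code <= 0xFFFF:
            parts.append(f"\\u{code:04X}")  # four hex digits
        else:
            parts.append(f"\\U{code:08X}")  # eight hex digits
    return "".join(parts)


# The bracket expression from the reported query: en dash and em dash.
pattern = "[\u2013\u2014]"
escaped = escape_non_ascii(pattern)

print(escaped)  # [\u2013\u2014]
print(f"select regexp_replace(asset_name, '{escaped}', '-', 'g') as name from xxx")
```

These escapes are interpreted by the regex engine only, so this is meant for patterns and not for ordinary string literals elsewhere in a script.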

avengerovsky avatar Feb 24 '22 05:02 avengerovsky

Thanks, I will try to reproduce this on PostgreSQL. So far I have only tested on SQL Server. Good to hear you found a workaround and are still convinced to use yuniql :)

Ideally it should work with pg_dump and without a customization like yours. We'll investigate further. Thanks again.

rdagumampan avatar Feb 24 '22 05:02 rdagumampan