pgloader icon indicating copy to clipboard operation
pgloader copied to clipboard

Arabic Text Converted to Question Marks During SQL Server to PostgreSQL Migration

Open MuhammadJamal99 opened this issue 7 months ago • 1 comments

Issue Description When migrating data from SQL Server to PostgreSQL using pgloader, Arabic text stored in NVARCHAR and VARCHAR columns is being converted to question marks (???) instead of preserving the original Arabic characters. Expected Behavior Arabic text should be migrated correctly and displayed as proper Arabic characters in PostgreSQL. Actual Behavior

Arabic text appears as question marks (???) in PostgreSQL after migration Hex analysis shows the data is stored as literal question mark characters (3f3f3f3f3f) New UTF-8 Arabic text inserted directly into PostgreSQL works correctly

Source Data Details SQL Server Configuration:

Arabic text is stored in NVARCHAR columns (UTF-16 encoding) Some columns may use VARCHAR with Arabic Windows collation (Windows-1256) Data displays correctly in SQL Server Management Studio Sample SQL Server Data: sql-- Example of data that should migrate SELECT StatusNameSL FROM ClaStatus WHERE StatusNameSL LIKE N'%Arabic_Text%' -- Shows proper Arabic characters in SSMS Migration Configuration Attempted Configuration 1: Basic Setup lispLOAD DATABASE FROM mssql://user:password@server/source_database INTO postgresql://user:password@localhost/target_database

WITH include drop, create tables, create indexes, reset sequences

CAST type nvarchar to text, type varchar to text

ALTER SCHEMA 'dbo' RENAME TO 'public'; Configuration 2: With UTF-8 Decoding lispLOAD DATABASE FROM mssql://user:password@server/source_database INTO postgresql://user:password@localhost/target_database

WITH include drop, create tables, create indexes, reset sequences, decoding-as utf8

CAST type nvarchar to text, type varchar to text

ALTER SCHEMA 'dbo' RENAME TO 'public'; Configuration 3: With Encoding Specifications lispLOAD DATABASE FROM mssql://user:password@server/source_database?charset=utf8 INTO postgresql://user:password@localhost/target_database

WITH include drop, create tables, create indexes, reset sequences

SET client_encoding to 'UTF8'

CAST type nvarchar to text using utf8-decode, type varchar to text using utf8-decode

BEFORE LOAD DO $$ SET client_encoding = 'UTF8'; $$; Command Line Attempts bashpgloader --encoding utf8 config.load pgloader --with "decoding-as utf8" config.load Investigation Results PostgreSQL Analysis After migration, analyzing the data shows: sql-- Check encoding SHOW server_encoding; -- UTF8 SHOW client_encoding; -- UTF8

-- Hex analysis of migrated data SELECT encode(column_name::bytea, 'hex') FROM table_name; -- Result: 3f3f3f3f3f (question marks in hex)

-- Test with manually inserted Arabic text INSERT INTO table_name (column_name) VALUES ('النص العربي'); -- This works correctly and shows proper hex values like: d8a7d984d984d987 Additional Context

The target PostgreSQL database is configured with UTF-8 encoding Manual insertion of Arabic text into PostgreSQL works perfectly The issue appears to be during the data extraction/conversion phase from SQL Server Other non-ASCII characters (if any) may also be affected Possible Root Cause The issue seems to occur when pgloader reads NVARCHAR data from SQL Server. The UTF-16 encoded Arabic text might not be properly converted to UTF-8 during the migration process, resulting in the database receiving question mark characters instead of the original Arabic text. Request Could you please:

Confirm if this is a known issue with MSSQL migrations Provide guidance on proper configuration for Arabic/Unicode text migration Consider adding better Unicode support documentation for MSSQL migrations

Sample Data for Testing If needed, I can provide a minimal SQL Server database with Arabic text samples for testing purposes.

Additional Information:

Willing to test proposed solutions Can provide more detailed logs if needed Available to help with debugging this encoding issue

MuhammadJamal99 avatar Jun 12 '25 14:06 MuhammadJamal99

@dimitri

MuhammadJamal99 avatar Jun 13 '25 21:06 MuhammadJamal99