mssql-docker icon indicating copy to clipboard operation
mssql-docker copied to clipboard

BULK with Codepage Parameter on Linux

Open leykevin opened this issue 7 years ago • 39 comments

Hello, i have the following Error in MSSQL Docker image: Keyword or statement option 'CODEPAGE' is not supported on the 'Linux' platform. [16202] (severity 15) [(null)] I have MSSQL installed directly on Debian System and this works but not in the Docker Container. How can i use the Bulk command with the Parameter Codepage. I have a CSV Data in ANSI Latin(1252) and want to import it. The docker have access to the files. If i delete the codepage parameter the import ist successfully but with letters like äöü looks bad.

leykevin avatar Apr 16 '18 07:04 leykevin

Same issue.

smartinsightsfromdata avatar May 30 '18 21:05 smartinsightsfromdata

For some reason, CODEPAGE is not available on mssql for Linux.

The documentation says:

CODEPAGE is not a supported option on Linux.

eltermann avatar Aug 28 '18 19:08 eltermann

We are going to update the documentation to be more precise. CODEPAGE does actually work on SQL Server on Linux but only for RAW and Unicode.

twright-msft avatar Aug 29 '18 15:08 twright-msft

@twright-msft, could you please provide an example of a simple BULK INSERT statement that works on SQL Server on Linux and makes use of CODEPAGE?

Also, this documentation update will be great for clarification.

eltermann avatar Aug 31 '18 12:08 eltermann

https://docs.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-2017#examples See example D. Pasted here for convenience. I changed the example to use CODEPAGE = RAW since that is what works on Linux right now.

BULK INSERT MyTable FROM 'D:\data.csv' WITH ( CODEPAGE = 'RAW', DATAFILETYPE = 'char', FIELDTERMINATOR = ',' );

twright-msft avatar Sep 04 '18 16:09 twright-msft

@twright-msft Unfortunately this example does not seem to work for me - still the same error ('CODEPAGE' is not supported on the 'Linux' platform).

Tried in both Docker & and newest SQL-server installed directly on CentOS.

DennisGlindhart avatar Sep 06 '18 06:09 DennisGlindhart

Can you please run this on your SQL Server on Linux and share the output: SELECT @@VERSION

twright-msft avatar Sep 06 '18 15:09 twright-msft

@twright-msft

Microsoft SQL Server 2017 (RTM-CU10) (KB4342123) - 14.0.3037.1 (X64) Jul 27 2018 09:40:27 Copyright (C) 2017 Microsoft Corporation Developer Edition (64-bit) on Linux (CentOS Linux 7 (Core))

Reproduce:

Install clean CentOS (7.5 / 1804) (followed by yum update && reboot)

$ cat /etc/locale.conf LANG="en_DK.UTF-8" $ curl -o /etc/yum.repos.d/mssql-server.repo https://packages.microsoft.com/config/rhel/7/mssql-server-2017.repo $ curl -o /etc/yum.repos.d/msprod.repo https://packages.microsoft.com/config/rhel/7/prod.repo $ yum install -y mssql-server mssql-tools unixODBC-devel $ /opt/mssql/bin/mssql-conf setup Edition: Developer, Language: English $ /opt/mssql-tools/bin/sqlcmd -s localhost -U sa -P 'xxx' 1> SELECT @@VERSION; 2> GO Microsoft SQL Server 2017 (RTM-CU10) (KB4342123) - 14.0.3037.1 (X64) Jul 27 2018 09:40:27 Copyright (C) 2017 Microsoft Corporation Developer Edition (64-bit) on Linux (CentOS Linux 7 (Core))

(1 rows affected) 1> BULK INSERT MyTable FROM '/srv/test.tbl' WITH ( CODEPAGE = 'RAW', DATAFILETYPE = 'char', FIELDTERMINATOR = ',' ); 2> GO Msg 16202, Level 15, State 3, Server srvfab-sql1, Line 1 Keyword or statement option 'CODEPAGE' is not supported on the 'Linux' platform.

DennisGlindhart avatar Sep 06 '18 16:09 DennisGlindhart

I wonder if this is something that is being caught in sqlcmd. Can you please try from a different tool like SQL Server Management Studio or SQL Operations Studio? We added support for RAW in cumulative update 4.

twright-msft avatar Sep 06 '18 19:09 twright-msft

@twright-msft Same issue from SQL Operations Studio (Version 0.28.6 if that makes a difference). Can it be something with system locale or some related libraries since it (according to @leykevin) works on Debian?

DennisGlindhart avatar Sep 06 '18 19:09 DennisGlindhart

Turns out this was not actually included in a CU as we thought. I'm trying to get clarity on when it will be included. Is bcp an option for you instead of BULK INSERT?

twright-msft avatar Sep 09 '18 00:09 twright-msft

Thank you very much for clarification - that explains it.

BCP is unfortunately not an option in my case, no. I'll just have to wait till CODEPAGE support comes along before continuing with testing - An ETA would be awesome, if possible, yes. Looking forward to it - Thanks.

DennisGlindhart avatar Sep 09 '18 18:09 DennisGlindhart

@twright-msft Seems to be available in 2019 CTP (tested with 2.1) - at least for RAW, but with UNICODE i still get the same error.

Are there any plans for supporting other other codepages?

DennisGlindhart avatar Nov 11 '18 21:11 DennisGlindhart

Glad you got it working with 2.1. We don’t have any plans for other specific code pages right now. Which ones would you like to have?

twright-msft avatar Nov 11 '18 23:11 twright-msft

We are currently using 1252 (Latin) on Windows environment.

We might be able to do with Unicode but it doesn't seem to work on Linux so I haven't tested - only RAW works and that doesn't work due to some date formats.

DennisGlindhart avatar Nov 12 '18 12:11 DennisGlindhart

Do you plan to support CODEPAGE = 65001? Support under UTF - 8 under Linux environment improves convenience.

MasayukiOzawa avatar Mar 08 '19 11:03 MasayukiOzawa

I see SQL on Linux 2019 (in preview) adds code page 65001. Disclaimer: I haven't tried SoL19 yet. This feature should resolve many related issues bulk loading data on Linux. The default locate on RHEL/CentOS is c / utf-8 and most data generation tools like dbgen, dsgen, hammerdb, etc., generate characters in the latin9 code page, which are not recognized by bulk insert on SoL 2017.

I thought about moving my data sets to a Windows server, which supports bulk insert command's CODEPAGE option, and use it to load the data into SQL on Linux, however, my data sets are 4TB. It is a pain to move all of that to a Windows server just to try copying it back (bulk insert command) across a network link.

remi9898 avatar Jun 10 '19 00:06 remi9898

@remi9898 I tried the latest docker image:

Microsoft SQL Server 2019 (CTP3.2) - 15.0.1800.32 (X64)   Jul 17 2019 21:29:33   Copyright (C) 2019 Microsoft Corporation  Developer Edition (64-bit) on Linux (Ubuntu 16.04.6 LTS) <X64>

and had the same error:

Keyword or statement option 'CODEPAGE' is not supported on the 'Linux' platform.

Where did you see they were adding utf-8 ?

guillaume86 avatar Jul 26 '19 11:07 guillaume86

Per Travis' reply earlier, SQL Server 2019 adds support for the CODEPAGE parameter on Linux, but only the value RAW is allowed. In such case, no data conversion is done. This combined with setting DATAFILETYPE=whidechar should do what we want.

Most cases don't require UTF-8 because the extended Latin character set includes the required characters. I found for nearly all of my cases I just needed to create a format file (generated using bcp) and then edit the format file to change every data type to SQLCHAR and set the prefix value to 0. Then, I ran bulk insert using CODEPAGE='RAW'. For example:

A sample format file and bulk insert command:

14.0
8
1       SQLCHAR        0       8       "|"        1     c_custkey         ""
2       SQLCHAR        0       50      "|"        2     c_name            SQL_Latin1_General_CP1_CI_AS
3       SQLCHAR        0       80      "|"        3     c_address         SQL_Latin1_General_CP1_CI_AS
4       SQLCHAR        0       4       "|"        4     c_nationkey       ""
5       SQLCHAR        0       30      "|"        5     c_phone           SQL_Latin1_General_CP1_CI_AS
6       SQLCHAR        0       8       "|"        6     c_acctbal         ""
7       SQLCHAR        0       20      "|"        7     c_mktsegment      SQL_Latin1_General_CP1_CI_AS
8       SQLCHAR        0       236     "|\n"      8     c_comment         SQL_Latin1_General_CP1_CI_AS

bulk insert customer from 'customer.tbl' with (KEEPNULLS,KEEPIDENTITY,CODEPAGE='RAW',formatfile='customer-n.fmt');

Sorry for late reply. I didn't get any notice from this system about this thread having activity until this morning.

remi9898 avatar Jul 26 '19 12:07 remi9898

@remi9898 thanks for the reply, unfortunately it's not practical in my case, I gave up on using the linux version for the moment, will probably use bcp if I need to get back to it.

guillaume86 avatar Jul 26 '19 14:07 guillaume86

Same issue here. I have a utf-8 file containing french characters (É, à, œ, etc.) and I can't import it in SQL Server using bulk insert. I tried to use "recode" tool to change the file encoding, but I can't find which one could work... utf-16 or unicode doesn't fix the problem.

SylvainDevidal avatar Feb 17 '20 16:02 SylvainDevidal

So is it impossible to import a simple CSV file with special characters to SQL SEVER linux using BULK IMPORT? is there any alternative @twright-msft to bypass this limitation?

alouane avatar Jul 15 '20 12:07 alouane

@SylvainDevidal did you manage to bypass the issue?

alouane avatar Jul 15 '20 17:07 alouane

I found this and it works for me: Feedback azure

The only thing I did was to change the file coding from UTF-8 to UTF-16, my query was this:

BULK INSERT dbo.table_name FROM '/Import/source_file.csv' WITH (FIELDTERMINATOR=',', FIELDQUOTE = '"', KEEPIDENTITY, KEEPNULLS);

Hope this works for you.

aberlin75 avatar Apr 07 '21 05:04 aberlin75

no luck in GCP can any one help me

kr185187 avatar Nov 25 '21 12:11 kr185187

I tried like above statement getting Error BULK INSERT dbo.table_name FROM '/Import/source_file.csv' WITH (FIELDTERMINATOR=',', FIELDQUOTE = '"', KEEPIDENTITY, KEEPNULLS); "You do not have permission to use the bulk load statement."

kr185187 avatar Nov 25 '21 12:11 kr185187

I was trying to get some German and Hungarian characters rendered and CODEPAGE = 'RAW' did it for me

autocorrectoff avatar Sep 19 '22 07:09 autocorrectoff

It might be worth considering using a scripted language/tool such as Python or JavaScript(Node) for the import/export. It may not be quite as optimized, but may be more reliable and able to handle edge cases better. Mace sure to use an appropriately configured CSV library that handles line endings inside quoted values.

I've used the mssql/tedious drivers in Node without much issue... don't know what the equivalent is for Python though.

tracker1 avatar Sep 19 '22 18:09 tracker1

Five years later. Is there a solution for Cyrillic characters? The above methods do not work.

7702244 avatar Jun 04 '23 12:06 7702244

Same issue here. I have a utf-8 file containing french characters (É, à, œ, etc.) and I can't import it in SQL Server using bulk insert. I tried to use "recode" tool to change the file encoding, but I can't find which one could work... utf-16 or unicode doesn't fix the problem.

it work!

a1404040 avatar Jun 08 '23 06:06 a1404040