mssql-docker
mssql-docker copied to clipboard
BULK with Codepage Parameter on Linux
Hello, i have the following Error in MSSQL Docker image: Keyword or statement option 'CODEPAGE' is not supported on the 'Linux' platform. [16202] (severity 15) [(null)] I have MSSQL installed directly on Debian System and this works but not in the Docker Container. How can i use the Bulk command with the Parameter Codepage. I have a CSV Data in ANSI Latin(1252) and want to import it. The docker have access to the files. If i delete the codepage parameter the import ist successfully but with letters like äöü looks bad.
Same issue.
For some reason, CODEPAGE is not available on mssql for Linux.
The documentation says:
CODEPAGE is not a supported option on Linux.
We are going to update the documentation to be more precise. CODEPAGE does actually work on SQL Server on Linux but only for RAW and Unicode.
@twright-msft, could you please provide an example of a simple BULK INSERT statement that works on SQL Server on Linux and makes use of CODEPAGE?
Also, this documentation update will be great for clarification.
https://docs.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-2017#examples See example D. Pasted here for convenience. I changed the example to use CODEPAGE = RAW since that is what works on Linux right now.
BULK INSERT MyTable FROM 'D:\data.csv' WITH ( CODEPAGE = 'RAW', DATAFILETYPE = 'char', FIELDTERMINATOR = ',' );
@twright-msft Unfortunately this example does not seem to work for me - still the same error ('CODEPAGE' is not supported on the 'Linux' platform).
Tried in both Docker & and newest SQL-server installed directly on CentOS.
Can you please run this on your SQL Server on Linux and share the output:
SELECT @@VERSION
@twright-msft
Microsoft SQL Server 2017 (RTM-CU10) (KB4342123) - 14.0.3037.1 (X64) Jul 27 2018 09:40:27 Copyright (C) 2017 Microsoft Corporation Developer Edition (64-bit) on Linux (CentOS Linux 7 (Core))
Reproduce:
Install clean CentOS (7.5 / 1804) (followed by yum update && reboot)
$ cat /etc/locale.conf
LANG="en_DK.UTF-8"
$ curl -o /etc/yum.repos.d/mssql-server.repo https://packages.microsoft.com/config/rhel/7/mssql-server-2017.repo
$ curl -o /etc/yum.repos.d/msprod.repo https://packages.microsoft.com/config/rhel/7/prod.repo
$ yum install -y mssql-server mssql-tools unixODBC-devel
$ /opt/mssql/bin/mssql-conf setup
Edition: Developer, Language: English
$ /opt/mssql-tools/bin/sqlcmd -s localhost -U sa -P 'xxx'
1> SELECT @@VERSION;
2> GO
Microsoft SQL Server 2017 (RTM-CU10) (KB4342123) - 14.0.3037.1 (X64)
Jul 27 2018 09:40:27
Copyright (C) 2017 Microsoft Corporation
Developer Edition (64-bit) on Linux (CentOS Linux 7 (Core))
(1 rows affected)
1> BULK INSERT MyTable FROM '/srv/test.tbl' WITH ( CODEPAGE = 'RAW', DATAFILETYPE = 'char', FIELDTERMINATOR = ',' );
2> GO
Msg 16202, Level 15, State 3, Server srvfab-sql1, Line 1
Keyword or statement option 'CODEPAGE' is not supported on the 'Linux' platform.
I wonder if this is something that is being caught in sqlcmd. Can you please try from a different tool like SQL Server Management Studio or SQL Operations Studio? We added support for RAW in cumulative update 4.
@twright-msft Same issue from SQL Operations Studio (Version 0.28.6 if that makes a difference). Can it be something with system locale or some related libraries since it (according to @leykevin) works on Debian?
Turns out this was not actually included in a CU as we thought. I'm trying to get clarity on when it will be included. Is bcp an option for you instead of BULK INSERT?
Thank you very much for clarification - that explains it.
BCP is unfortunately not an option in my case, no. I'll just have to wait till CODEPAGE support comes along before continuing with testing - An ETA would be awesome, if possible, yes. Looking forward to it - Thanks.
@twright-msft Seems to be available in 2019 CTP (tested with 2.1) - at least for RAW, but with UNICODE i still get the same error.
Are there any plans for supporting other other codepages?
Glad you got it working with 2.1. We don’t have any plans for other specific code pages right now. Which ones would you like to have?
We are currently using 1252 (Latin) on Windows environment.
We might be able to do with Unicode but it doesn't seem to work on Linux so I haven't tested - only RAW works and that doesn't work due to some date formats.
Do you plan to support CODEPAGE = 65001? Support under UTF - 8 under Linux environment improves convenience.
I see SQL on Linux 2019 (in preview) adds code page 65001. Disclaimer: I haven't tried SoL19 yet. This feature should resolve many related issues bulk loading data on Linux. The default locate on RHEL/CentOS is c / utf-8 and most data generation tools like dbgen, dsgen, hammerdb, etc., generate characters in the latin9 code page, which are not recognized by bulk insert on SoL 2017.
I thought about moving my data sets to a Windows server, which supports bulk insert command's CODEPAGE option, and use it to load the data into SQL on Linux, however, my data sets are 4TB. It is a pain to move all of that to a Windows server just to try copying it back (bulk insert command) across a network link.
@remi9898 I tried the latest docker image:
Microsoft SQL Server 2019 (CTP3.2) - 15.0.1800.32 (X64) Jul 17 2019 21:29:33 Copyright (C) 2019 Microsoft Corporation Developer Edition (64-bit) on Linux (Ubuntu 16.04.6 LTS) <X64>
and had the same error:
Keyword or statement option 'CODEPAGE' is not supported on the 'Linux' platform.
Where did you see they were adding utf-8 ?
Per Travis' reply earlier, SQL Server 2019 adds support for the CODEPAGE parameter on Linux, but only the value RAW is allowed. In such case, no data conversion is done. This combined with setting DATAFILETYPE=whidechar should do what we want.
Most cases don't require UTF-8 because the extended Latin character set includes the required characters. I found for nearly all of my cases I just needed to create a format file (generated using bcp) and then edit the format file to change every data type to SQLCHAR and set the prefix value to 0. Then, I ran bulk insert using CODEPAGE='RAW'. For example:
A sample format file and bulk insert command:
14.0
8
1 SQLCHAR 0 8 "|" 1 c_custkey ""
2 SQLCHAR 0 50 "|" 2 c_name SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 80 "|" 3 c_address SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR 0 4 "|" 4 c_nationkey ""
5 SQLCHAR 0 30 "|" 5 c_phone SQL_Latin1_General_CP1_CI_AS
6 SQLCHAR 0 8 "|" 6 c_acctbal ""
7 SQLCHAR 0 20 "|" 7 c_mktsegment SQL_Latin1_General_CP1_CI_AS
8 SQLCHAR 0 236 "|\n" 8 c_comment SQL_Latin1_General_CP1_CI_AS
bulk insert customer from 'customer.tbl' with (KEEPNULLS,KEEPIDENTITY,CODEPAGE='RAW',formatfile='customer-n.fmt');
Sorry for late reply. I didn't get any notice from this system about this thread having activity until this morning.
@remi9898 thanks for the reply, unfortunately it's not practical in my case, I gave up on using the linux version for the moment, will probably use bcp if I need to get back to it.
Same issue here. I have a utf-8 file containing french characters (É, à, œ, etc.) and I can't import it in SQL Server using bulk insert. I tried to use "recode" tool to change the file encoding, but I can't find which one could work... utf-16 or unicode doesn't fix the problem.
So is it impossible to import a simple CSV file with special characters to SQL SEVER linux using BULK IMPORT? is there any alternative @twright-msft to bypass this limitation?
@SylvainDevidal did you manage to bypass the issue?
I found this and it works for me: Feedback azure
The only thing I did was to change the file coding from UTF-8 to UTF-16, my query was this:
BULK INSERT dbo.table_name FROM '/Import/source_file.csv' WITH (FIELDTERMINATOR=',', FIELDQUOTE = '"', KEEPIDENTITY, KEEPNULLS);
Hope this works for you.
no luck in GCP can any one help me
I tried like above statement getting Error BULK INSERT dbo.table_name FROM '/Import/source_file.csv' WITH (FIELDTERMINATOR=',', FIELDQUOTE = '"', KEEPIDENTITY, KEEPNULLS); "You do not have permission to use the bulk load statement."
I was trying to get some German and Hungarian characters rendered and CODEPAGE = 'RAW' did it for me
It might be worth considering using a scripted language/tool such as Python or JavaScript(Node) for the import/export. It may not be quite as optimized, but may be more reliable and able to handle edge cases better. Mace sure to use an appropriately configured CSV library that handles line endings inside quoted values.
I've used the mssql/tedious drivers in Node without much issue... don't know what the equivalent is for Python though.
Five years later. Is there a solution for Cyrillic characters? The above methods do not work.
Same issue here. I have a utf-8 file containing french characters (É, à, œ, etc.) and I can't import it in SQL Server using bulk insert. I tried to use "recode" tool to change the file encoding, but I can't find which one could work... utf-16 or unicode doesn't fix the problem.
it work!