usql
rowDelimiter too long for "\r\n"
The U-SQL job failed with the error "'row delimiter' should be no longer than 1."
This happens when the extractor properties are specified as follows: USING Extractors.Text(skipFirstNRows:1,quoting:true,delimiter:',',rowDelimiter:"\r\n");
It should be possible to specify rowDelimiter as "\r\n", as documented here: https://msdn.microsoft.com/en-us/library/azure/mt492749.aspx#rowDelimiter
Having the same problem! Any comment on this?
There is some conflicting info in the documentation: the examples show delimiters longer than one character, but the text says the maximum is one character. https://msdn.microsoft.com/en-us/azure/data-lake-analytics/u-sql/extractor-parameters-u-sql
Did you try Extractors.Csv()?
The CSV extractor behaves the same.
The documentation is a bit confusing. The default value of the rowDelimiter parameter of the built-in text extractors (Text, Csv, Tsv) is null, which is handled in a special way. Any of the following is recognized as a row delimiter:
- a lone carriage return (0x0D),
- a lone line feed (0x0A), or
- a carriage return followed by a line feed (0x0D 0x0A).

An empty string is equivalent to null. A non-empty string must contain exactly one character.
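To make the default (null) behavior concrete, here is a small Python sketch that emulates it outside U-SQL. It is not the extractor's actual code, and the name split_rows_default is hypothetical. It shows why a value containing a lone line feed ends up split across two rows.

```python
import re
from typing import List

# Hypothetical emulation of the null rowDelimiter rule, not U-SQL code.
# "\r\n" is listed first so a CR LF pair is consumed as one delimiter
# instead of a lone CR followed by a lone LF.
_DEFAULT_ROW_DELIMITER = re.compile(r"\r\n|\r|\n")


def split_rows_default(text: str) -> List[str]:
    """Split text on CR LF, a lone CR, or a lone LF."""
    rows = _DEFAULT_ROW_DELIMITER.split(text)
    # A trailing delimiter leaves one empty final element; drop it.
    if rows and rows[-1] == "":
        rows.pop()
    return rows


# The lone LF inside the second record produces an extra row.
print(split_rows_default("id,note\r\n1,line one\nline two\r\n"))
# → ['id,note', '1,line one', 'line two']
```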
@MKadaner I'm not sure I understand. Is it possible to set the row delimiter in the CSV extractor to only "\r\n", so that only "\r\n" is treated by the extractor as a valid row delimiter?
Hi @Lubits. This is currently not possible with the built-in extractors. You would have to write a custom extractor, and if your data is larger than 1GB, you would also have to handle the case where a split point falls exactly between the \r and the \n, in addition to the general case of a split point falling inside a row.
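The boundary problem can be sketched in Python (not U-SQL's C# extractor API) by treating the input as a series of byte chunks read one after another. The hypothetical split_crlf_chunks function carries any unfinished row into the next chunk, so a CR at the end of one chunk followed by an LF at the start of the next still counts as a single delimiter. It is a sketch of the idea only and does not model how U-SQL distributes extents across vertices.

```python
from typing import Iterable, Iterator

CRLF = b"\r\n"


def split_crlf_chunks(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield rows delimited only by CR LF from a sequence of byte chunks.

    Hypothetical helper: bytes after the last complete CR LF are carried
    into the next chunk, so a CR LF pair split across a chunk boundary is
    still recognized. Lone CR or LF bytes stay inside the row.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last CR LF is a complete row; the remainder
        # may be an unfinished row (possibly ending in a lone CR).
        *complete, buffer = buffer.split(CRLF)
        yield from complete
    if buffer:
        # Final row that has no trailing CR LF.
        yield buffer


# The CR LF between "b" and "c" straddles the two chunks.
print(list(split_crlf_chunks([b"a\r\nb\r", b"\nc"])))
# → [b'a', b'b', b'c']
```

Concatenating into one buffer keeps the sketch short; a production reader would avoid re-scanning long unfinished rows.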
With the custom extractor, you can call input.Split("\r\n"), and it will split the input on <CR><LF> boundaries only. Neither a lone <CR> nor a lone <LF> will be considered a row delimiter.
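CR LF-only splitting can be combined with CSV-style field parsing and typed columns. Below is a rough Python sketch of that logic, not a U-SQL custom extractor: extract_crlf_csv, column_types, and skip_first_n_rows are hypothetical names loosely modeled on the skipFirstNRows and quoting options in the original USING clause. It assumes any field containing a CR or LF is enclosed in double quotes.

```python
import csv
from typing import Any, Callable, List, Sequence


def extract_crlf_csv(
    data: bytes,
    column_types: Sequence[Callable[[str], Any]],
    skip_first_n_rows: int = 0,
    encoding: str = "utf-8",
) -> List[List[Any]]:
    """Hypothetical: split on CR LF only, parse each row as CSV, convert types."""
    rows = data.split(b"\r\n")
    if rows and rows[-1] == b"":
        rows.pop()  # drop the empty element left by a trailing CR LF
    result: List[List[Any]] = []
    for raw in rows[skip_first_n_rows:]:
        # The row is already isolated, so lone CR/LF bytes stay inside it.
        # Assumption: fields containing CR or LF are double-quoted.
        fields = next(csv.reader([raw.decode(encoding)]))
        if len(fields) != len(column_types):
            raise ValueError(
                f"expected {len(column_types)} fields, got {len(fields)}: {fields!r}"
            )
        result.append([convert(value) for convert, value in zip(column_types, fields)])
    return result


data = b'id,note\r\n1,"line one\nline two"\r\n2,plain\r\n'
print(extract_crlf_csv(data, [int, str], skip_first_n_rows=1))
# → [[1, 'line one\nline two'], [2, 'plain']]
```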
The more I attempt to use the built-in extractors, the more limiting I find them. Is there an example of a custom extractor somewhere that uses input.Split("\r\n") and then behaves the same as the CSV extractor with regard to specifying fields and field types?
I'm having the same problem here. If you have a custom extractor that fixes this, please share.
I ended up just leaving out the text columns that might include carriage returns. It was a pain to deal with, and I didn't want to sink another day or two into it.
Any suggestions on how to do this today?
See comments https://github.com/Azure/usql/issues/106#issuecomment-386473304 and https://github.com/Azure/usql/issues/106#issuecomment-386764311
Best regards, Michael
From: Alex Gordon Sent: Wednesday, September 18, 2019 1:25 PM To: Azure/usql Cc: Michael Rys Subject: Re: [Azure/usql] rowDelimiter too long for "\r\n" (#106)