
rowDelimiter too long for "\r\n"

Open • kevinguzo opened this issue 7 years ago • 12 comments

The U-SQL job fails with "'row delimiter' should be no longer than 1." when the extractor properties are specified as follows:

USING Extractors.Text(skipFirstNRows:1,quoting:true,delimiter:',',rowDelimiter:"\r\n");

It should be possible to specify rowDelimiter as "\r\n", as documented here: https://msdn.microsoft.com/en-us/library/azure/mt492749.aspx#rowDelimiter

kevinguzo, Oct 24 '17 22:10

Having the same problem! Any comment on this?

jonsing, Feb 08 '18 11:02

The documentation is somewhat contradictory: its examples show delimiters longer than one character, yet it also says the maximum length is one character. https://msdn.microsoft.com/en-us/azure/data-lake-analytics/u-sql/extractor-parameters-u-sql

Did you try Extractors.Csv()?

asears, Feb 09 '18 01:02

The CSV extractor behaves the same.

jonsing, Feb 17 '18 06:02

The documentation is a bit confusing. The default value of the rowDelimiter parameter of the built-in text extractors (Text, Csv, Tsv) is null, which is handled specially: any of the following is recognized as a row delimiter:

  • a lone carriage return (CR, 0x0D, decimal 13),
  • a lone line feed (LF, 0x0A, decimal 10), or
  • a carriage return followed by a line feed (CR LF, 0x0D 0x0A).

An empty string is equivalent to null. A non-empty string must contain exactly one character.
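For illustration only, the rule above can be modelled in a few lines of Python. This is not U-SQL's implementation: `split_rows` is a hypothetical name, and dropping the empty string left after a trailing delimiter is an assumption of the sketch. The regex lists `\r\n` first, so a CR LF pair is consumed as one delimiter rather than as a lone CR followed by a lone LF.

```python
import re

# Default (null or empty) rowDelimiter as described above:
# CR LF is tried first, so "\r\n" counts as ONE delimiter;
# otherwise a lone CR or a lone LF ends the row.
_DEFAULT_ROW_END = re.compile(r"\r\n|\r|\n")


def split_rows(text, row_delimiter=None):
    """Hypothetical model of the rowDelimiter rule (not the U-SQL code)."""
    if row_delimiter is None or row_delimiter == "":
        rows = _DEFAULT_ROW_END.split(text)
    elif len(row_delimiter) == 1:
        rows = text.split(row_delimiter)
    else:
        # Same message as the error reported at the top of this issue.
        raise ValueError("'row delimiter' should be no longer than 1.")
    # Assumption: a trailing delimiter does not produce an extra empty row.
    if rows and rows[-1] == "":
        rows.pop()
    return rows


print(split_rows("a\r\nb\rc\nd"))               # ['a', 'b', 'c', 'd']
print(split_rows("a\r\nb", row_delimiter="\n"))  # ['a\r', 'b']
```

Passing `"\r\n"` as `row_delimiter` raises `ValueError`, which is the behaviour the original poster ran into.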

MKadaner, Feb 20 '18 01:02

@MKadaner I'm not sure I understand. Is it possible to set the row delimiter of the CSV extractor to "\r\n" only, so that only "\r\n" is treated by the extractor as a valid row delimiter?

Lubits, May 03 '18 16:05

Hi @Lubits. This is currently not possible with the built-in extractors. You would have to write a custom extractor, and if your data is larger than 1 GB, you would also have to handle the case where a split point falls exactly between the \r and the \n, in addition to the general case where a split point falls inside a row.
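The carry-over logic this requires can be sketched in Python (an illustration of the idea, not U-SQL code). Rows end only at CR LF. The unfinished tail of each chunk, including a trailing lone `\r`, is kept and joined with the next chunk. The list of chunks stands in for consecutive reads by a single reader; the sketch does not model how U-SQL distributes a large file across vertices, and the name `iter_crlf_rows` is hypothetical.

```python
def iter_crlf_rows(chunks):
    """Yield rows terminated by CR LF only, from an iterable of text chunks.

    A chunk may end in the middle of a row, or exactly between '\r' and '\n';
    the unfinished tail is carried over and joined with the next chunk.
    """
    pending = ""
    for chunk in chunks:
        pending += chunk
        parts = pending.split("\r\n")
        # The last part has no CR LF after it yet, so keep it for the next chunk.
        pending = parts.pop()
        yield from parts
    if pending:
        # Final row that was not followed by CR LF.
        yield pending


# The first chunk ends between '\r' and '\n'; the row is still read whole.
print(list(iter_crlf_rows(["a,1\r", "\nb,2\r\n"])))  # ['a,1', 'b,2']
```

Because `pending` only ever holds the unfinished tail, re-splitting it on each chunk stays cheap. A lone `\r` or `\n` inside a row is passed through as data.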

MikeRys, May 04 '18 00:05

With a custom extractor, you can call input.Split("\r\n"), and it will split the input on <CR><LF> boundaries only. Neither a lone <CR> nor a lone <LF> will be considered a row delimiter.
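The practical difference shows up on a record whose text field contains a lone LF. In this Python illustration, `re.split` mimics the default rule and `str.split("\r\n")` stands in for `input.Split("\r\n")`. It does not call the U-SQL API.

```python
import re

# One header row and two records; record 1 has a lone LF inside its comment text.
data = "id,comment\r\n1,first line\nsecond line\r\n2,ok\r\n"

# Default rule: CR LF, lone CR and lone LF all end a row.
default_rows = [r for r in re.split(r"\r\n|\r|\n", data) if r]

# CR LF only, analogous to input.Split("\r\n") in a custom extractor.
crlf_rows = [r for r in data.split("\r\n") if r]

# In both lists, empty strings (e.g. after the final CR LF) are filtered out.
print(len(default_rows))  # 4: the lone LF splits record 1 in two
print(len(crlf_rows))     # 3: header + 2 records, lone LF kept as data
```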

MKadaner, May 05 '18 00:05

The more I try to use the built-in extractors, the more limiting I find them. Is there an example somewhere of a custom extractor that uses input.Split("\r\n") and otherwise behaves like the CSV extractor with respect to specifying fields and field types?
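No such extractor is given in this thread. As a language-neutral sketch of the logic, here it is in Python. Rows are split on CR LF only, and each row is split into fields with quote handling. Each field is then converted with a per-column converter, roughly like the typed column list of an EXTRACT statement. Python's own `csv.reader` cannot be reused for the field step, because it is hard-coded to recognise either `'\r'` or `'\n'` as end-of-line and ignores `lineterminator`. The names `split_fields`, `extract` and `skip_first_n_rows` are hypothetical. A quoted field that itself contains CR LF is not supported by this sketch.

```python
def split_fields(row, delimiter=",", quote='"'):
    """Split one row into fields, honouring quotes and "" as an escaped quote.

    Lone CR/LF characters are ordinary data here, because rows were
    already split on CR LF.
    """
    fields, buf, in_quotes, i = [], [], False, 0
    while i < len(row):
        ch = row[i]
        if in_quotes:
            if ch == quote and row[i + 1:i + 2] == quote:
                buf.append(quote)  # "" inside quotes -> literal quote
                i += 1
            elif ch == quote:
                in_quotes = False  # closing quote
            else:
                buf.append(ch)
        elif ch == quote:
            in_quotes = True  # opening quote
        elif ch == delimiter:
            fields.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    fields.append("".join(buf))
    return fields


def extract(text, schema, skip_first_n_rows=0):
    """Yield dicts typed by `schema`, a list of (column_name, converter) pairs."""
    rows = [r for r in text.split("\r\n") if r]  # CR LF is the only row delimiter
    for row in rows[skip_first_n_rows:]:
        values = split_fields(row)
        if len(values) != len(schema):
            raise ValueError(f"expected {len(schema)} columns, got {len(values)}: {row!r}")
        yield {name: convert(v) for (name, convert), v in zip(schema, values)}


schema = [("id", int), ("comment", str), ("amount", float)]
data = 'id,comment,amount\r\n1,"line one\nline two",2.5\r\n2,plain,3\r\n'
for record in extract(data, schema, skip_first_n_rows=1):
    print(record)
```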

runxc1, Jul 06 '19 04:07

I'm having the same problem here. If you have a custom extractor that fixes this, please share.

jbsilva, Aug 12 '19 20:08

I ended up leaving out the text columns that might contain carriage returns. It was a pain to deal with, and I didn't want to sink another day or two into it.

runxc1, Aug 13 '19 17:08

Any suggestions on how to do this today?

alexgman, Sep 18 '19 20:09