spyql icon indicating copy to clipboard operation
spyql copied to clipboard

general text file support

Open Minyus opened this issue 2 years ago • 6 comments

I'm using spyql to handle general text files (single-column, not comma-separated), but got a problem.

A single-column text file like the following is misinterpreted as multiple columns by spyql.

test.csv:

user_id
a11e11
b22e22
$ spyql "SELECT * FROM csv('test.csv') TO pretty"
us      r_id
----  ------
a11       11
b22       22

If there are no common characters among rows, the following error is returned.

ERROR   Could not detect CSV dialect from input
_csv.Error: Could not determine delimiter

I found that explicitly setting a delimiter like csv('test.csv', delimiter=',') works as a workaround in the current version, but I would like to suggest the following options.

[Option 1] Modify default behavior of csv()

  1. As a delimiter character, search only "," (especially not alphanumeric characters)
  2. If no delimiter is detected (and each row is not too long), read it as a single-column text file rather than returning an error

[Option 2] Add an alternative to csv() which can read a single-column text file (something like txt())

Minyus avatar Dec 09 '22 13:12 Minyus

Hey @Minyus, thank you for your feedback!

Have you tried the following?

SELECT col1 FROM text('test.csv') TO pretty

The text processor simply reads each line into the column col1. It does not detect the header though. If you want to skip the first line you can do:

spyql "SELECT col1 AS my_text FROM text('test.csv') OFFSET 1 TO pretty"

Please let me know if this solves your issue. Thank you!

dcmoura avatar Dec 09 '22 13:12 dcmoura

BTW, alternatively, you can remove the first line (header) of the file using sed and then piping the output to spyql:

$ sed 1d test.csv | spyql "SELECT col1 AS my_text FROM text TO pretty"

dcmoura avatar Dec 09 '22 13:12 dcmoura

Thank you, @dcmoura ! I didn't know text processor. Since text processor needs OFFSET option or preprocessing (by sed) to handle single-column text files with header, it may not seem straightforward for users like me, but thank you anyway.

Minyus avatar Dec 10 '22 10:12 Minyus

Thank you, @dcmoura !

I didn't know text processor.

Since text processor needs OFFSET option or preprocessing (by sed) to handle single-column text files with header, it may not seem straightforward for users like me, but thank you anyway.

We might include an option to treat the first line as the header, would that help? In that case you would not require to use the offset or the sed preprocessing. This should be fairly simple to implement.

dcmoura avatar Dec 10 '22 12:12 dcmoura

Yes, but that still needs to specify the header option, which is not consistent with CSV processor. I would suggest text processor treats the first line as the header in default to align with CSV processor.

Minyus avatar Dec 11 '22 03:12 Minyus

the csv processor has an header option. It auto-detects if there is an header or not, but you can also set the option manually.

dcmoura avatar Dec 11 '22 09:12 dcmoura