spyql
spyql copied to clipboard
general text file support
I'm using spyql to handle general text files (single-column, not comma-separated), but got a problem.
A single-column text file like the following is misinterpreted as multiple columns by spyql.
test.csv:
user_id
a11e11
b22e22
$ spyql "SELECT * FROM csv('test.csv') TO pretty"
us r_id
---- ------
a11 11
b22 22
If there are no common characters among rows, the following error is returned.
ERROR Could not detect CSV dialect from input
_csv.Error: Could not determine delimiter
I found that explicitly setting a delimiter like csv('test.csv', delimiter=',')
works as a workaround in the current version, but I would like to suggest the following options.
[Option 1] Modify default behavior of csv()
- As a delimiter character, search only "," (especially not alphanumeric characters)
- If no delimiter is detected (and each row is not too long), read it as a single-column text file rather than returning an error
[Option 2] Add an alternative to csv()
which can read a single-column text file (something like txt()
)
Hey @Minyus, thank you for your feedback!
Have you tried the following?
SELECT col1 FROM text('test.csv') TO pretty
The text
processor simply reads each line into the column col1
.
It does not detect the header though.
If you want to skip the first line you can do:
spyql "SELECT col1 AS my_text FROM text('test.csv') OFFSET 1 TO pretty"
Please let me know if this solves your issue. Thank you!
BTW, alternatively, you can remove the first line (header) of the file using sed
and then piping the output to spyql:
$ sed 1d test.csv | spyql "SELECT col1 AS my_text FROM text TO pretty"
Thank you, @dcmoura !
I didn't know text
processor.
Since text
processor needs OFFSET option or preprocessing (by sed) to handle single-column text files with header, it may not seem straightforward for users like me, but thank you anyway.
Thank you, @dcmoura !
I didn't know
text
processor.Since
text
processor needs OFFSET option or preprocessing (by sed) to handle single-column text files with header, it may not seem straightforward for users like me, but thank you anyway.
We might include an option to treat the first line as the header, would that help? In that case you would not require to use the offset or the sed preprocessing. This should be fairly simple to implement.
Yes, but that still needs to specify the header option, which is not consistent with CSV
processor.
I would suggest text
processor treats the first line as the header in default to align with CSV
processor.
the csv processor has an header option. It auto-detects if there is an header or not, but you can also set the option manually.