User guide is incorrect regarding using CLI to register CSV files using schema inference

Open andygrove opened this issue 3 years ago • 1 comments

Describe the bug

The user guide page at https://arrow.apache.org/datafusion/cli/index.html states that "It is necessary to provide schema information for CSV files since DataFusion does not automatically infer the schema when using SQL to query CSV files." but this is not true, as demonstrated below:

DataFusion CLI v10.0.0
❯ create external table a stored as csv with header row location '/tmp/a.csv';
0 rows in set. Query took 0.017 seconds.
❯ select * from a;
+---+---+---+---+
| a | b | c | d |
+---+---+---+---+
| 1 | 2 | 3 | 4 |
+---+---+---+---+
1 row in set. Query took 0.011 seconds.
❯ describe a;
+-------------+-----------+-------------+
| column_name | data_type | is_nullable |
+-------------+-----------+-------------+
| a           | Int64     | NO          |
| b           | Int64     | NO          |
| c           | Int64     | NO          |
| d           | Int64     | NO          |
+-------------+-----------+-------------+
4 rows in set. Query took 0.017 seconds.
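
The session above relies on a file at `/tmp/a.csv` whose contents are not shown in this issue. Judging from the query output, it probably holds one header row and one data row. A minimal sketch to recreate such a file before starting the CLI (the exact contents are an assumption inferred from the output, not taken from the original report):

```shell
# Recreate a CSV consistent with the query output above.
# Assumed contents: a header row (a,b,c,d) and a single data row (1,2,3,4).
cat > /tmp/a.csv <<'EOF'
a,b,c,d
1,2,3,4
EOF

# Print the file so it can be checked before launching datafusion-cli.
cat /tmp/a.csv
```

With this file in place, the `create external table` statement above can be run without a column list, leaving the schema to be inferred from the header and data.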

To Reproduce

See above.

Expected behavior

We should update the user guide to state that specifying a schema is optional.

Additional context

None

andygrove avatar Aug 01 '22 14:08 andygrove

I have a PR coming that will fix this and a number of other issues with the docs.

kmitchener avatar Aug 01 '22 16:08 kmitchener

Hi. I think this issue is fixed (I guess in #3171), but I don't understand why it is still open. https://arrow.apache.org/datafusion/user-guide/sql/ddl.html

retikulum avatar Oct 26 '22 21:10 retikulum

Did you have a chance to take a look? @andygrove

retikulum avatar Oct 28 '22 23:10 retikulum

Thanks @retikulum for noticing.

Dandandan avatar Oct 29 '22 04:10 Dandandan