qri: enable the inference of CSV field separator
What feature or capability would you like?
A lot of CSVs in the world are not separated by `,`. It would be great to infer the separator so that qri can read every kind of CSV.
Do you have a proposed solution?
No, but here is how to do it in Python:
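import csv
from datapackage import Resource

# Read the raw file and let csv.Sniffer guess the dialect.
resource = Resource({'path': 'input.csv'})
# raw_read() returns bytes on Python 3; decoding as UTF-8 is an assumption.
sample = resource.raw_read().decode('utf-8')
dialect = csv.Sniffer().sniff(sample)
print(dialect.delimiter)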
Love it!
The place where this would land is in the detect package: https://github.com/qri-io/dataset/blob/d12a66b92250109b67cd1b74bca763baa0b847e4/detect/detect.go#L39-L47
We should add a FormatConfig function to the detect package that infers the format configuration for a given data format. In the case of CSV files, it should sniff the delimiter.
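To make the idea concrete, here is a minimal sketch of delimiter sniffing in Go. It is not the detect package's API: `sniffDelimiter` and the `candidates` list are hypothetical names, and the heuristic (pick the candidate that appears the same non-zero number of times on every sampled line) is a simplification that ignores quoted fields.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// candidates lists the delimiters this sketch considers (an assumption,
// not an exhaustive set).
var candidates = []rune{',', ';', '\t', '|'}

// sniffDelimiter guesses the field delimiter of a CSV sample.
// It counts each candidate on every non-empty line and keeps the candidate
// whose count is identical across all lines, preferring the highest count.
// It falls back to ',' when nothing qualifies.
func sniffDelimiter(sample string) rune {
	scanner := bufio.NewScanner(strings.NewReader(sample))
	var lines []string
	for scanner.Scan() {
		if line := scanner.Text(); line != "" {
			lines = append(lines, line)
		}
	}

	best, bestCount := ',', 0
	for _, c := range candidates {
		count, consistent := 0, true
		for i, line := range lines {
			n := strings.Count(line, string(c))
			if i == 0 {
				count = n
			} else if n != count {
				consistent = false
				break
			}
		}
		if consistent && count > bestCount {
			best, bestCount = c, count
		}
	}
	return best
}

func main() {
	fmt.Printf("%q\n", sniffDelimiter("a;b;c\n1;2;3\n")) // prints ';'
}
```

A real implementation would probably need to read only a bounded sample from the reader and account for delimiters inside quoted fields.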
FormatConfig could also be used to clean up subsequent calls within the detect package itself, which currently uses a hard-coded baseline format configuration for CSV files:
func CSVSchema(resource *dataset.Structure, data io.Reader) (schema map[string]interface{}, n int, err error) {
	tr := dsio.NewTrackedReader(data)
	r := csv.NewReader(replacecr.Reader(tr))
	// Baseline CSV settings, applied regardless of the file's actual dialect.
	r.FieldsPerRecord = -1
	r.TrimLeadingSpace = true
	r.LazyQuotes = true
	// ...
If detect.FromReader infers and returns Structure.FormatConfig, it'll bubble up into qri here and should "just work": https://github.com/qri-io/qri/blob/aed31e903d07af8e805d5290934e10f41e95ae21/base/dataset_prepare.go#L178-L188