spark-csv
CSV Data Source for Apache Spark 1.x
Adds Relation, LineReader and BulkReader traits to avoid duplicated code. Largely derived from https://github.com/quartethealth/spark-csv and https://github.com/quartethealth/spark-fixedwidth. This is in response to the following PR (created by @blrnw3) being closed without...
Added comments for CSV file paths
This change adds an option to render parsing errors, such as number format exceptions, as nulls. It was in this pull request, https://github.com/databricks/spark-csv/pull/298, but I thought...
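To illustrate the idea of turning parse failures into nulls, here is a minimal sketch in plain Python. It is not the spark-csv implementation; `cast_or_none` is a hypothetical helper name, and Python's `None` stands in for a null value.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def cast_or_none(raw: str, caster: Callable[[str], T]) -> Optional[T]:
    """Hypothetical helper: cast a raw CSV field, yielding None on failure."""
    try:
        return caster(raw)
    except ValueError:
        # A number format error becomes a null instead of aborting the read.
        return None


# One raw row with a malformed integer in the second field.
row = ["42", "abc", "3.5"]
parsed = [
    cast_or_none(row[0], int),
    cast_or_none(row[1], int),
    cast_or_none(row[2], float),
]
```

With this approach a single bad field does not fail the whole row; the trade-off is that genuine data errors become indistinguishable from missing values unless they are logged separately.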
Several parsing options are added. They are organized in classes because there are many of them. A "text"-based API to configure options is provided. Another feature is the ability...
I don't know Scala (at all!) so there are almost certainly cleaner ways to do this; my apologies. The logging at the moment is sometimes unhelpful as it's hard to see the real...
For the context and discussion on this, please refer to https://github.com/databricks/spark-csv/pull/244.
There are datasets where each column has its own marker for missing values. spark-csv assumes only an empty string for missing values. To avoid additional data transformation and saving on user's side...
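The per-column missing-value markers described above could look roughly like the following plain-Python sketch. It does not use the spark-csv API; `read_with_null_markers` and the sample column names are hypothetical, and columns without an explicit marker are assumed to fall back to the empty string.

```python
import csv
import io
from typing import Dict, List, Optional


def read_with_null_markers(
    text: str, null_markers: Dict[str, str]
) -> List[Dict[str, Optional[str]]]:
    """Hypothetical helper: map each column's own missing-value marker to None."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for record in reader:
        rows.append(
            {
                # Columns not listed fall back to the empty-string marker.
                col: (None if val == null_markers.get(col, "") else val)
                for col, val in record.items()
            }
        )
    return rows


# "NA" marks a missing age, while "-" marks a missing city.
sample = "age,city\nNA,Boston\n31,-\n"
rows = read_with_null_markers(sample, {"age": "NA", "city": "-"})
```

Resolving the markers at read time means the caller does not need a second pass over the data to replace them.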
If you are not using userSchema, all fields in the CSV file are assumed to be StringType by default. This commit adds the possibility to set up types for fields which are not...
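As a rough illustration of typing only some fields while the rest stay strings, here is a plain-Python sketch. The description above is cut off, so the exact behaviour is an assumption: `apply_types` and the `type_overrides` mapping are hypothetical names, and unlisted columns are assumed to keep the string default.

```python
from typing import Any, Callable, Dict


def apply_types(
    record: Dict[str, str], type_overrides: Dict[str, Callable[[str], Any]]
) -> Dict[str, Any]:
    """Hypothetical helper: cast listed columns, leave the others as strings."""
    return {
        # str is the default caster, mirroring the all-strings default.
        col: type_overrides.get(col, str)(val)
        for col, val in record.items()
    }


record = {"id": "7", "price": "9.5", "name": "widget"}
typed = apply_types(record, {"id": int, "price": float})
```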