spark-csv
parsing options and serializing arrays
Several parsing options are added. Because there are many of them, they are organized into classes. A text-based API for configuring the options is also provided.
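A rough sketch of what "options grouped in a class, configurable from text" could look like. The class name `NumberParsingOptions`, its fields, and the `fromText` helper are all hypothetical illustrations and are not taken from this PR's code:

```scala
// Hypothetical option group: the field names are illustrative only,
// they are NOT the option names used in this PR.
final case class NumberParsingOptions(
    nanValue: String = "NaN",
    enableParsing: Boolean = true)

object NumberParsingOptions {
  /** Builds the option group from text key/value pairs.
    * Keys that are missing fall back to the case class defaults. */
  def fromText(params: Map[String, String]): NumberParsingOptions = {
    val defaults = NumberParsingOptions()
    NumberParsingOptions(
      nanValue = params.getOrElse("nanValue", defaults.nanValue),
      // String.toBoolean accepts "true"/"false" (case-insensitive) and throws otherwise.
      enableParsing = params.get("enableParsing").map(_.toBoolean).getOrElse(defaults.enableParsing))
  }
}
```

With this shape, a caller can pass only the keys they care about, e.g. `NumberParsingOptions.fromText(Map("nanValue" -> "NA"))`, and every other option keeps its default.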
Another feature is the ability to serialize a column of arrays. The array is "unnested", and the user supplies the column names to use. This is useful for writing out CSV after transforms that "expand" the number of columns, e.g. one-hot encoding a category. This can be improved later, e.g. a sparse vector from MLlib could replace the array.
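To make the "unnesting" idea concrete, here is a minimal sketch in plain Scala with no Spark dependency. The object `UnnestArrayColumn` and its methods are hypothetical names for illustration and are not the PR's implementation. It assumes the array holds a one-hot encoding and that values need no quoting or escaping:

```scala
// Hypothetical helper, not the PR's code: flattens one array-valued field
// of a row into separate CSV columns whose names are supplied by the user.
// Quoting and escaping of values are deliberately left out of this sketch.
object UnnestArrayColumn {
  /** Header line: the fixed column names followed by the names for the array slots. */
  def header(fixed: Seq[String], arrayNames: Seq[String], delimiter: String = ","): String =
    (fixed ++ arrayNames).mkString(delimiter)

  /** One CSV line for a row: the fixed values followed by each array element.
    * The array must contain exactly one value per supplied column name. */
  def line(fixed: Seq[String], array: Seq[Int], arrayNames: Seq[String], delimiter: String = ","): String = {
    require(array.length == arrayNames.length,
      s"expected ${arrayNames.length} array values, got ${array.length}")
    (fixed ++ array.map(_.toString)).mkString(delimiter)
  }
}
```

For a row `("alice", [0, 1, 0])` with user-supplied names `red, green, blue`, the header becomes `name,red,green,blue` and the row is written as `alice,0,1,0`.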
Please rebase against master so that the tests can run. Also, please revert the changes to build.sbt.
Current coverage is 84.79%
Merging #113 into master will decrease coverage by -0.22% as of 516c8a0
@@             master   #113   diff @@
======================================
  Files            10     11     +1
  Stmts           407    526   +119
  Branches        125    150    +25
  Methods           0      0
======================================
+ Hit             346    446   +100
  Partial           0      0
- Missed           61     80    +19
@falaki should be ready to merge now.
@mohitjaggi This is fairly large. I am about to publish a release with schema inference and all the recent improvements, and then I will review this.
@mohitjaggi this is packing too much into a single PR. Would you please split it? Please first submit one PR for the number parsing improvements and another for arrays. On arrays within CSV, it would be good to post an issue and gather some feedback from the community first.
will do
On Thu, Aug 13, 2015 at 10:14 AM, Hossein Falaki wrote:
> @mohitjaggi this is packing too much into a single PR. Would you please split it? Please first submit one PR for the number parsing improvements and another for arrays. On arrays within CSV, it would be good to post an issue and gather some feedback from the community first.
see #124
@mohitjaggi Wouldn't it make sense to close this PR, given that #124 is a subset of it and you are willing to open further PRs for the rest?