Evan Sparks

Results: 45 comments by Evan Sparks

Removing the 0.3.0 milestone because we optimized standard convolution and we don't yet have use cases that would warrant an FFT-based convolution.

Should also contain a link to the Spark style guide (which is our coding standard) and something brief about how docs should be formatted and the desired test coverage.

Hi there, there are a couple of weaknesses with our current use of CoreNLP: 1) Performance: While CoreNLP is pretty quick, it does take some time to initialize and given...

The ScalaNLP stuff is great - and comes out of David Hall/Dan Klein's work, so I expect it to be quite modern. Kind of a bummer that it doesn't include...

To be fair - I have no problem with `new` - it's just a dislike for boilerplate. Guidelines about where and when to use objects vs. case classes could also...

This looks awesome, thanks so much for the contribution Manish! The big question I have is whether you looked at using MLTable and its API for your input? Were there...

This is awesome, thanks Manish - we'll plan to test your code for scalability on a cluster this week.

This is terrific work! Basic functionality is there and scaling well for large datasets based on my tests. Though, I don't see special logic differentiating between continuous and categorical features....

Thinking through this change today, I'm not so sure it's necessary at the moment. `SparkSession` is part of the SparkSQL namespace and primarily designed to support `Dataset` access. We need...

One thing that makes the block solves tricky is that the blocks are not independent. That is - we pass a `Seq[RDD[T]]` because the solution to the second block depends...