
Tracking issue: Implementation of lacking core RDD ops

Open iduartgomez opened this issue 5 years ago • 6 comments

By core RDD ops we mean those that, in the original Apache Spark, come from SparkContext (SC) and/or the base RDD class and friends:

  • [x] range
  • [x] filter
  • [x] randomSplit
  • [ ] sortBy
  • [x] groupBy
  • [x] keyBy
  • [ ] zipPartitions
  • [x] intersection
  • [ ] pipe
  • [x] zip
  • [ ] subtract
  • [ ] treeAggregate
  • [ ] treeReduce
  • [x] countApprox
  • [x] countByValue
  • [x] countByValueApprox
  • [x] min and max
  • [x] top
  • [x] takeOrdered
  • [x] isEmpty

I/O-related ops are non-goals for this tracking issue, since we track those elsewhere and handle them a little differently:

  • textFile
  • wholeTextFiles
  • binaryFiles | binaryRecords
  • Hadoop* family of methods

Jan 28 '20 19:01 iduartgomez

Intersection completed in #66

Feb 18 '20 17:02 iduartgomez

range done in #82

Apr 11 '20 19:04 iduartgomez

@iduartgomez - Isn't substract a misspelling of subtract?

May 19 '20 05:05 GavrielPlotke

fixed @GavrielPlotke

May 19 '20 05:05 rajasekarv

What would the subtract operation entail? Can someone give an example?

Jul 02 '20 12:07 ajprabhu09

Doc: https://spark.apache.org/docs/1.0.2/api/java/org/apache/spark/rdd/RDD.html#subtract(org.apache.spark.rdd.RDD)

Example:

  • I have a list of customers that I want to advertise to
  • I have a list of angry customers who have said "DON'T TALK TO ME!"

email_list_rdd = customers_rdd.subtract(angry_rdd)
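
For illustration only, here is a minimal sketch of the semantics in the example above, written in plain Rust over in-memory slices. It is not vega's API: the `subtract` function and the sample names are hypothetical, and a real RDD implementation would have to work across partitions rather than over a single local collection.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical helper (not part of vega): keep every element of `left`
/// that does not appear anywhere in `right`.
fn subtract<T: Eq + Hash + Clone>(left: &[T], right: &[T]) -> Vec<T> {
    // Build a lookup set from the right-hand side once.
    let excluded: HashSet<&T> = right.iter().collect();
    // Keep the original order of `left`, dropping excluded items.
    left.iter()
        .filter(|item| !excluded.contains(item))
        .cloned()
        .collect()
}

fn main() {
    // Made-up sample data mirroring the customers / angry customers example.
    let customers = vec!["ana", "bob", "cy"];
    let angry = vec!["bob"];
    let email_list = subtract(&customers, &angry);
    println!("{:?}", email_list); // prints ["ana", "cy"]
}
```

Note that this sketch keeps repeated elements of the left-hand side if they are not excluded; whether a distributed implementation should preserve order or duplicates is a design decision left open here.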

Jul 02 '20 14:07 GavrielPlotke