spark-jobserver

[Java] Cache RDD/DataFrames using NamedRDD/NamedObject

Open nishutayal opened this issue 9 years ago • 3 comments

I am writing a Spark job in Java. The runJob() method returns the expected output. Now I want to cache that output using NamedObjects. This works fine in Scala, but in Java nothing is stored in the cache. This is the Java code I used:

RDD<Tuple2<String, Integer>> rddR = JavaRDD.toRDD(rdd);
Function0 func = new Function0() {
    public NamedObject apply() {
        return (NamedObject) (new NamedRDD(rddR, true, StorageLevel.MEMORY_AND_DISK()));
    }
};
NamedRDD nRDD = new NamedRDD(rddR, true, StorageLevel.MEMORY_AND_DISK());
obj.namedObjects().update("rdd:wordcounts-result", func,
        obj.namedObjects().defaultTimeout(), rddPersister);

Here, obj is the class object and rddPersister is defined as:

static NamedObjectPersister<NamedRDD<RDD>> rddPersister = new RDDPersister<RDD>();

It runs without any error, but when I call get on namedObjects, the result is empty:

obj.namedObjects().get("rdd:wordcounts-result",obj.namedObjects().defaultTimeout())

What could be the issue, and what is the correct code for caching an RDD in Java?
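
To make the expected behavior concrete, here is a minimal, self-contained sketch in plain Java with no Spark or spark-jobserver dependency. ToyNamedObjects, its update and get methods, and the placeholder string value are all hypothetical stand-ins introduced for illustration; this is not the spark-jobserver API or its implementation. It only models the contract that the code above relies on: update evaluates a generator function and stores the result under a name, and a later get for the same name should return that object rather than an empty result.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical stand-in for a named-object store; NOT the spark-jobserver API.
public class ToyNamedObjects {
    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    // Evaluates the generator once and stores its result under the given name,
    // replacing any previous entry with that name.
    public <T> T update(String name, Supplier<T> generator) {
        T value = generator.get();
        cache.put(name, value);
        return value;
    }

    // Returns the stored object, or Optional.empty() if nothing is stored under that name.
    @SuppressWarnings("unchecked")
    public <T> Optional<T> get(String name) {
        return Optional.ofNullable((T) cache.get(name));
    }

    public static void main(String[] args) {
        ToyNamedObjects namedObjects = new ToyNamedObjects();

        // The string is a placeholder for the object that would be cached.
        namedObjects.update("rdd:wordcounts-result", () -> "word counts placeholder");

        Optional<String> hit = namedObjects.get("rdd:wordcounts-result");
        Optional<String> miss = namedObjects.get("rdd:missing");

        System.out.println("stored entry present: " + hit.isPresent());
        System.out.println("unknown entry present: " + miss.isPresent());
    }
}
```

In this toy model, a get that follows an update with the same name always finds the entry. The behavior described above, where get returns nothing after update completes without error, is what differs from that expectation.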

Any help is really appreciated. Thanks!

nishutayal Jun 06 '16 13:06

Hi Nishu,

I am also trying to write Java applications in which RDDs can be shared within the same context. I could not find any suitable documentation or API for using NamedObjects. Were you able to implement this?

Thanks, Vishal

gavishal Sep 28 '16 14:09

@hntd187 is this fixed in master by any chance?

noorul Jan 29 '17 01:01

@nishutayal @gavishal @hntd187 Has the problem been solved?

Ccxlp Sep 03 '18 00:09