GraphScope GraphX Benchmark

[x] Benchmark Environment
[x] Graph operator test case
[x] Graph operator benchmark
    [Fixed] BUG: outer vertex data sync after map vertices.
    [Fixed] ISSUE: llvm4jni failed for some methods.
[ ] Graph analytical benchmark

Operator	Spark	GraphScope (GS)
LoadGraph	4,245,466ms	7,675,863ms
mapVertices(x6)	2,949ms	2,801ms
mapEdges(x6)	53,276ms	54,841ms
mapTriplets(x3)	237,608ms	276,451ms
getDegreeRDD(x2)	151,321ms	405ms
join with degree(x2)	50,465ms	1,830ms

Benchmark for ArrowProjectedFragment-backed RDD, livejournal, num_worker=1

Operator	Spark	ArrowProjected-RDD
mapVertices(x3)	6859ms	2066ms
mapTriplets(x3)	19769ms	21306ms
getDegreeRDD(x2)	15684ms	64ms
join with degree(x2)	472ms	526ms
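
To make the two tables easier to compare, the timings can be turned into Spark-to-GraphScope ratios. The following is a minimal sketch that uses only the numbers posted above; the helper names `parse_ms` and `speedups` are made up for illustration and are not part of GraphScope or Spark.

```python
import re


def parse_ms(cell: str) -> int:
    """Parse a timing cell such as '2,949ms' or '405 ms' into an integer."""
    match = re.fullmatch(r"\s*([\d,]+)\s*ms\s*", cell)
    if match is None:
        raise ValueError(f"unrecognized timing cell: {cell!r}")
    return int(match.group(1).replace(",", ""))


def speedups(rows):
    """Return {operator: spark_time / other_time} for tab-separated rows."""
    result = {}
    for row in rows:
        operator, spark, other = row.split("\t")
        result[operator] = parse_ms(spark) / parse_ms(other)
    return result


# Rows copied from the first table (Spark vs. GS).
GS_ROWS = [
    "LoadGraph\t4,245,466ms\t7,675,863ms",
    "mapVertices(x6)\t2,949ms\t2,801ms",
    "mapEdges(x6)\t53,276ms\t54,841ms",
    "mapTriplets(x3)\t237,608ms\t276,451ms",
    "getDegreeRDD(x2)\t151,321ms\t405ms",
    "join with degree(x2)\t50,465ms\t1,830ms",
]

# Rows copied from the second table (Spark vs. ArrowProjected-RDD).
ARROW_ROWS = [
    "mapVertices(x3)\t6859ms\t2066ms",
    "mapTriplets(x3)\t19769ms\t21306ms",
    "getDegreeRDD(x2)\t15684ms\t64ms",
    "join with degree(x2)\t472ms\t526ms",
]

for title, rows in (("GS", GS_ROWS), ("ArrowProjected-RDD", ARROW_ROWS)):
    print(f"Spark time / {title} time")
    for operator, ratio in speedups(rows).items():
        print(f"  {operator:22s} {ratio:8.2f}x")
```

A ratio above 1 means the GraphScope side was faster for that operator. With the posted numbers, getDegreeRDD shows by far the largest gap in both tables, while LoadGraph and mapTriplets come out below 1.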

May 24 '22 02:05 zhanglei1949

Hey @zhanglei1949, could you please elaborate a bit on the goal of this benchmark, especially the graph analytical benchmark? Thanks very much!

Jun 13 '22 07:06 parkerzf

@parkerzf Hi, here is the background of this issue: we want to integrate GraphScope with Spark GraphX, which means running GraphScope apps on GraphX graphs and applying RDD operators such as map to GraphScope fragments. This work is still in progress and has not been open-sourced yet, but we use GraphScope issues to track its progress.

The related code and the full benchmark will be released in the near future. If you are interested in the GraphScope-GraphX integration, please stay tuned for the release.

Apologies if this issue confused you.

Jun 13 '22 09:06 zhanglei1949

Thanks @zhanglei1949 for the intro, looks promising. I have two follow-up questions:

1. Is GraphFrames on PySpark in scope, or is this purely for GraphX?
2. According to the LDBC analytics benchmark, GraphX does not scale as well as other graph processing systems. Is the goal of this issue to replace the underlying computing engine of GraphX, using ArrowProjected-RDD instead of Spark, to make it more scalable?

Thanks in advance!

Jun 15 '22 08:06 parkerzf

@parkerzf Hi, sorry for the late reply.

1. Purely for GraphX, I think.
2. Yes. First, we will store the graph in GraphScope (C++) and wrap the fragment as an RDD, providing an RDD interface to Spark through JNI + FastFFI. Then we will make it possible to run GraphX algorithms on GAE (as a special PIE app). We expect a significant performance gain from this.
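
The idea of exposing RDD-style operators on top of a pre-built fragment can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyFragment` and all of its methods are hypothetical, it runs in a single process, and it does not use JNI, FastFFI, or any GraphScope API. It also assumes a CSR (compressed sparse row) edge layout, which the thread does not describe. Under that assumption, degrees fall out of the offset array without scanning or shuffling edges, which is one plausible (unconfirmed) reason a degree operator could be cheap on a fragment.

```python
class ToyFragment:
    """Hypothetical stand-in for a graph fragment with out-edges in CSR form."""

    def __init__(self, vertex_data, edges):
        n = len(vertex_data)
        self.vertex_data = list(vertex_data)
        # Count out-edges per source vertex.
        counts = [0] * n
        for src, _dst in edges:
            counts[src] += 1
        # offsets[v]..offsets[v + 1] delimits the out-neighbors of vertex v.
        self.offsets = [0]
        for count in counts:
            self.offsets.append(self.offsets[-1] + count)
        # Fill the neighbor array, tracking the next free slot per vertex.
        cursor = self.offsets[:-1]  # slicing makes a copy
        self.neighbors = [0] * len(edges)
        for src, dst in edges:
            self.neighbors[cursor[src]] = dst
            cursor[src] += 1

    def out_degrees(self):
        """Degrees come straight from the offsets; no edge scan is needed."""
        n = len(self.vertex_data)
        return [self.offsets[v + 1] - self.offsets[v] for v in range(n)]

    def map_vertices(self, fn):
        """Return a fragment with new vertex data that shares the topology."""
        mapped = ToyFragment.__new__(ToyFragment)
        mapped.offsets = self.offsets
        mapped.neighbors = self.neighbors
        mapped.vertex_data = [fn(v, d) for v, d in enumerate(self.vertex_data)]
        return mapped

    def join_with_degree(self):
        """Pair each vertex's data with its out-degree."""
        degrees = self.out_degrees()
        return [(v, (d, degrees[v])) for v, d in enumerate(self.vertex_data)]


frag = ToyFragment(["a", "b", "c"], [(0, 1), (0, 2), (1, 2), (2, 0)])
upper = frag.map_vertices(lambda vid, data: data.upper())
print(frag.out_degrees())
print(upper.join_with_degree())
```

In this toy model `map_vertices` only rebuilds the vertex data and reuses the edge arrays, so the topology is never copied. Whether the real integration behaves this way is not stated in the thread.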

Jun 24 '22 02:06 zhanglei1949

Very informative and looking forward to it!

Jun 24 '22 07:06 parkerzf
