GraphScope
GraphX Benchmark
- [x] Benchmark Environment
- [x] Graph operator test case
- [x] Graph operator benchmark
  - [Fixed] BUG: outer vertex data sync after map vertices.
  - [Fixed] ISSUE: llvm4jni failed for some methods.
- [ ] Graph analytical benchmark
Operator | spark | GraphScope (GS) |
---|---|---|
LoadGraph | 4,245,466ms | 7,675,863ms |
mapVertices(x6) | 2,949ms | 2,801ms |
mapEdges(x6) | 53,276ms | 54,841ms |
mapTriplets(x3) | 237,608ms | 276,451ms |
getDegreeRDD(x2) | 151,321ms | 405ms |
join with degree(x2) | 50,465ms | 1,830ms |
Benchmark for the ArrowProjectedFragment-backed RDD (dataset: LiveJournal, num_worker=1)
Operator | spark | ArrowProjected-RDD |
---|---|---|
mapVertices(x3) | 6,859ms | 2,066ms |
mapTriplets(x3) | 19,769ms | 21,306ms |
getDegreeRDD(x2) | 15,684ms | 64ms |
join with degree(x2) | 472ms | 526ms |
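For readers unfamiliar with the operator names in the tables, here is a minimal plain-Python sketch of what each measured operator computes, run on a tiny in-memory graph. It is not the GraphX or GraphScope API and not the benchmark code used for the numbers above: every function name (`map_vertices`, `map_edges`, `map_triplets`, `degrees`, `join_with_degree`, `timed`) is a hypothetical stand-in, and the `repeat` argument assumes that `(xN)` in the tables means the operator was applied N times.

```python
import time
from collections import Counter

# Toy graph (hypothetical data): vertex id -> attribute, and (src, dst, attr) edges.
vertices = {1: 10, 2: 20, 3: 30}
edges = [(1, 2, 1.0), (2, 3, 2.0), (1, 3, 3.0)]


def map_vertices(vs, f):
    """Apply f(id, attr) to every vertex attribute; vertex ids are unchanged."""
    return {vid: f(vid, attr) for vid, attr in vs.items()}


def map_edges(es, f):
    """Apply f(src, dst, attr) to every edge attribute; endpoints are unchanged."""
    return [(s, d, f(s, d, a)) for s, d, a in es]


def map_triplets(vs, es, f):
    """Like map_edges, but f also sees the source and destination vertex attributes."""
    return [(s, d, f(vs[s], vs[d], a)) for s, d, a in es]


def degrees(es):
    """Count in-degree plus out-degree for every vertex that has at least one edge."""
    deg = Counter()
    for s, d, _ in es:
        deg[s] += 1
        deg[d] += 1
    return dict(deg)


def join_with_degree(vs, deg):
    """Attach each vertex's degree to its attribute (0 if the vertex has no edges)."""
    return {vid: (attr, deg.get(vid, 0)) for vid, attr in vs.items()}


def timed(label, fn, repeat=1):
    """Run fn `repeat` times, print the total wall-clock time, return the last result."""
    start = time.perf_counter()
    result = None
    for _ in range(repeat):
        result = fn()
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{label}(x{repeat}): {elapsed_ms:.3f}ms")
    return result


doubled = timed("mapVertices", lambda: map_vertices(vertices, lambda vid, a: a * 2), repeat=6)
deg = timed("getDegreeRDD", lambda: degrees(edges), repeat=2)
joined = timed("join with degree", lambda: join_with_degree(vertices, deg), repeat=2)
```

The sketch only illustrates operator semantics. The timings it prints say nothing about Spark or GraphScope performance.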
Hey @zhanglei1949, could you please elaborate a bit on the goal of this benchmark, especially the graph analytical benchmark? Thanks very much!
@parkerzf Hi, here is the background of this issue: we want to integrate GraphScope with Spark GraphX, which means running GraphScope apps on GraphX graphs and applying RDD operators like `map` on GraphScope Fragments. This work is still in progress and has not been open-sourced yet, but we use issues in GraphScope to track the project's progress.
The related code and the full benchmark will be released in the near future. If you are interested in the GraphScope-GraphX integration, please look out for the release of this project.
Apologies if you were confused by this issue.
Thanks @zhanglei1949 for the intro, looks promising. I have two follow-up questions:
- Is GraphFrames on PySpark in scope, or is this purely for GraphX?
- According to the LDBC analytics benchmark, GraphX is not very scalable compared to other graph processing systems. Is the goal of this issue to replace the underlying computing engine of GraphX, using `ArrowProjected-RDD` instead of `spark`, to make it more scalable?

Thanks in advance!
@parkerzf Hi, sorry for the late reply.
- Purely for GraphX, I think.
- Yes. First, we will store the graph in GraphScope (C++) and wrap the fragment as an RDD, providing an RDD interface for Spark through JNI + FastFFI. Then we will make it possible to run GraphX algorithms on GAE (as a special PIE app). We expect a significant performance gain from this.
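
The layering described in the second point can be pictured with a small plain-Python sketch. It is an illustration only: the project's code has not been released, the real bridge goes through JNI + FastFFI to C++ rather than Python, and both class names (`Fragment`, `FragmentVertexRDD`) are hypothetical. The idea shown is that the graph data stays in one storage object, while an RDD-like view composes `map` calls lazily and reads from that storage only when results are collected.

```python
from typing import Callable, Iterator, List, Optional, Tuple

VertexPair = Tuple[int, float]


class Fragment:
    """Hypothetical stand-in for a graph fragment owned by the native engine."""

    def __init__(self, vertex_data: List[VertexPair]):
        self._vertex_data = vertex_data

    def vertices(self) -> Iterator[VertexPair]:
        # Hand out an iterator instead of copying the data.
        return iter(self._vertex_data)


class FragmentVertexRDD:
    """Hypothetical RDD-like view over a Fragment; nothing runs until collect()."""

    def __init__(self, fragment: Fragment,
                 transform: Optional[Callable[[VertexPair], VertexPair]] = None):
        self._fragment = fragment
        self._transform = transform or (lambda pair: pair)

    def map(self, f: Callable[[VertexPair], VertexPair]) -> "FragmentVertexRDD":
        # Compose the new function with the pending one; the fragment is untouched.
        prev = self._transform
        return FragmentVertexRDD(self._fragment, lambda pair: f(prev(pair)))

    def collect(self) -> List[VertexPair]:
        # Only now is the fragment read and the composed function applied.
        return [self._transform(pair) for pair in self._fragment.vertices()]


frag = Fragment([(1, 0.5), (2, 1.5)])
rdd = FragmentVertexRDD(frag).map(lambda p: (p[0], p[1] * 2))
result = rdd.collect()
```

Keeping the view separate from the storage lets the operator interface change without moving the underlying graph data.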
Very informative and looking forward to it!