
GraphX Benchmark

Open zhanglei1949 opened this issue 2 years ago • 5 comments

  • [x] Benchmark Environment
  • [x] Graph operator test case
  • [x] Graph operator benchmark. [Fixed] BUG: outer vertex data sync after map vertices. [Fixed] ISSUE: llvm4jni failed for some methods.
  • [ ] Graph analytical benchmark

| Operator | Spark (ms) | GS (ms) |
|---|---|---|
| LoadGraph | 4,245,466 | 7,675,863 |
| mapVertices(x6) | 2,949 | 2,801 |
| mapEdges(x6) | 53,276 | 54,841 |
| mapTriplets(x3) | 237,608 | 276,451 |
| getDegreeRDD(x2) | 151,321 | 405 |
| join with degree(x2) | 50,465 | 1,830 |

Benchmark for ArrowProjectedFragment-backed RDD, LiveJournal, num_worker=1

| Operator | Spark (ms) | ArrowProjected-RDD (ms) |
|---|---|---|
| mapVertices(x3) | 6,859 | 2,066 |
| mapTriplets(x3) | 19,769 | 21,306 |
| getDegreeRDD(x2) | 15,684 | 64 |
| join with degree(x2) | 472 | 526 |
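
To make the two tables easier to compare, the timings can be turned into speedup ratios (Spark time divided by GraphScope-side time). The sketch below only does that arithmetic on the numbers reported in this issue; the names `TABLE_1`, `TABLE_2`, `speedups`, and `report` are made up for illustration and are not part of GraphScope or Spark.

```python
# Timings in milliseconds, copied from the two tables in this issue:
# (Spark, GraphScope-side). TABLE_1 is the first table (Spark vs. GS),
# TABLE_2 is the LiveJournal run with num_worker=1 (Spark vs. ArrowProjected-RDD).
TABLE_1 = {
    "LoadGraph": (4_245_466, 7_675_863),
    "mapVertices(x6)": (2_949, 2_801),
    "mapEdges(x6)": (53_276, 54_841),
    "mapTriplets(x3)": (237_608, 276_451),
    "getDegreeRDD(x2)": (151_321, 405),
    "join with degree(x2)": (50_465, 1_830),
}
TABLE_2 = {
    "mapVertices(x3)": (6_859, 2_066),
    "mapTriplets(x3)": (19_769, 21_306),
    "getDegreeRDD(x2)": (15_684, 64),
    "join with degree(x2)": (472, 526),
}


def speedups(timings):
    """Return {operator: spark_ms / gs_ms}; a value above 1.0 means the GS side is faster."""
    return {op: spark_ms / gs_ms for op, (spark_ms, gs_ms) in timings.items()}


def report(title, timings):
    """Print one line per operator with its speedup ratio and which side was faster."""
    print(title)
    for op, ratio in speedups(timings).items():
        verdict = "GS side faster" if ratio > 1.0 else "Spark faster"
        print(f"  {op:<22} {ratio:8.2f}x  ({verdict})")


report("Table 1 (Spark vs. GS)", TABLE_1)
report("Table 2 (LiveJournal, num_worker=1)", TABLE_2)
```

Read this way, the degree computation is the clear outlier in both runs (roughly 374x and 245x), while LoadGraph, mapEdges, and mapTriplets are somewhat slower on the GraphScope side. The ratios say nothing about variance, since the issue reports a single timing per operator.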

zhanglei1949, May 24 '22 02:05

Hey @zhanglei1949, could you please elaborate a bit more on the goal of this benchmark, especially the graph analytical benchmark? Thanks very much!

parkerzf, Jun 13 '22 07:06

@parkerzf Hi, here is the background of this issue: we want to integrate GraphScope with Spark GraphX, which means running GraphScope apps on GraphX graphs and applying RDD operators such as map to GraphScope fragments. This work is still in progress and has not been open-sourced yet, but we use GraphScope issues to track the project's progress.

The related code and the full benchmark will be released in the near future. If you are interested in the GraphScope-GraphX integration, please stay tuned for the release.

Apologies if this issue caused any confusion.

zhanglei1949, Jun 13 '22 09:06

Thanks @zhanglei1949 for the intro, looks promising. I have two follow-up questions:

  1. Is GraphFrames on PySpark in scope, or is this purely for GraphX?
  2. According to the LDBC analytics benchmark, GraphX does not scale well compared to other graph processing systems. Is the goal of this issue to replace GraphX's underlying computing engine with ArrowProjected-RDD, instead of Spark, to make it more scalable?

Thanks in advance!

parkerzf, Jun 15 '22 08:06

@parkerzf Hi, sorry for the late reply.

  1. Purely for GraphX, I think.
  2. Yes. First, we will store the graph in GraphScope (C++) and wrap the fragment as an RDD, providing an RDD interface to Spark through JNI + FastFFI. Then we will make it possible to run GraphX algorithms on GAE (as a special PIE app). We expect a significant performance gain from this.
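
The "wrap the fragment as an RDD" idea can be illustrated with a toy, single-process sketch. Everything here is hypothetical: `ToyFragment` and `FragmentVertexView` are invented names, not GraphScope or Spark APIs, and the real integration goes through JNI + FastFFI to C++ rather than pure Python. The sketch also assumes the fragment keeps its edges in a CSR (offset array) layout, which the thread does not state. Under that assumption, degrees fall out of the offsets without scanning edges, which is one plausible reading of the large getDegreeRDD gap in the benchmark, not a confirmed explanation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ToyFragment:
    """Hypothetical stand-in for a fragment: out-edges stored in CSR layout."""
    vertex_data: List[float]
    offsets: List[int]   # out-edges of v are targets[offsets[v]:offsets[v + 1]]
    targets: List[int]

    @classmethod
    def from_edges(cls, num_vertices: int, edges: List[Tuple[int, int]],
                   vertex_data: List[float]) -> "ToyFragment":
        # Count out-edges per vertex, then prefix-sum the counts into offsets.
        counts = [0] * num_vertices
        for src, _ in edges:
            counts[src] += 1
        offsets = [0]
        for count in counts:
            offsets.append(offsets[-1] + count)
        # Scatter each edge target into its source vertex's slot range.
        targets = [0] * len(edges)
        cursor = offsets[:-1].copy()
        for src, dst in edges:
            targets[cursor[src]] = dst
            cursor[src] += 1
        return cls(list(vertex_data), offsets, targets)


class FragmentVertexView:
    """Hypothetical RDD-like view over a fragment's vertices."""

    def __init__(self, fragment: ToyFragment,
                 values: Optional[List[float]] = None) -> None:
        self._fragment = fragment
        self._values = fragment.vertex_data if values is None else values

    def map_vertices(self, fn: Callable[[int, float], float]) -> "FragmentVertexView":
        # Only the vertex payload is rewritten; the CSR structure is shared.
        mapped = [fn(vid, val) for vid, val in enumerate(self._values)]
        return FragmentVertexView(self._fragment, mapped)

    def out_degrees(self) -> List[Tuple[int, int]]:
        # With CSR, a vertex's out-degree is a difference of adjacent offsets.
        off = self._fragment.offsets
        return [(v, off[v + 1] - off[v]) for v in range(len(off) - 1)]

    def collect(self) -> List[Tuple[int, float]]:
        return list(enumerate(self._values))


# Usage: a 3-vertex graph with edges 0->1, 0->2, 1->2, 2->0.
fragment = ToyFragment.from_edges(3, [(0, 1), (0, 2), (1, 2), (2, 0)], [1.0, 2.0, 3.0])
view = FragmentVertexView(fragment)
doubled = view.map_vertices(lambda vid, val: val * 2)
print(doubled.collect())
print(view.out_degrees())
```

Note that this toy view computes out-degrees only and ignores partitioning, outer (mirror) vertices, and cross-language data transfer, all of which the real design has to handle.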

zhanglei1949, Jun 24 '22 02:06

Very informative and looking forward to it!

parkerzf, Jun 24 '22 07:06