GraphScope
flex-coordinator proposal
Case 1: GAE for offline graph analytics
Implementation: GraphScope operator
1. How to submit jobs
Users submit a custom resource (an instance of the CRD) as a batch job with kubectl, and each job runs in a new set of pods. For custom algorithms, users need to package them in an image and run that container as a sidecar.
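As a rough illustration of the submission step, the sketch below wraps the kubectl call in Python. The file name `job.yaml`, the helper names, and the default namespace are assumptions made for this example; the proposal itself only says that jobs are submitted with kubectl.

```python
import subprocess


def build_submit_command(manifest_path, namespace="default"):
    """Return the kubectl invocation that creates one job resource.

    `kubectl create -f` is used here because each submission is meant
    to produce a new batch job rather than update an existing one.
    """
    return ["kubectl", "create", "-f", manifest_path, "--namespace", namespace]


def submit_job(manifest_path, namespace="default"):
    """Run the command. Needs kubectl on PATH and a cluster where the
    AnalyticalJob CRD is already installed (assumed, not shown here)."""
    subprocess.run(build_submit_command(manifest_path, namespace), check=True)


# Hypothetical manifest file holding an AnalyticalJob resource.
command = build_submit_command("job.yaml")
```

Calling `submit_job("job.yaml")` would then hand the resource to the cluster, where the operator is expected to pick it up.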
2. Coordinator responsibility
The coordinator is responsible for 1) identifying the engine pods that will run the MPI job; 2) compiling the algorithms and distributing them to the engine machines if needed; 3) managing the execution of the MPI job on the engine machines.
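Steps 1 and 3 could look roughly like the sketch below; step 2 (compiling and distributing algorithms) is left out. Everything here is an assumption for illustration: the pod records are plain dicts rather than real Kubernetes API objects, the `job` label and the `/opt/engine` binary path are hypothetical, and the launch line uses Open MPI style `mpirun -n ... --host ...` flags, which other MPI implementations spell differently.

```python
def select_engine_pods(pods, job_name):
    """Step 1: pick the Running pods that belong to this job.

    Returns their IPs in sorted order so the host list is stable.
    """
    return sorted(
        pod["ip"]
        for pod in pods
        if pod.get("labels", {}).get("job") == job_name
        and pod.get("phase") == "Running"
    )


def build_mpirun_command(hosts, app_binary, app_args):
    """Step 3: build an Open MPI style launch line, one rank per host."""
    return [
        "mpirun",
        "-n", str(len(hosts)),
        "--host", ",".join(hosts),
        app_binary,
        *app_args,
    ]


# Hypothetical pod records as a coordinator might see them.
pods = [
    {"ip": "10.0.0.3", "phase": "Running", "labels": {"job": "x"}},
    {"ip": "10.0.0.2", "phase": "Running", "labels": {"job": "x"}},
    {"ip": "10.0.0.4", "phase": "Pending", "labels": {"job": "x"}},
    {"ip": "10.0.0.5", "phase": "Running", "labels": {"job": "y"}},
]
hosts = select_engine_pods(pods, "x")
command = build_mpirun_command(hosts, "/opt/engine", ["--source", "1"])
```

Pods that are not yet Running or that belong to another job are excluded, so the MPI job only starts on machines that are ready for it.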
3. Monitor
We may use the Kubernetes Dashboard to monitor the status of the GraphScope operator, including:
- the recorded state of each job, including running and failed jobs
- CPU/memory usage of each pod
- pod logs (GAE running log)
However, I'm not sure whether it can retain the logs of pods that have completed their execution.
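Independent of which dashboard is used, the job-level view in the first bullet boils down to a small aggregation. The sketch below shows one way to compute it; the record layout and the phase names (`Running`, `Failed`, `Succeeded`) are assumptions for this example, not something the proposal defines.

```python
from collections import Counter


def summarize_jobs(jobs):
    """Count jobs per phase and list the names of the failed ones."""
    counts = Counter(job["phase"] for job in jobs)
    failed = sorted(job["name"] for job in jobs if job["phase"] == "Failed")
    return {"counts": dict(counts), "failed": failed}


# Hypothetical job records as the operator might report them.
jobs = [
    {"name": "sssp-1", "phase": "Running"},
    {"name": "sssp-2", "phase": "Failed"},
    {"name": "pagerank-1", "phase": "Succeeded"},
    {"name": "wcc-1", "phase": "Failed"},
]
summary = summarize_jobs(jobs)
```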
4. CRD
apiVersion: graphscope.io/v1beta1
kind: AnalyticalJob
metadata:
  name: x
spec:
  algorithm:
    custom_algorithm: true
    custom_algorithm_image: "myalgo:latest"
    name: sssp
    runtime_parameters:
      - key: source
        value: 1
  input:
    oid_type: "int64_t"
    vertex_map: "global"
    vertices:
      - label: student
        loader:
          source: "s3://test/student.csv"
          delimiter: ","
          header_row: true
          filetype: "CSV"
          args:
    edges:
      - label: teacher_student
        loader:
          source: "s3://test/teacher_student.csv"
          delimiter: ","
          header_row: true
          filetype: "CSV"
          args:
  output:
    location: "s3://test/result"
  Worker:
    replicas: 2
    template:
      spec:
        containers:
          - image: graphscope/analytical:latest
            name: engine
            resources:
              limits:
                cpu: 2
                memory: 4Gi
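To make the shape of the resource easier to see, here is a small Python sketch that assembles a similar manifest as a dict and serializes it to JSON, which kubectl accepts as well as YAML. The nesting is my reading of the listing above and may not match the final CRD schema; the helper names are hypothetical, and the custom-algorithm fields, the empty `args` entries, and the resource limits are omitted for brevity.

```python
import json


def _source(label, uri):
    """One vertex or edge entry with the CSV loader settings from the example."""
    return {
        "label": label,
        "loader": {
            "source": uri,
            "delimiter": ",",
            "header_row": True,
            "filetype": "CSV",
        },
    }


def build_analytical_job(name, algorithm, params, vertices, edges, output,
                         replicas=2, image="graphscope/analytical:latest"):
    """Assemble an AnalyticalJob manifest as a plain dict (assumed layout)."""
    return {
        "apiVersion": "graphscope.io/v1beta1",
        "kind": "AnalyticalJob",
        "metadata": {"name": name},
        "spec": {
            "algorithm": {
                "name": algorithm,
                "runtime_parameters": [
                    {"key": key, "value": value} for key, value in params.items()
                ],
            },
            "input": {
                "oid_type": "int64_t",
                "vertex_map": "global",
                "vertices": [_source(l, u) for l, u in vertices.items()],
                "edges": [_source(l, u) for l, u in edges.items()],
            },
            "output": {"location": output},
            "Worker": {
                "replicas": replicas,
                "template": {
                    "spec": {"containers": [{"image": image, "name": "engine"}]}
                },
            },
        },
    }


manifest = build_analytical_job(
    name="x",
    algorithm="sssp",
    params={"source": 1},
    vertices={"student": "s3://test/student.csv"},
    edges={"teacher_student": "s3://test/teacher_student.csv"},
    output="s3://test/result",
)
manifest_json = json.dumps(manifest, indent=2)
```

The resulting `manifest_json` string could be written to a file or piped to `kubectl apply -f -`.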