[BUG] Segmentation fault with Graphar load
Describe the bug
I get a SIGSEGV error when trying to load a graph in GraphAr format.
To Reproduce
- Ensure you are running GAE in a k8s cluster
- Ensure all necessary files are mounted properly, i.e. all required GraphAr files are present and reachable inside the engine pods. In my example I mount a host path using the following configuration from the "graphscope" Helm chart (a quick sanity check is sketched right after this configuration):
volumes:
  enabled: true
  items:
    data:
      type: hostPath
      field:
        type: Directory
        path: {{ graphscope_directories_store_host }}
      mounts:
        - mountPath: {{ graphscope_directories_store_mounted }}
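Before starting the engines it is worth confirming that the GraphAr files really exist under the host path that backs the volume, since a missing or wrongly mounted file can lead to confusing loader failures. A minimal sketch of such a check, run on the node that provides the hostPath (the concrete path below is only a placeholder for graphscope_directories_store_host):

import os

# Placeholder values: substitute the real host path from your deployment.
host_store = "/path/on/host"
graph_yaml = os.path.join(host_store, "graphar", "MovieGraph.graph.yml")
print(graph_yaml, "exists:", os.path.exists(graph_yaml))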
- Create a Python script with the following code:
import graphscope
from graphscope import Graph
from graphscope.framework.loader import Loader
# Resolve the coordinator host and port using the https://graphscope.io/docs/latest/deployment/deploy_graphscope_with_helm#installation tutorial, or hardcode them
host = ""
port = ""
session = graphscope.session(
    addr=f"{host}:{port}",
    k8s_namespace="{{ k8s_graphscope_namespace }}"
)
# In my example I used the files from https://github.com/apache/incubator-graphar-testing/tree/955596c325ceba7b607e285738e3dd0ce4ff424e/neo4j, but the issue can be reproduced with any other dataset from that repository
uri = "graphar+file://{{ graphscope_directories_store_mounted }}/graphar/MovieGraph.graph.yml"
graph = Graph.load_from(uri, session)
session.close()
- Run the script
- See the error (the stack traces of the two engine processes, PID 296 and PID 324, are interleaved):
*** Aborted at 1731639742 (unix time) try "date -d @1731639742" if you are using GNU date ***
PC: @ 0x0 (unknown)
*** SIGSEGV (@0x8) received by PID 296 (TID 0x7c1707e00640) from PID 8; stack trace: ***
@ 0x7c17157db046 (unknown)
@ 0x7c1712f13520 (unknown)
*** Aborted at 1731639742 (unix time) try "date -d @1731639742" if you are using GNU date ***
@ 0x7c17172e425f _ZZN8vineyard17GARFragmentLoaderIlmNS_14ArrowVertexMapEE18constructVertexMapEvENKUliE_clEi
PC: @ 0x0 (unknown)
*** SIGSEGV (@0x8) received by PID 324 (TID 0x77a733400640) from PID 8; stack trace: ***
@ 0x7c17173182a7 vineyard::GARFragmentLoader<>::constructVertexMap()
@ 0x77a7416bc046 (unknown)
@ 0x7c171731b797 vineyard::GARFragmentLoader<>::LoadFragment()
@ 0x77a73edf4520 (unknown)
@ 0x7c171731c140 vineyard::GARFragmentLoader<>::LoadFragmentAsFragmentGroup()
@ 0x7c17074e1451 LoadGraph
@ 0x58967b75c0d1 gs::GrapeInstance::loadGraph()
@ 0x58967b761d65 gs::GrapeInstance::OnReceive()
@ 0x58967b7f018a gs::Dispatcher::processCmd()
@ 0x58967b7f1ebe gs::Dispatcher::subscriberLoop()
@ 0x77a7431c525f _ZZN8vineyard17GARFragmentLoaderIlmNS_14ArrowVertexMapEE18constructVertexMapEvENKUliE_clEi
@ 0x7c17131f8253 (unknown)
@ 0x77a7431f92a7 vineyard::GARFragmentLoader<>::constructVertexMap()
@ 0x7c1712f65ac3 (unknown)
@ 0x7c1712ff7850 (unknown)
@ 0x77a7431fc797 vineyard::GARFragmentLoader<>::LoadFragment()
@ 0x77a7431fd140 vineyard::GARFragmentLoader<>::LoadFragmentAsFragmentGroup()
@ 0x77a7334e1451 LoadGraph
@ 0x5cb0ae4510d1 gs::GrapeInstance::loadGraph()
@ 0x5cb0ae456d65 gs::GrapeInstance::OnReceive()
@ 0x5cb0ae4e518a gs::Dispatcher::processCmd()
@ 0x5cb0ae4e7906 gs::Dispatcher::publisherLoop()
@ 0x77a73f0d9253 (unknown)
@ 0x77a73ee46ac3 (unknown)
@ 0x77a73eed8850 (unknown)
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 296 on node gs-engine-gs-gae-1 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
RPC failed: rpc _run_step_impl failed: status <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "Socket closed. The traceback is: Traceback (most recent call last):
File "/home/graphscope/.local/lib/python3.10/site-packages/gscoordinator/servicer/graphscope_one/service.py", line 245, in _RunStep
head, bodies = self._operation_executor.run_on_analytical_engine(
File "/home/graphscope/.local/lib/python3.10/site-packages/gscoordinator/monitor.py", line 191, in runOnAnalyticalEngineWarp
res = func(instance, dag_def, dag_bodies, loader_op_bodies)
File "/home/graphscope/.local/lib/python3.10/site-packages/gscoordinator/op_executor.py", line 178, in run_on_analytical_engine
response_head, response_bodies = self.run_step(dag_def, dag_bodies)
File "/home/graphscope/.local/lib/python3.10/site-packages/gscoordinator/op_executor.py", line 106, in run_step
for response in responses:
File "/home/graphscope/.local/lib/python3.10/site-packages/grpc/_channel.py", line 543, in __next__
return self._next()
File "/home/graphscope/.local/lib/python3.10/site-packages/grpc/_channel.py", line 969, in _next
raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "Socket closed"
debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Socket closed", grpc_status:14, created_time:"2024-11-15T03:02:22.646397916+00:00"}"
>
"
debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Socket closed. The traceback is: Traceback (most recent call last):\n File \"/home/graphscope/.local/lib/python3.10/site-packages/gscoordinator/servicer/graphscope_one/service.py\", line 245, in _RunStep\n head, bodies = self._operation_executor.run_on_analytical_engine(\n File \"/home/graphscope/.local/lib/python3.10/site-packages/gscoordinator/monitor.py\", line 191, in runOnAnalyticalEngineWarp\n res = func(instance, dag_def, dag_bodies, loader_op_bodies)\n File \"/home/graphscope/.local/lib/python3.10/site-packages/gscoordinator/op_executor.py\", line 178, in run_on_analytical_engine\n response_head, response_bodies = self.run_step(dag_def, dag_bodies)\n File \"/home/graphscope/.local/lib/python3.10/site-packages/gscoordinator/op_executor.py\", line 106, in run_step\n for response in responses:\n File \"/home/graphscope/.local/lib/python3.10/site-packages/grpc/_channel.py\", line 543, in __next__\n return self._next()\n File \"/home/graphscope/.local/lib/python3.10/site-packages/grpc/_channel.py\", line 969, in _next\n raise self\ngrpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"Socket closed\"\n\tdebug_error_string = \"UNKNOWN:Error received from peer {grpc_message:\"Socket closed\", grpc_status:14, created_time:\"2024-11-15T03:02:22.646397916+00:00\"}\"\n>\n", grpc_status:14, created_time:"2024-11-15T03:02:22.649210409+00:00"}"
Expected behavior
No errors occur; the graph loads successfully.
Screenshots
[Screenshot: all running pods]
Environment:
- GraphScope version: v0.29.0
- OS: Ubuntu 24.04
- Kubernetes version: 1.28.14
- Python version: 3.11.10 (with the following dependencies: graphscope==0.29.0, graphscope-client==0.29.0, pandas==2.0.3, aiohttp, async_timeout)
Additional context
The same issue appears if I run the Python script against an empty cluster (where the graphscope library creates all required pods, services, etc. by itself), like this:
import os
import graphscope

session = graphscope.session(
    k8s_image_registry="{{ docker_image_repository_hosted }}",
    k8s_image_repository="rnd-grapher/graphscope",
    k8s_vineyard_image="{{ docker_image_repository }}/vineyardcloudnative/vineyardd:latest",
    k8s_namespace="{{ k8s_graphscope_namespace }}",
    k8s_volumes={
        "data": {
            "type": "hostPath",
            "field": {
                "path": os.path.expanduser("~/examples/"),
                "type": "Directory"
            },
            "mounts": {
                "mountPath": "/examples/"
            }
        }
    }
)
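For completeness, the load step in this variant is the same as in the first script, just pointing at the in-pod mount path; the sketch below assumes the MovieGraph GraphAr files were placed under ~/examples/graphar on the node, so they appear under /examples/graphar inside the pods:

from graphscope import Graph

# Assumed file layout; adjust the path to wherever the GraphAr files actually live.
uri = "graphar+file:///examples/graphar/MovieGraph.graph.yml"
graph = Graph.load_from(uri, session)
session.close()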
/cc @yecol @sighingnow, this issue/PR has had no activity for a long time, please help to review the status and assign people to work on it.