[BUG] Error while loading graph from vineyard

Open ajay-48213 opened this issue 7 months ago • 2 comments

Describe the bug I deployed the analytical engine using this chart (https://github.com/alibaba/GraphScope/tree/main/charts/graphscope) on a k8s cluster, with all the necessary components (coordinator, engine, and vineyard). I can establish a session using the coordinator IP address and then load data into vineyard from a PVC. However, once I close the session and reconnect, reloading the graph from vineyard by its vineyard object ID fails with the error below. After this error, the session freezes, new sessions also fail to establish, and a cleanup and redeploy is needed. The error looks similar to https://github.com/alibaba/GraphScope/issues/4314, but I am not sure whether it has the same root cause.

2025-04-14 22:23:42,742 [INFO][session:597]: Connecting graphscope session with address: xxx.xxx.xxx.xxx:xxxx
2025-04-14 22:23:42,747 [INFO][rpc:69]: GraphScope coordinator service connected.
I0414 22:23:53.000000   318 /home/graphscope/GraphScope/analytical_engine/core/grape_instance.cc:1268] Registering Graph, graph type: ARROW_PROPERTY, Type signature: 41df71fc38edd891f4de0d9004c663e762002931afbe51e1e2fef7ef610ff9c2, lib path: /tmp/gs/builtin/41df71fc38edd891f4de0d9004c663e762002931afbe51e1e2fef7ef610ff9c2/lib41df71fc38edd891f4de0d9004c663e762002931afbe51e1e2fef7ef610ff9c2.so
I0414 22:23:53.000000   256 /home/graphscope/GraphScope/analytical_engine/core/grape_instance.cc:1268] Registering Graph, graph type: ARROW_PROPERTY, Type signature: 41df71fc38edd891f4de0d9004c663e762002931afbe51e1e2fef7ef610ff9c2, lib path: /tmp/gs/builtin/41df71fc38edd891f4de0d9004c663e762002931afbe51e1e2fef7ef610ff9c2/lib41df71fc38edd891f4de0d9004c663e762002931afbe51e1e2fef7ef610ff9c2.so
I0414 22:23:53.000000   293 /home/graphscope/GraphScope/analytical_engine/core/grape_instance.cc:1268] Registering Graph, graph type: ARROW_PROPERTY, Type signature: 41df71fc38edd891f4de0d9004c663e762002931afbe51e1e2fef7ef610ff9c2, lib path: /tmp/gs/builtin/41df71fc38edd891f4de0d9004c663e762002931afbe51e1e2fef7ef610ff9c2/lib41df71fc38edd891f4de0d9004c663e762002931afbe51e1e2fef7ef610ff9c2.so
I0414 22:23:53.000000   226 /home/graphscope/GraphScope/analytical_engine/core/grape_instance.cc:1268] Registering Graph, graph type: ARROW_PROPERTY, Type signature: 41df71fc38edd891f4de0d9004c663e762002931afbe51e1e2fef7ef610ff9c2, lib path: /tmp/gs/builtin/41df71fc38edd891f4de0d9004c663e762002931afbe51e1e2fef7ef610ff9c2/lib41df71fc38edd891f4de0d9004c663e762002931afbe51e1e2fef7ef610ff9c2.so
I0414 22:23:53.000000   318 /home/graphscope/GraphScope/analytical_engine/core/grape_instance.cc:145] Loading graph, graph name: graph_G9a1e0uR, graph type: ArrowFragment, type sig: 41df71fc38edd891f4de0d9004c663e762002931afbe51e1e2fef7ef610ff9c2
I0414 22:23:53.000000   256 /home/graphscope/GraphScope/analytical_engine/core/grape_instance.cc:145] Loading graph, graph name: graph_G9a1e0uR, graph type: ArrowFragment, type sig: 41df71fc38edd891f4de0d9004c663e762002931afbe51e1e2fef7ef610ff9c2
I0414 22:23:53.000000   293 /home/graphscope/GraphScope/analytical_engine/core/grape_instance.cc:145] Loading graph, graph name: graph_G9a1e0uR, graph type: ArrowFragment, type sig: 41df71fc38edd891f4de0d9004c663e762002931afbe51e1e2fef7ef610ff9c2
I0414 22:23:53.000000   226 /home/graphscope/GraphScope/analytical_engine/core/grape_instance.cc:145] Loading graph, graph name: graph_G9a1e0uR, graph type: ArrowFragment, type sig: 41df71fc38edd891f4de0d9004c663e762002931afbe51e1e2fef7ef610ff9c2
[error] Check failed: Object not exists: failed to get metadata for 'o658d999fe5c00059': failed to read get_data reply: {"content":null,"type":"get_data_reply"} in "this->GetMetaData(id, meta, sync_remote)"
*** Aborted at 1744669433 (unix time) try "date -d @1744669433" if you are using GNU date ***
[error] Check failed: Object not exists: failed to get metadata for 'o658d999fe5c00059': failed to read get_data reply: {"content":null,"type":"get_data_reply"} in "this->GetMetaData(id, meta, sync_remote)"
*** Aborted at 1744669433 (unix time) try "date -d @1744669433" if you are using GNU date ***
[error] Check failed: Object not exists: failed to get metadata for 'o658d999fe5c00059': failed to read get_data reply: {"content":null,"type":"get_data_reply"} in "this->GetMetaData(id, meta, sync_remote)"
*** Aborted at 1744669433 (unix time) try "date -d @1744669433" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGSEGV (@0x68) received by PID 288 (TID 0x7fe90973f640) from PID 104; stack trace: ***
PC: @                0x0 (unknown)
[error] Check failed: Object not exists: failed to get metadata for 'o658d999fe5c00059': failed to read get_data reply: {"content":null,"type":"get_data_reply"} in "this->GetMetaData(id, meta, sync_remote)"
*** Aborted at 1744669433 (unix time) try "date -d @1744669433" if you are using GNU date ***
*** SIGSEGV (@0x68) received by PID 251 (TID 0x7fc73a3e1640) from PID 104; stack trace: ***
PC: @                0x0 (unknown)
*** SIGSEGV (@0x68) received by PID 313 (TID 0x7f36c9091640) from PID 104; stack trace: ***
    @     0x7fe9188b4046 (unknown)
    @     0x7fc749909046 (unknown)
PC: @                0x0 (unknown)
*** SIGSEGV (@0x68) received by PID 221 (TID 0x7f1877fff640) from PID 104; stack trace: ***
    @     0x7fe915fec520 (unknown)
    @     0x7f36d8bfd046 (unknown)
    @     0x7fe8f490ad96 std::__detail::_Map_base<>::at()
    @     0x7fe8f48c639e LoadGraph
    @     0x555b064b2f08 gs::GrapeInstance::loadGraph()
    @     0x555b064b8ad5 gs::GrapeInstance::OnReceive()
    @     0x7fc747041520 (unknown)
    @     0x7fc7400e7d96 std::__detail::_Map_base<>::at()
    @     0x555b0654b39a gs::Dispatcher::processCmd()
    @     0x7fc7400a339e LoadGraph
    @     0x555b0654d0ce gs::Dispatcher::subscriberLoop()
    @     0x55f1bf86cf08 gs::GrapeInstance::loadGraph()
    @     0x55f1bf872ad5 gs::GrapeInstance::OnReceive()
    @     0x55f1bf90539a gs::Dispatcher::processCmd()
    @     0x55f1bf9070ce gs::Dispatcher::subscriberLoop()
    @     0x7f36d6335520 (unknown)
    @     0x7f36bcd08d96 std::__detail::_Map_base<>::at()
    @     0x7f36bccc439e LoadGraph
    @     0x563b31d28f08 gs::GrapeInstance::loadGraph()
    @     0x7f18873af046 (unknown)
    @     0x563b31d2ead5 gs::GrapeInstance::OnReceive()
    @     0x563b31dc139a gs::Dispatcher::processCmd()
    @     0x563b31dc3b16 gs::Dispatcher::publisherLoop()
    @     0x7fe9162d1253 (unknown)
    @     0x7fc747326253 (unknown)
    @     0x7f1884ae7520 (unknown)
    @     0x7f187450cd96 std::__detail::_Map_base<>::at()
    @     0x7f36d661a253 (unknown)
    @     0x7f18744c839e LoadGraph
    @     0x559cb8b5bf08 gs::GrapeInstance::loadGraph()
    @     0x559cb8b61ad5 gs::GrapeInstance::OnReceive()
    @     0x559cb8bf439a gs::Dispatcher::processCmd()
    @     0x559cb8bf60ce gs::Dispatcher::subscriberLoop()
    @     0x7fe91603eac3 (unknown)
    @     0x7fc747093ac3 (unknown)
    @     0x7f36d6387ac3 (unknown)
    @     0x7fe9160d0850 (unknown)
    @     0x7f1884dcc253 (unknown)
    @     0x7fc747125850 (unknown)
    @     0x7f36d6419850 (unknown)
    @     0x7f1884b39ac3 (unknown)
    @     0x7f1884bcb850 (unknown)
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 251 on node gs-engine-ll-2 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
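The object ID in the error above ('o658d999fe5c00059') appears to be a string form of a 64-bit ID: the letter "o" followed by 16 hexadecimal digits. That reading is an assumption drawn from the log line, not something confirmed here. Under that assumption, a minimal sketch for converting between the integer and string forms, which helps when matching an ID noted in a client session against the engine log (the helper names are hypothetical):

```python
def object_id_to_str(object_id: int) -> str:
    """Render an integer ID in the assumed 'o' + 16 hex digits form."""
    return "o%016x" % object_id


def object_id_from_str(text: str) -> int:
    """Parse the assumed 'o' + hex form back into an integer."""
    if not text.startswith("o"):
        raise ValueError("expected an ID starting with 'o': %r" % text)
    return int(text[1:], 16)


# The ID reported in the log, converted to an integer and back.
logged = "o658d999fe5c00059"
as_int = object_id_from_str(logged)
assert object_id_to_str(as_int) == logged
```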

To Reproduce Steps to reproduce the behavior:

  1. Deploy graphscope chart with GAE enabled on k8s.
  2. Create a session: graphscope.session(addr='xxx.xxx.xxx.xxx:xxxxx', enabled_engines='gae').
  3. Load data into graph (g). Just doing g = sess.g(oid_type="string") is also enough.
  4. Note the vineyard_id using g.vineyard_id.
  5. Close the session sess.close().
  6. Create another session (to same cluster).
  7. Try to load the graph back from vineyard using sess.load_from(vineyard.ObjectID(<object_id>)).

Expected behavior A clear and concise description of what you expected to happen.
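Steps 2 to 7 above can be sketched as a single function. It uses only the calls quoted in the steps (graphscope.session, sess.g, g.vineyard_id, sess.close, sess.load_from, vineyard.ObjectID). The function name and the two factory parameters are hypothetical; they are passed in so the flow can be read and exercised without a cluster. This is a sketch of the reported scenario, not a verified reproduction script.

```python
def reproduce(session_factory, object_id_factory, addr):
    """Run steps 2-7 from the report and return whatever load_from returns.

    session_factory is expected to be graphscope.session and
    object_id_factory to be vineyard.ObjectID (assumption: both accept
    the arguments shown in the steps above).
    """
    # Steps 2-4: connect, create a graph, and note its vineyard id.
    sess = session_factory(addr=addr, enabled_engines="gae")
    g = sess.g(oid_type="string")
    vineyard_id = g.vineyard_id

    # Step 5: close the first session.
    sess.close()

    # Steps 6-7: reconnect to the same cluster and reload by object id.
    sess2 = session_factory(addr=addr, enabled_engines="gae")
    return sess2.load_from(object_id_factory(vineyard_id))


# Against a real deployment (requires graphscope and vineyard installed):
#   import graphscope, vineyard
#   reproduce(graphscope.session, vineyard.ObjectID, "<coordinator-address>")
```

According to the report, the final load_from call is where the engine aborts with the "Object not exists" error shown in the log.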

Environment (please complete the following information):

  • GraphScope version: v0.31.0
  • OS: Linux
  • Version [e.g. 10.15]
  • Kubernetes Version [e.g., 1.19]

ajay-48213, Apr 15 '25 03:04

Thanks for opening your first issue here! Be sure to follow the issue template! And a maintainer will get back to you shortly! Please feel free to contact us on DingTalk, WeChat account(graphscope) or Slack. We are happy to answer your questions responsively.

welcome[bot], Apr 15 '25 03:04

/cc @yecol @sighingnow, this issue/PR has had no activity for a long time; please help review its status and assign people to work on it.

github-actions[bot], May 12 '25 00:05