GraphScope
bug(analytical): `context.output` raises an error when the result is huge
Describe the bug
A user ran the flash.harmonic_centrality algorithm on a large dataset with string vertex IDs (1.2 billion vertices, 3.2 billion edges) and dumped the result with context.output. Got error:
To Reproduce
#!/usr/bin/python3
import os

import graphscope
from graphscope.client.session import get_default_session
from graphscope.framework.loader import Loader

graphscope.set_option(show_log=True)


def load_graph(sess, path):
    # Build a directed graph with string vertex IDs from a headerless CSV edge list.
    graph = sess.g(oid_type="string", directed=True, generate_eid=False, retain_oid=False)
    edge = Loader(path, header_row=False, delimiter=",", directed=True)
    graph = graph.add_edges(edge, label='link', src_label="domain", dst_label="domain")
    return graph


def harmonic_centrality(graph):
    # Run the FLASH harmonic centrality algorithm and return its result context.
    context = graphscope.flash.harmonic_centrality(graph)
    return context


if __name__ == "__main__":
    sess = graphscope.session(cluster_type="hosts", num_workers=1)
    g = load_graph(sess, "/xxx//domain_graph.txt")
    ctx = harmonic_centrality(g)
    # Dump vertex IDs and centrality values; this is the call that fails.
    ctx.output("/xxxx/harmonic_result.txt", {"id": "v.id", "centrality": "r"})
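For readers hitting the same failure before a fixed release is available, one possible direction is to write the result in bounded batches instead of in a single step. The sketch below is only an illustration under assumptions: `write_result_in_chunks` is a hypothetical helper, not a GraphScope API. It uses only the Python standard library and assumes the (id, centrality) pairs are already available as an iterable. How to obtain them from the context, and whether batching avoids this particular error, is not established by this report.

```python
import csv
from itertools import islice


def write_result_in_chunks(rows, path, chunk_size=1_000_000):
    """Write (id, centrality) pairs to a CSV file, chunk_size rows at a time.

    Hypothetical helper for illustration; not part of GraphScope.
    Returns the number of data rows written.
    """
    rows = iter(rows)
    written = 0
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # Column names mirror the selector keys used in the report.
        writer.writerow(["id", "centrality"])
        while True:
            # Materialize at most chunk_size rows so memory use stays bounded.
            chunk = list(islice(rows, chunk_size))
            if not chunk:
                break
            writer.writerows(chunk)
            written += len(chunk)
    return written
```

Because the helper accepts any iterable, it can be fed from a generator so that the full result never has to be held in memory at once.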
Environment (please complete the following information):
- GraphScope version: v0.26.0, installed through pip install graphscope
- OS: Linux
- Version: Ubuntu 20.04
Fixed in upstream: https://github.com/apache/arrow/pull/40271
@acezen Please verify the fix and close this issue if possible.
/cc @sighingnow, this issue/PR has had no activity for a long time. Could you folks help review its status? To suppress further notifications,
- for issues,
  - if it is waiting for a further response from the reporter/author, please add the label requires-further-info,
  - if you have already started working on it, please add the label work-in-progress to the issue,
  - if this issue requires further design discussion and is not in the current plan, or won't be fixed, please add the label requires-further-discussion or wontfix to the issue,
- for pull requests,
  - if you are still working on it and it is not ready for review, please convert the pull request to a draft PR,
  - if you have decided to put this development on hold, please add the requires-further-discussion label to the pull request.
Thanks!
Closing as fixed in upstream: https://github.com/apache/arrow/pull/40271