Support wildcards when loading from files

Open siyuan0322 opened this issue 1 year ago • 14 comments

For example:

graph.add_edges('/data/person/*.csv')

We support wildcards for OSS and HDFS (via fsspec), but not for local files.

siyuan0322 avatar Jun 27 '23 04:06 siyuan0322

In addition, it needs to be well documented.

yecol avatar Jun 27 '23 06:06 yecol

Hi, I'd like to attempt to solve this issue. Could it be assigned to me? Thank you.

ywh555hhh avatar Apr 21 '24 05:04 ywh555hhh

Hi, if you're interested in this issue, you're welcome to open a PR to solve it. Discussion about how it should be addressed is also welcome. You may need to understand how the loading process works here, and to set up a development environment for this project.

siyuan0322 avatar Apr 23 '24 05:04 siyuan0322

Thank you for your response. I have read part of the project documentation and have set up the development environment using the dev docker container provided by the community. I have a few questions:

    1. Should I open a PR now or at a later stage?
    2. Where in the documentation can I learn about the loading process? I noticed that the file python/graphscope/framework/loader.py seems to be responsible for this task.
    3. Could you explain why this issue is tagged with component:gae and component:vineyard?

Looking forward to your guidance.

ywh555hhh avatar Apr 24 '24 05:04 ywh555hhh

i. It's better to open a PR once you have a working version. ii. I'm afraid you'll have to go through the source code. iii. Because the loading process is in the C++ code, which calls functions in the vineyard library; you can trace through them.

This is not a hard task, but it contains a rather long call chain.

siyuan0322 avatar Apr 24 '24 05:04 siyuan0322

Thank you for your detailed explanation. I will try to understand it and complete the task.

ywh555hhh avatar Apr 24 '24 06:04 ywh555hhh

Hello,

I noticed in the developer's guide that I can use the make minitest(unitest) command for testing. However, I didn't find this command in the Makefile. I have noticed in other issues that the developer's guide may be a bit outdated. Could you please guide me on how to handle testing for this issue?

Regarding the solution to the issue, I have found that the load_from method of the Graph class is responsible for file loading. Could you please confirm if my understanding is correct?

For the specific solution, I plan to use the glob library to achieve the goal.
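As a rough illustration of the glob idea, here is a minimal sketch using only the Python standard library. The helper name `expand_local_paths` is hypothetical and not part of GraphScope; it also assumes that paths with a remote scheme such as oss:// or hdfs:// should be passed through untouched, since those are reportedly expanded via fsspec already. Where such an expansion should be hooked into the loading call chain is not settled in this thread.

```python
import glob
import os


def expand_local_paths(location):
    """Expand a local path that may contain wildcards into a sorted list of files.

    Hypothetical helper for illustration only; not part of GraphScope.
    """
    # Assumption: remote locations (oss://, hdfs://, ...) are handled elsewhere.
    if "://" in location and not location.startswith("file://"):
        return [location]

    # Strip an optional file:// prefix to get a plain local path.
    path = location[len("file://"):] if location.startswith("file://") else location

    # Without wildcard characters there is nothing to expand.
    if not any(ch in path for ch in "*?["):
        return [path]

    # Keep regular files only, in a deterministic order.
    matches = sorted(p for p in glob.glob(path) if os.path.isfile(p))
    if not matches:
        raise FileNotFoundError(f"No files match pattern: {location}")
    return matches
```

With something like this, a pattern such as '/data/person/*.csv' would turn into a list of concrete file paths that could each be handed to the existing loading code. Raising an error on an empty match is a design choice of this sketch, so that a mistyped pattern does not silently load nothing.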

Looking forward to your guidance.

ywh555hhh avatar May 06 '24 04:05 ywh555hhh

  • You could add a test in test_create_graph.py, and refer to this Python test workflow to run it.

  • load_from is for gathering the necessary information, such as label, property, file location, etc. The actual file reading happens in arrow_fragment_loader in v6d.

siyuan0322 avatar May 06 '24 05:05 siyuan0322

Hello,

Firstly, I want to express my respect for your time. I understand that you must be busy, so I greatly appreciate you taking the time to assist me.

I've encountered some issues while trying to run the test_create_graph.py test. The command I used is:

python3 -m pytest -d --tx popen//python=python3 \
                    -s -v \
                    --cov=graphscope --cov-config=python/.coveragerc --cov-report=xml --cov-report=term \
                    /workspaces/GraphScope/python/graphscope/tests/unittest/test_create_graph.py

Running this test took me about 4 hours, and about 80% of the test points reported errors. From the output, the problem seems to be related to grpc. The specific error type is grpc._channel._InactiveRpcError, the status code is StatusCode.ABORTED, and the error details are "Launch analytical engine failed:", indicating that an error occurred when launching the analytical engine.

This error occurs during the initialization of graphscope.client.session, when it tries to create an analytical instance via a gRPC connection. The exception is thrown when calling the create_analytical_instance method in graphscope.client.rpc.

I wanted to ask if this is a normal situation? Or could it be that there are network issues with my Linux server?

I greatly appreciate any assistance you can provide, and I respect your time, so if you need more information to help solve this problem, I will provide it as soon as possible.

Thank you again for your help.

ywh555hhh avatar May 15 '24 05:05 ywh555hhh

The program can't find or can't launch the analytical_engine (a.k.a. grape_engine). The installation probably did not succeed.
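One quick way to check the "can't find" case is sketched below. It assumes the engine is an executable named grape_engine that sits either on PATH or in a directory you pass in; where GraphScope actually looks for it is not stated in this thread, so `locate_engine` and its `extra_dirs` parameter are hypothetical.

```python
import os
import shutil


def locate_engine(name="grape_engine", extra_dirs=()):
    """Return the full path of an executable, or None if it cannot be found.

    Hypothetical helper for illustration only; not part of GraphScope.
    """
    # Search the caller-supplied directories first, then the regular PATH.
    path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    candidates = [d for d in list(extra_dirs) + path_dirs if d]
    return shutil.which(name, path=os.pathsep.join(candidates))
```

If this returns None for every directory where the installation should have placed the binary, a failed or incomplete installation is the likely cause. A non-None result only shows that the file exists and is executable, not that it launches correctly.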

siyuan0322 avatar May 15 '24 06:05 siyuan0322

Thanks for your reply. I'll try to reinstall it.

ywh555hhh avatar May 15 '24 06:05 ywh555hhh

Hello,

Sorry to bother you at night. I've been following the suggestions provided in this issue thread and have tried reinstalling GraphScope on my machine. Unfortunately, I'm still encountering issues when trying to test it. I have attached the raw output below. Could you please provide further guidance on how to resolve this? Any help would be greatly appreciated.

Thank you in advance.

info.txt

ywh555hhh avatar May 16 '24 12:05 ywh555hhh

It seems the previous message was not accepted

ywh555hhh avatar May 21 '24 05:05 ywh555hhh

You might want to try using the devcontainer to get rid of the environment issues. We provide a devcontainer.json.

siyuan0322 avatar May 21 '24 05:05 siyuan0322