
End2end tests and some examples have strong environment and data assumptions

Open thvasilo opened this issue 1 year ago • 2 comments

For example, if we try to run https://github.com/awslabs/graphstorm/tree/main/training_scripts/gsgnn_mt on the GraphStorm image, we run into the following error:

python3 tests/end2end-tests/data_gen/process_movielens.py
Traceback (most recent call last):
  File "/root/graphstorm/tests/end2end-tests/data_gen/process_movielens.py", line 29, in <module>
    user = pandas.read_csv('/data/ml-100k/u.user', delimiter='|', header=None,
  File "/opt/gs-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/opt/gs-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 620, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/opt/gs-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1620, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/opt/gs-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1880, in _make_engine
    self.handles = get_handle(
  File "/opt/gs-venv/lib/python3.9/site-packages/pandas/io/common.py", line 873, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: '/data/ml-100k/u.user'

Similarly, to be able to run end2end tests, we'd start by trying to run https://github.com/awslabs/graphstorm/blob/main/tests/end2end-tests/create_data.sh

However, that script starts with the following commands:

mkdir -p /data
cd /data
cp -R /storage/ml-100k /data

This 1) assumes root permissions by calling mkdir -p /data, which works on the GraphStorm image at least but should be avoided, and 2) assumes that a directory /storage/ml-100k exists.

These assumptions currently make it impossible for someone to run the end2end tests after cloning the repo in their local environment. We should make our scripts agnostic of such paths and files, allow the end2end tests to run on fresh clones of the repo, and fix any examples that use scripts with such assumptions.
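
As a rough illustration of what path-agnostic scripts could look like, here is a minimal sketch. It is not existing GraphStorm code: the environment variable name GS_TEST_DATA_DIR, the default directory name, and both helper functions are hypothetical. The idea is to resolve the data directory from the environment, fall back to a user-writable location instead of /data, and fail with an actionable message when the MovieLens files are missing.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical variable name, not an existing GraphStorm setting.
DATA_DIR_ENV = "GS_TEST_DATA_DIR"


def resolve_data_dir(env=None):
    """Return a writable data directory without requiring root."""
    env = os.environ if env is None else env
    # Fall back to a directory under the system temp dir instead of /data.
    default = Path(tempfile.gettempdir()) / "graphstorm-test-data"
    data_dir = Path(env.get(DATA_DIR_ENV, default))
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir


def require_file(data_dir, relative_path):
    """Return the path to an input file, or fail with an actionable message."""
    path = Path(data_dir) / relative_path
    if not path.is_file():
        raise FileNotFoundError(
            f"Expected {path}. Place the input files under {data_dir} "
            f"or set {DATA_DIR_ENV} to their location."
        )
    return path
```

A script such as process_movielens.py could then call require_file(resolve_data_dir(), "ml-100k/u.user") and pass the result to pandas.read_csv, rather than hard-coding /data/ml-100k/u.user.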

thvasilo avatar Aug 09 '24 18:08 thvasilo

Is this resolved?

classicsong avatar Jan 15 '25 19:01 classicsong

@zhjwy9343 fixed some examples, but I believe the tests still have the assumptions mentioned.

thvasilo avatar Jan 17 '25 19:01 thvasilo