End2end tests and some examples have strong environment and data assumptions
For example, if we try to run https://github.com/awslabs/graphstorm/tree/main/training_scripts/gsgnn_mt on the GraphStorm image, we run into the following error:
python3 tests/end2end-tests/data_gen/process_movielens.py
Traceback (most recent call last):
  File "/root/graphstorm/tests/end2end-tests/data_gen/process_movielens.py", line 29, in <module>
    user = pandas.read_csv('/data/ml-100k/u.user', delimiter='|', header=None,
  File "/opt/gs-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/opt/gs-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 620, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/opt/gs-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1620, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/opt/gs-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1880, in _make_engine
    self.handles = get_handle(
  File "/opt/gs-venv/lib/python3.9/site-packages/pandas/io/common.py", line 873, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: '/data/ml-100k/u.user'
Similarly, to be able to run end2end tests, we'd start by trying to run https://github.com/awslabs/graphstorm/blob/main/tests/end2end-tests/create_data.sh
However, that script starts with the following commands:
mkdir -p /data
cd /data
cp -R /storage/ml-100k /data
This 1) assumes root permissions by calling mkdir -p /data, which works on the GraphStorm image but should still be avoided, and 2) assumes that a directory /storage/ml-100k exists.
Together, these assumptions currently make it impossible for someone to run the end2end tests after cloning the repo in their local environment. We should make our scripts agnostic of such paths and files, allow the end2end tests to run on fresh clones of the repo, and fix any examples that use scripts with such assumptions.
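As a rough illustration of what "agnostic of such paths" could look like, here is a minimal sketch of a helper the data-generation scripts might share. Nothing in it exists in GraphStorm today: the environment variable names `GS_TEST_DATA_DIR` and `GS_TEST_SOURCE_DIR` and the functions `resolve_data_dir` and `stage_dataset` are hypothetical, chosen only for this example. The idea is to pick a user-writable data directory instead of /data, and to fail early with an actionable message when the source dataset is missing.

```python
import os
import shutil
import tempfile
from pathlib import Path

# Hypothetical environment variable names; GraphStorm does not define these.
DATA_DIR_ENV = "GS_TEST_DATA_DIR"
SOURCE_DIR_ENV = "GS_TEST_SOURCE_DIR"


def resolve_data_dir(env=None):
    """Return a writable data directory without assuming root permissions."""
    env = os.environ if env is None else env
    configured = env.get(DATA_DIR_ENV)
    if configured:
        base = Path(configured)
    else:
        # Fall back to the system temp directory instead of a hard-coded /data.
        base = Path(tempfile.gettempdir()) / "graphstorm-test-data"
    base.mkdir(parents=True, exist_ok=True)
    return base


def stage_dataset(name, source_root, data_dir):
    """Copy source_root/name into data_dir, failing early if it is missing."""
    source = Path(source_root) / name
    if not source.is_dir():
        raise FileNotFoundError(
            f"Dataset '{name}' not found under {source_root}; "
            f"set {SOURCE_DIR_ENV} to the directory that contains it."
        )
    target = Path(data_dir) / name
    # dirs_exist_ok (Python 3.8+) makes the copy safe to re-run.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

A script could then call `stage_dataset("ml-100k", os.environ.get(SOURCE_DIR_ENV, "/storage"), resolve_data_dir())` and read `u.user` from the returned path, so the current /storage and /data layout remains just one possible configuration rather than a requirement.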
Is this resolved?
@zhjwy9343 fixed some examples, but I believe the tests still have the assumptions mentioned.