Add logging to Integration test runs
This PR adds file logging to parallel integration test runs (TEST_PARALLEL > 1); runs with TEST_PARALLEL = 1 are unaffected.
Today, the integration tests only write output to the console, making it difficult to go back and check test failures without re-running the failing test suite. File logging is useful when debugging integration test failures in a dev environment. It can also be used to audit and identify all the execs/expressions currently exercised by our integration tests.
- After running integration tests using xdist, the logs will be generated under the PROJECT_ROOT/integration_tests/target/run_dir-xxx directory
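A minimal sketch of how a per-run log directory under the target directory might be set up. The function name, directory suffix (a timestamp here), and log file name are all hypothetical; the actual PR may choose differently:

```python
import logging
import os
import time


def setup_run_logging(project_root):
    """Create a per-run log directory and attach a file handler.

    Hypothetical sketch: the real run_dir suffix and file layout may
    differ from what the PR actually produces.
    """
    run_dir = os.path.join(
        project_root, "integration_tests", "target",
        "run_dir-%d" % int(time.time()))
    os.makedirs(run_dir, exist_ok=True)
    log_file = os.path.join(run_dir, "pytest.log")
    handler = logging.FileHandler(log_file)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger("integration")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return log_file
```

After a run, the logs can then be inspected under PROJECT_ROOT/integration_tests/target without re-running the failing suite.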
Please add a better description stating exactly what this adds, how a user would use it, and why it's being added.
What are Working logs?
Thanks for reviewing, I have updated the PR description. PTAL
@tgravescs @gerashegalov PTAL
build
so this one is just for the pytest driver/framework to write to the logs?
Yes, regardless of parallelism. This logger is used for logging the pytest request.node.nodeid.
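For context, a hedged sketch of what logging the node id might look like. The function name is hypothetical; the real PR presumably does this inside a conftest fixture or hook, but `request.node.nodeid` is the standard pytest identifier of the form `test_file.py::test_name[param]`:

```python
import logging

logger = logging.getLogger("integration")


def log_test_start(request):
    """Record the pytest node id of the test about to run.

    Hypothetical sketch; in a real conftest this would be called from a
    fixture or hook that receives the pytest `request` object.
    """
    # request.node.nodeid looks like "test_file.py::test_name[param]"
    logger.info("Running test: %s", request.node.nodeid)
```

Because this runs in the pytest driver process, it works regardless of the TEST_PARALLEL setting.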
Above we modify the spark driver_opts so when the spark session is created it uses the log4j.properties file specified there, correct?
Only when TEST_PARALLEL > 1; otherwise we use the log4j properties defined in run_pyspark_from_build.sh.
I will add more comments on this line to make this clear.
@tgravescs I have addressed your concern about only enforcing this in local mode. PTAL
mostly looks good, I had one question I'm waiting on
build
@gerashegalov PTAL
build
build