
Make experiment evaluation a separate process

Open prabhuteja12 opened this issue 1 year ago • 0 comments

The current experimentation code in benchmarking runs evaluation in the same thread as the (subsequent) training runs. This is a problem when using DDP: the first evaluation (Line 295) creates several processes (one per GPU), and each of them tries to spawn training processes, which causes DDP port clashes.

**Describe the solution you'd like:** The evaluation/testing should run in a separate process, à la run_training_job, so that this issue does not occur.
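A minimal sketch of the idea, not Renate's actual implementation: each evaluation is launched in a fresh Python process, so any DDP workers it creates exit before the next training job starts. The names `run_evaluation_job`, `run_experiment`, and `EVAL_SNIPPET` are hypothetical, the "evaluation" is a placeholder that returns a dummy metric, and the signature of `run_training_job` is not assumed, so it only appears in a comment.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for an evaluation entry point. A real job would load
# the model checkpoint and run the test loop instead of returning a dummy value.
EVAL_SNIPPET = """
import json, sys
config = json.loads(sys.argv[1])
result = {"task_id": config["task_id"], "accuracy": 0.0}  # placeholder metric
print(json.dumps(result))
"""


def run_evaluation_job(config: dict, timeout: float = 60.0) -> dict:
    """Run evaluation in a fresh Python process and return its JSON result."""
    completed = subprocess.run(
        [sys.executable, "-c", EVAL_SNIPPET, json.dumps(config)],
        capture_output=True,
        text=True,
        check=True,  # raise if the child process fails
        timeout=timeout,
    )
    # The child prints exactly one JSON line; take the last line to be safe.
    return json.loads(completed.stdout.strip().splitlines()[-1])


def run_experiment(num_updates: int) -> list:
    """Alternate (omitted) training with evaluation in separate processes."""
    results = []
    for task_id in range(num_updates):
        # Training for this update would be launched here,
        # e.g. via run_training_job (not shown).
        results.append(run_evaluation_job({"task_id": task_id}))
    return results
```

Because the parent process only waits on the child and parses its output, it never initializes DDP itself, which is the property the proposal relies on.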

**Additional info:** See the related Lightning discussion: https://github.com/Lightning-AI/lightning/issues/2537

prabhuteja12, May 08 '23 11:05