openeo-python-client

MultiBackendJobManager.run_jobs() doesn't add new jobs to existing job_tracker

Status: Open. VincentVerelst opened this issue 10 months ago (2 comments).

The MultiBackendJobManager.run_jobs() method takes two relevant inputs: df, a DataFrame describing all the jobs to run, and output_file, the path to a CSV file used to track the status of every job. If output_file already exists, however, run_jobs() ignores the df argument entirely and resumes from the jobs already listed in output_file, as the following excerpt shows:

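```python
import logging
from pathlib import Path
import pandas as pd

_log = logging.getLogger(__name__)

# Excerpt from the body of MultiBackendJobManager.run_jobs()
output_file = Path(output_file)
if output_file.exists() and output_file.is_file():
    # Resume from existing CSV; the `df` argument is replaced here
    _log.info(f"Resuming `run_jobs` from {output_file.absolute()}")
    df = pd.read_csv(output_file)
    status_histogram = df.groupby("status").size().to_dict()
    _log.info(f"Status histogram: {status_histogram}")
```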
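
A minimal standalone reproduction of this resume branch may make the consequence easier to see. It is a sketch, not the library code: resolve_jobs_df is a hypothetical helper that mimics only the CSV check, and the year and status values are made-up job data.

```python
import tempfile
from pathlib import Path

import pandas as pd


def resolve_jobs_df(df: pd.DataFrame, output_file) -> pd.DataFrame:
    """Hypothetical stand-in for the resume check in run_jobs()."""
    output_file = Path(output_file)
    if output_file.exists() and output_file.is_file():
        # The existing CSV wins; the `df` argument is discarded entirely
        return pd.read_csv(output_file)
    return df


with tempfile.TemporaryDirectory() as tmp:
    tracker = Path(tmp) / "jobs.csv"
    # First run: two jobs, and the tracker CSV is written with their status
    first = pd.DataFrame({"year": [2020, 2021], "status": ["finished", "finished"]})
    first.to_csv(tracker, index=False)
    # Second run: one extra job (2022) is requested with the same tracker file
    second = pd.DataFrame({"year": [2020, 2021, 2022], "status": ["not_started"] * 3})
    resumed = resolve_jobs_df(second, tracker)
    print(resumed["year"].tolist())  # [2020, 2021]: the 2022 job is lost
```

Under these assumptions the resumed DataFrame contains only the 2020 and 2021 jobs; the 2022 row passed in the second call never reaches the tracker.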

As a result, once a MultiBackendJobManager has been run with a given output_file, running it again with the same output_file cannot add new jobs. Would it be possible for run_jobs(), when output_file already exists, to take the union of the input df and the existing output_file, keeping the tracked rows and appending any jobs from df that are not yet tracked? Or is there a good reason not to?
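
One possible way to build such a union, sketched under assumptions: merge_new_jobs is a hypothetical helper (not part of openeo), and it assumes the caller can name key columns (here a made-up year column) that identify a job independently of its status. Existing tracker rows are kept as they are, and only requested jobs whose key is not yet tracked are appended.

```python
import pandas as pd


def merge_new_jobs(existing: pd.DataFrame, requested: pd.DataFrame, key_columns: list[str]) -> pd.DataFrame:
    """Hypothetical helper: keep all tracked jobs, append untracked requested jobs."""
    # Keys already in the tracker, as tuples so multi-column keys work
    tracked = set(existing[key_columns].itertuples(index=False, name=None))
    # Boolean mask over `requested`: True where the job is not tracked yet
    is_new = [key not in tracked for key in requested[key_columns].itertuples(index=False, name=None)]
    new_rows = requested.loc[is_new]
    # Tracked rows keep their status; columns missing on one side become NaN
    return pd.concat([existing, new_rows], ignore_index=True)


existing = pd.DataFrame({"year": [2020, 2021], "status": ["finished", "running"]})
requested = pd.DataFrame({"year": [2021, 2022]})
merged = merge_new_jobs(existing, requested, ["year"])
print(merged["year"].tolist())  # [2020, 2021, 2022]
```

Matching on key columns rather than on whole rows matters here: columns such as status change as jobs progress, so a full-row comparison would treat every already-tracked job as a new one.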

VincentVerelst, Apr 16 '24 13:04