openeo-python-client
MultiBackendJobManager.run_jobs() doesn't add new jobs to existing job_tracker
The MultiBackendJobManager.run_jobs() method takes two inputs: df, a DataFrame containing information about all the jobs to run, and output_file, the path to a CSV file that tracks the status of all the jobs.
If the output_file already exists, however, the run_jobs() method will ignore the df input and continue from the existing jobs in the output_file, as seen in the code below:
output_file = Path(output_file)
if output_file.exists() and output_file.is_file():
    # Resume from existing CSV
    _log.info(f"Resuming `run_jobs` from {output_file.absolute()}")
    df = pd.read_csv(output_file)
    status_histogram = df.groupby("status").size().to_dict()
    _log.info(f"Status histogram: {status_histogram}")
As a result, once a MultiBackendJobManager has been run with a given output_file, running it a second time with the same output_file makes it impossible to add new jobs. Is it possible for run_jobs() to create the union of the input df and the existing output_file when output_file already exists? Or is there a good reason not to?
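To make the request concrete, here is a rough sketch of what such a union could look like. It is not the library's implementation: the helper name `merge_new_jobs` and the `key_columns` parameter are hypothetical, and it assumes that some user-provided columns uniquely identify a job and that new jobs should start with the status "not_started".

```python
import pandas as pd


def merge_new_jobs(existing: pd.DataFrame, new: pd.DataFrame, key_columns: list) -> pd.DataFrame:
    """Hypothetical helper: append rows of `new` whose key is not yet in `existing`.

    `key_columns` is an assumption: the columns that uniquely identify a job.
    """
    # Keys of jobs that are already tracked (one tuple per existing row).
    existing_keys = set(existing[key_columns].itertuples(index=False, name=None))

    # Boolean mask marking the rows of `new` that are not tracked yet.
    is_new = pd.Series(
        [key not in existing_keys for key in new[key_columns].itertuples(index=False, name=None)],
        index=new.index,
        dtype=bool,
    )
    added = new.loc[is_new].copy()

    # Assumption: freshly added jobs start out as "not_started".
    added["status"] = "not_started"

    # Tracked jobs keep their status and other bookkeeping columns;
    # columns missing for the new rows are filled with NaN by concat.
    return pd.concat([existing, added], ignore_index=True)


# Example: one job is already tracked, one is new.
existing = pd.DataFrame({"year": [2020, 2021], "status": ["finished", "running"], "id": ["j-1", "j-2"]})
new = pd.DataFrame({"year": [2021, 2022]})
merged = merge_new_jobs(existing, new, key_columns=["year"])
```

As a workaround until something like this is supported, the merged frame could presumably be written back with `merged.to_csv(output_file, index=False)` before calling run_jobs(), provided the resulting CSV has the columns the manager expects.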