datatrove adding an automatic log file `tail` under slurm executor

Open stas00 opened this issue 7 months ago • 1 comment

When using the local executor, the running logs appear right away in the console it was launched from. But when using Slurm, one has to go fishing for the log files.

This can be made easier by automatically printing:

print(f"tail -F {logging_dir}/slurm_logs/{first_slurm_job_id}_0.out")

with first_slurm_job_id coming from the job id reported in the launch log:

2024-07-10 01:38:05.605 | INFO     | datatrove.executor.slurm:launch_job:280 - Slurm job launched successfully with (last) id=109019.

though we want the first job id here, not the last one.

Even fancier would be to run the tail on behalf of the user in the launcher, so that the local and Slurm launching experiences are identical.

But even printing the command to copy-n-paste would already be faster than manually fishing for the log file.
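For illustration, the printed hint could be built by a small helper along these lines. This is only a sketch: `tail_command` and its parameters are hypothetical names, not part of datatrove, and the `slurm_logs/{job_id}_0.out` layout is simply taken from the command quoted earlier in this issue.

```python
import shlex
from pathlib import PurePosixPath


def tail_command(logging_dir: str, first_slurm_job_id: int, task_index: int = 0) -> str:
    """Build a copy-pasteable `tail -F` command for one Slurm task log.

    Hypothetical helper: the path layout is assumed from this issue,
    not taken from datatrove's code.
    """
    log_file = (
        PurePosixPath(logging_dir)
        / "slurm_logs"
        / f"{first_slurm_job_id}_{task_index}.out"
    )
    # Quote the path so the command survives spaces or special characters.
    return f"tail -F {shlex.quote(str(log_file))}"


if __name__ == "__main__":
    # 109019 is the job id from the log line quoted in this issue.
    print(tail_command("/path/to/logs", 109019))
```

`tail -F` (rather than `-f`) keeps retrying if the file does not exist yet, which matters here because the log file only appears once Slurm actually starts the task.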

If this doesn't resonate as a feature, is it possible to make run() return some attributes, e.g. the first Slurm job id? Then the user can easily code this feature themselves.
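To make the user-side option concrete, here is roughly what I would write if the first job id were available. It is a minimal sketch with no datatrove calls in it: `follow` is a hypothetical helper, and the log path in the usage note assumes the same `slurm_logs/{job_id}_0.out` layout as above.

```python
import time
from pathlib import Path
from typing import Callable, Iterator


def follow(
    path: str,
    poll_interval: float = 0.5,
    should_stop: Callable[[], bool] = lambda: False,
) -> Iterator[str]:
    """Yield lines from `path` as they are appended, like `tail -F`.

    Waits for the file to appear first, since Slurm only creates the
    log once the task starts. `should_stop` is polled whenever there
    is nothing new to read, so the caller can end the loop.
    """
    log_file = Path(path)
    while not log_file.exists():
        if should_stop():
            return
        time.sleep(poll_interval)

    with log_file.open("r", errors="replace") as f:
        while True:
            line = f.readline()
            if line:
                yield line
            elif should_stop():
                return
            else:
                time.sleep(poll_interval)
```

Usage would then be something like `for line in follow(f"{logging_dir}/slurm_logs/{first_slurm_job_id}_0.out"): print(line, end="")`, interrupted with Ctrl-C. Unlike `tail -F`, this sketch does not handle the file being rotated or truncated, and it may yield a partial line if the writer has not finished it yet.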

Thank you!

Reading the code, I see launch_slurm_job returns some job id, which is then set into run.job_id, but this would only be correct if tasks<1000, correct? Otherwise it'll return the last job array and not the first one (since your log says ... (last) id=)?

Jul 10 '24 20:07 stas00
