
progress bar takes forever to load

Open · mkolopanis opened this issue 4 years ago · 22 comments

The current implementation of the progress bar does not even appear on a terminal screen until after the first task has been completed by rank 0. It would be nice to at least have it initialize on the terminal so that users have visual feedback that the simulation has started.

mkolopanis avatar Feb 17 '21 19:02 mkolopanis

also i know it just prints steps, but an actual bar could be nice too :thinking:

mkolopanis avatar Feb 17 '21 19:02 mkolopanis

I don't know if it's applicable to the way progress is currently being tracked, but I have used progressbar2 in the past, and it's pretty easy to set up and use.

plaplant avatar Feb 17 '21 20:02 plaplant

Originally, we had just progressbar. Then that was ugly when running in batch mode, so I wrote progsteps. Then it became complicated to maintain both, so we dropped progressbar.

The big issue with it right now is that it only updates each time the rank 0 process completes a step, so if the job is unbalanced you can end up with a progress status stuck at ~90% while the other ranks finish. It's fixable, but a little tricky with MPI.

aelanman avatar Feb 17 '21 20:02 aelanman

Doesn't this feel like an issue someone has solved before? We want a pretty progress bar that plays well with a batch submission system and MPI.

mkolopanis avatar Feb 17 '21 21:02 mkolopanis

I'm not sure what a pretty progress bar looks like when you're just printing out progress updates to a log file.... I do think the more pressing concern is how often it updates. progressbar also only updates when triggered to, and right now that only happens when the rank 0 loop iterates.

One option could be to have the progress bar running in a subprocess or thread, though that sometimes doesn't play well with MPI.

aelanman avatar Feb 17 '21 21:02 aelanman

It could get its own process too, which communicates with all other PUs to track the total number of tasks completed.

mkolopanis avatar Feb 17 '21 21:02 mkolopanis

Currently, the progress indicator uses an MPI-enabled counter. Other processes increment the counter using remote memory access. It's a passive process, though, so the progsteps instance needs to periodically check on the counter to know if it needs to print an update.

aelanman avatar Feb 17 '21 21:02 aelanman
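
For reference, here is a minimal sketch of that kind of passive, RMA-backed counter, assuming mpi4py. The class and method names are illustrative stand-ins, not pyuvsim's actual mpi.Counter implementation.

    from mpi4py import MPI
    import numpy as np

    class RMACounterSketch:
        """Shared counter: rank 0 hosts the memory, any rank increments it via RMA."""

        def __init__(self, comm=MPI.COMM_WORLD):
            self.comm = comm
            # Only rank 0 exposes memory; the other ranks attach a zero-size window.
            self.buf = np.zeros(1 if comm.rank == 0 else 0, dtype='i8')
            self.win = MPI.Win.Create(self.buf, comm=comm)

        def increment(self, amount=1):
            """Atomically add `amount` on rank 0 and return the previous value."""
            incr = np.array([amount], dtype='i8')
            prev = np.zeros(1, dtype='i8')
            self.win.Lock(0)
            self.win.Get_accumulate(incr, prev, 0, op=MPI.SUM)
            self.win.Unlock(0)
            return int(prev[0])

        def current_value(self):
            """Passively read the counter without changing it (add zero)."""
            return self.increment(0)

        def free(self):
            self.win.Free()

The increments are one-sided, so the root never has to receive a message, but nothing prints until something on rank 0 actually reads the counter, which is exactly the update-frequency problem described above.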

yeah so this idea would offload everything progsteps does to its own PU. Then it could print at fixed time intervals or fixed percentages. It still doesn't solve the problem where you wait on hanging calculations. Not a perfect idea obviously, but none are.

mkolopanis avatar Feb 17 '21 21:02 mkolopanis

I still think there are two simpler options:

  1. After the task loop on rank 0, start a new loop that sleeps for a few seconds at a time and then makes progsteps update, which will print if the counter has incremented by the right amount.
  2. Run a thread with a similar loop. This can be added directly to the progsteps class (see the sketch just below this comment).

The second is basically your idea, but avoids sacrificing a PU to run the progress indicator.

aelanman avatar Feb 17 '21 21:02 aelanman
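
As an illustration of the second option, here is a rough sketch of a watcher thread that could live inside a progress class, assuming the counter exposes a current_value() call like the one sketched above. The names are hypothetical, not the actual progsteps API.

    import threading

    class ThreadedProgressSketch:
        """Print progress from a background thread that polls a shared counter."""

        def __init__(self, counter, maxval, interval=2.0):
            self.counter = counter
            self.maxval = maxval
            self.interval = interval
            self._done = threading.Event()
            self._thread = threading.Thread(target=self._watch, daemon=True)

        def _watch(self):
            last = -1
            while not self._done.is_set():
                cur = self.counter.current_value()
                if cur != last:
                    print(f"Progress: {cur}/{self.maxval} "
                          f"({100.0 * cur / self.maxval:.1f}%)", flush=True)
                    last = cur
                # Event.wait doubles as an interruptible sleep.
                self._done.wait(self.interval)

        def start(self):
            self._thread.start()

        def finish(self):
            self._done.set()
            self._thread.join()

Only rank 0 would start the thread, and the MPI library would need to be initialized with MPI_THREAD_MULTIPLE for RMA reads from a thread to be safe, which is the "doesn't always play well with MPI" caveat above.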

yeah I like the idea of not sacrificing a PU. Either way is probably reasonable.

mkolopanis avatar Feb 17 '21 21:02 mkolopanis

@mkolopanis

This branch addresses the issue: https://github.com/RadioAstronomySoftwareGroup/pyuvsim/tree/progsteps_upgrade

It does two things:

  1. Adds a "ParallelFlag" class, which behaves like a boolean array of length Npus. Each rank can set its corresponding entry to either true or false, and the root process can watch it. It's non-blocking, working via RMA.
  2. The progress steps are used in the loop as before, but after the main loop the root process will wait until all processes have finished their loops. While waiting, it will run "update" every second (see the sketch after this comment).

The overall effect is that the progress steps keep updating even after the root process has finished its loop.

I haven't tested this with a full-scale simulation yet, but I've got the tests mostly passing (one timed out in python 3.7, which isn't a good sign...).

aelanman avatar Feb 19 '21 02:02 aelanman
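
For anyone following along, the waiting pattern in point 2 could look roughly like the sketch below, assuming mpi4py; the class here is an illustrative stand-in, not the ParallelFlag code in the branch.

    from mpi4py import MPI
    import numpy as np

    class ParallelFlagSketch:
        """Boolean array of length Npus, hosted on rank 0 and set via one-sided Put."""

        def __init__(self, comm=MPI.COMM_WORLD):
            self.comm = comm
            n = comm.size if comm.rank == 0 else 0
            self.buf = np.zeros(n, dtype='b')
            self.win = MPI.Win.Create(self.buf, comm=comm)

        def set_done(self):
            """Each rank flips its own entry; non-blocking from rank 0's point of view."""
            flag = np.ones(1, dtype='b')
            self.win.Lock(0)
            self.win.Put(flag, 0, target=self.comm.rank)
            self.win.Unlock(0)

        def all_done(self):
            """Root-only check: have all ranks reported in?"""
            self.win.Lock(0)
            done = bool(self.buf.all())
            self.win.Unlock(0)
            return done

    # Root-side wait loop: each worker calls flags.set_done() after its task loop,
    # while rank 0 keeps the progress indicator ticking until everyone is finished.
    # if comm.rank == 0:
    #     while not flags.all_done():
    #         pbar.update()   # prints only if the counter has advanced enough
    #         time.sleep(1.0)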

Sorry... pushed the wrong button... this should stay open.

aelanman avatar Feb 19 '21 02:02 aelanman

I know you've been working on this progress bar update too, but I can't help wondering whether tqdm has a solution up its sleeve somewhere for interfacing with batch systems, or whether some combination of tqdm and the counter class could capture all the information from the MPI processes.

mkolopanis avatar Feb 22 '21 17:02 mkolopanis

There might be a solution in tqdm. Really, the Counter class does all the hard work of keeping track of progress across the whole MPI job. progsteps is just a simple progress indicator that wraps the Counter, modeled off of progbar.

It looks like tqdm needs an iterable. It should be pretty simple to turn Counter into an iterable. I'm not sure how tqdm knows when it's in batch mode, though.

aelanman avatar Feb 22 '21 23:02 aelanman
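
In case it's useful, a minimal sketch of the "turn Counter into an iterable" idea might look like the following, assuming the counter interface sketched earlier; counter_progress is a hypothetical helper, not an existing pyuvsim function.

    import time

    def counter_progress(counter, total, poll=1.0):
        """Yield once per completed task so tqdm can consume progress like an iterable."""
        seen = 0
        while seen < total:
            cur = min(counter.current_value(), total)
            for _ in range(cur - seen):
                yield
            seen = cur
            if seen < total:
                time.sleep(poll)

    # Root-only usage: the bar advances as the other ranks increment the counter.
    # for _ in tqdm.tqdm(counter_progress(count, Ntasks_tot), total=Ntasks_tot):
    #     pass

This keeps tqdm's formatting and rate estimates while all of the MPI bookkeeping stays inside the counter.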

Also, tqdm does not necessarily need an iterable; here's a hack I put together in the current version of the code. It does that thing where rank 0 is given up as a compute node (and I have propagated that through local_task_iter to account for it).

    # Rank 0 gets the global task count (MAX across ranks) and owns the tqdm bar.
    Ntasks_tot = comm.reduce(Ntasks_tot, op=mpi.MPI.MAX, root=0)
    if rank == 0 and not quiet:
        print("Tasks: ", Ntasks_tot, flush=True)
        #pbar = simutils.progsteps(maxval=Ntasks_tot)
        pbar = tqdm.tqdm(total=Ntasks_tot, unit="UVTask")

    engine = UVEngine()
    count = mpi.Counter()
    size_complex = np.ones(1, dtype=complex).nbytes
    data_array_shape = (Nbls * Ntimes, 1, Nfreqs, 4)
    uvdata_indices = []
    #comm.Barrier()
    if rank != 0:
        # Rank 0 is not used as a compute rank in this hack; each worker rank
        # writes per-task wall-clock timestamps to its own log file.
        rank_logfile = f"iter_{rank}.out"
        logf = open(rank_logfile, 'w')
        mpi.MPI.Wtime()
        for task in local_task_iter:
            engine.set_task(task)
            vis = engine.make_visibility()

            blti, spw, freq_ind = task.uvdata_index

            uvdata_indices.append(task.uvdata_index)

            flat_ind = np.ravel_multi_index(
                (blti, spw, freq_ind, 0), data_array_shape
            )
            offset = flat_ind * size_complex

            vis_data.Lock(0)
            vis_data.Accumulate(vis, 0, target=offset, op=mpi.MPI.SUM)
            vis_data.Unlock(0)
            dt = mpi.MPI.Wtime()
            logf.write(f"{dt}\n")

            # Atomically bump the shared Counter via RMA. The pbar.update branch below
            # never fires here (this block only runs on non-root ranks); rank 0 drives
            # the bar in the Ibarrier polling loop after the task loop.
            curval = count.current_value()
            cval = count.next() - curval
            if rank == 0 and not quiet:
                pbar.update(cval)
            #print(rank, Time.now())
        logf.close()
    # Non-blocking barrier: rank 0 arrives immediately (it skipped the task loop),
    # while the worker ranks arrive as they finish. Until everyone has arrived,
    # rank 0 keeps polling the shared counter and advancing the tqdm bar.
    request = comm.Ibarrier()
    last_val = count.current_value()
    while not request.Test():
        curval = count.current_value()
        if rank == 0 and not quiet and curval != last_val:
            cval = curval - last_val
            pbar.update(cval)
            last_val = curval

    count.free()
    if rank == 0 and not quiet:
        #pbar.finish()
        pbar.close()

mkolopanis avatar Feb 25 '21 21:02 mkolopanis

@mkolopanis Looks nice! How did it do?

aelanman avatar Feb 25 '21 23:02 aelanman

seemed nice enough, not sure how it would go over on a batch job though.

-> % /home/mkolopanis/src/anaconda/envs/py38_sim/bin/mpiexec -n 15 /home/mkolopanis/src/anaconda/envs/py38_sim/bin/python  /home/mkolopanis/src/anaconda/envs/py38_sim/bin/run_param_pyuvsim.py /data3/MWA/MWA_simpleds/simulation/obsparam_time_test.yaml

Sky Model setup took 0.31279799999996527 min
Nbls: 9
Ntimes: 39
Nfreqs: 256
Nsrcs: 8145
Tasks:  89856.0
100%|██████████| 89856/89856.0 [11:58<00:00, 125.14UVTask/s]Calculations Complete.
Run uvdata uvsim took 12.02200641666666 min
Outfile path:  /data3/MWA/MWA_simpleds/simulation/results/MWA-II_uvbeam_time_test_3.uvh5
Data Writing took 0.0008300999999377723 min

mkolopanis avatar Feb 25 '21 23:02 mkolopanis

One (potentially minor) sticking point about letting rank 0 not do any computations is that it will make things a little weird for tests. Most of the tests run in MPI's "serial mode", where it's just a single process.

aelanman avatar Feb 25 '21 23:02 aelanman

yeah this has crossed my mind. But it's a possible path.

mkolopanis avatar Feb 25 '21 23:02 mkolopanis

Regarding your progsteps_upgrade branch: I think it would make more sense to implement the non-blocking barrier approach that I have than to make that parallel flag array and watch for it to fill, really only because the non-blocking barrier is literally the tool built into MPI that you're trying to re-create.

mkolopanis avatar Feb 26 '21 15:02 mkolopanis

Of course! I wasn't aware of the nonblocking barrier (kind of an oxymoron). That is certainly a cleaner solution.

Were you able to see if the pattern I saw happens for larger jobs on a non-oversubscribed setup? That is, with the root process running through the loop before the others start, but only if the root is rank 0?

aelanman avatar Feb 26 '21 16:02 aelanman

I still need to investigate that. We are pretty heavily subscribed at the moment and I had some sims running this week. I am very curious about this though.

mkolopanis avatar Feb 26 '21 16:02 mkolopanis