grass icon indicating copy to clipboard operation
grass copied to clipboard

lib/gis: solve write out conflicts of G_percent/G_progress when multithreading

Open cyliang368 opened this issue 5 months ago • 13 comments

This PR should solve #5776.

If we want to solve writing conflicts to the same place (stderr), it must be serial.

cyliang368 avatar Jul 26 '25 18:07 cyliang368

Thanks! There is probably no better solution, but I think this still needs some benchmark whether this has any impact on the speed.

petrasovaa avatar Jul 27 '25 08:07 petrasovaa

This should not be merged, as:

  • it has a serious speed penalty
  • it does not solve #5776

marisn avatar Jul 27 '25 08:07 marisn

Thanks! There is probably no better solution, but I think this still needs some benchmark whether this has any impact on the speed.

This should not be merged, as:

  • it has a serious speed penalty

It is a very small part, so I believe its impact could be negligible.

I simply ran v.surf.rst and r.univar a few times manually, and I could not tell whether the difference was made by this change or some uncertainty from OS.

I would like to get some benchmarks on r.mapcalc after #5742 is merged.


@marisn Can you elaborate on why you think it does not solve #5776?

cyliang368 avatar Jul 27 '25 19:07 cyliang368

I ran the benchmark of r.mapcalc on main and this branch. This modification does not dramatically slow down the computation. This modification sometimes yields less speedup, but more often yields more speedup. I attached some examples and simple scripts below, and the IPython notebook in the zip file stores all the comparison figures.

benchmark.zip

@petrasovaa could you test this on your end to double-check?

image image image image

cyliang368 avatar Aug 03 '25 18:08 cyliang368

I agree with @marisn because it can significantly slow down parallel performance. The above benchmark testing is just one of many possible cases and higher speedup is, I believe, just a noise because that's not possible theoretically or at least cannot easily be explained when critical sections are introduced.

HuidaeCho avatar Aug 21 '25 14:08 HuidaeCho

@cyliang368 Could you test it again without calling G_percent() at all and compare the three?

HuidaeCho avatar Aug 21 '25 14:08 HuidaeCho

I have doubts about the caller code (tool level code) being correct even with this change. If you simply show the percentage from each thread, it will not reflect the overall progress. That's possible to show only when the threads sync their work. If they don't, better not to show anything, or resort to some heuristic like showing progress from the first thread only.

wenzeslaus avatar Aug 27 '25 21:08 wenzeslaus

I have doubts about the caller code (tool level code) being correct even with this change. If you simply show the percentage from each thread, it will not reflect the overall progress. That's possible to show only when the threads sync their work. If they don't, better not to show anything, or resort to some heuristic like showing progress from the first thread only.

wenzeslaus avatar Aug 27 '25 21:08 wenzeslaus

You are correct. When multiple threads call G_percent, the output will not be correct as each thread will push % counter to its progress position (thread work position in all work queue) instead of overall progress (aggregate of total work done). Removing G_percent from multithreading code sounds like radical idea as then we lose all progress reporting as more code is ported to use threads. Better approach would be to create G_percent_mt() that is thread safe and can be called instead.

marisn avatar Aug 28 '25 07:08 marisn

Better, but an if condition checking for the first thread would be easier for 8.5, no?

wenzeslaus avatar Aug 28 '25 12:08 wenzeslaus

I have doubts about the caller code (tool level code) being correct even with this change. If you simply show the percentage from each thread, it will not reflect the overall progress. That's possible to show only when the threads sync their work.

@wenzeslaus you are right. However, I will say using 1 thread can make things worse. Let's say we specify nprocs=50 and the loop is running i=0:100. OS and OpenMP do not guarantee which thread will take which i, and which thread will finish first.

In the first batch (i=0:50), the first thread is assigned to run i=24, but it actually is the 37th thread that finishes the task in the loop. It will still print out 24%, which is not the correct progress (37%).

In the second batch (i=50:100), the first thread is assigned to run i=72. Regardless of which thread finishes the loop in the end, we can only see 72% printed out by the first thread, which is incorrect.

Unless @marisn has a better idea, I think 1 thread makes things worse.


This problem is not from G_percent itself. The problem is caused by how we provide a counter to G_percent. Currently, most of the modules use the counter (e.g. i or j in for statements) in G_percent. Again, OS and OpenMP do not guarantee which thread will take which i, and which thread will finish first. If we want to fix it, we should have a new counter in the loop in every parallelized module. This counter will be updated when a thread finishes the loop block, and we make G_percent take this value, which reflects the correct progress.


If we really care about computational performance and believe locks are a bottleneck, I would say it might be better not to print out everything. The fact is that fprintf itself contains a lock, so a lock was already there before this PR.

However, I still believe fprintf and locking won't significantly affect the runtime, as the computational part occupies most of it. Further profiling may be needed to verify this.

cyliang368 avatar Aug 28 '25 15:08 cyliang368

Technically, as being C11, we can make use of _Atomic for the counter, if that could be part of a solution.

nilason avatar Aug 28 '25 18:08 nilason

If you are running it as part of a Python script or in a notebook, you will not see any of that cautiously computed percentage. What about just showing a rotating thingy from the first thread to show the process is still alive?

wenzeslaus avatar Aug 28 '25 22:08 wenzeslaus