
Crash when comparing snapshots of diffuse-or-sharpen settings

Open kofa73 opened this issue 1 year ago • 13 comments

Describe the bug

I was comparing a snapshot with an active history item when darktable crashed. Backtrace attached.

Steps to reproduce

Not strictly reproducible, previous comparisons worked fine.

Expected behavior

darktable should not crash.

Logfile | Screenshot | Screencast

No response

Commit

No response

Where did you obtain darktable from?

self compiled

darktable version

4.7.0+1091~gbf7747b1d3

What OS are you using?

Linux

What is the version of your OS?

Ubuntu 23.10

Describe your system?

No response

Are you using OpenCL GPU in darktable?

Yes

If yes, what is the GPU card and driver?

NVidia 1060 with 6 GB, driver 535.171.04

Please provide additional context if applicable. You can attach files too, but might need to rename to .txt or .zip

darktable_bt_UEFHN2.txt

kofa73 avatar May 05 '24 10:05 kofa73

Buffer overrun in dt_iop_image_scaled_copy (src/common/imagebuf.c). Determining the root cause may be tricky given that it is intermittent.

ralfbrown avatar May 05 '24 15:05 ralfbrown

As we use this for masks, there is only one channel, and the buffer size might not be 16-byte aligned. That could cause an overrun, I think.

EDIT: on second thought that's just wrong, as the dt_alloc_aligned variants take care of the alignment.

jenshannoschwalm avatar May 10 '24 05:05 jenshannoschwalm

I would be interested to know which modules/blending are involved here. Could you share the raw and XMP?

jenshannoschwalm avatar May 10 '24 08:05 jenshannoschwalm

I'll see if I can find it. I'm sorry, I wasn't thorough enough when I reported the issue.

kofa73 avatar May 10 '24 11:05 kofa73

One thing that should help figure out the cause is adding the line

memcpy_parallel_threshold=1000000000

to darktablerc. That will result in the non-parallelized branch being used, which in turn will give a backtrace showing who called the function (assuming you still get crashes). dt will be a little bit slower, but it might not even be noticeable.
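To illustrate why the threshold affects the backtrace, here is a minimal sketch and not darktable's actual code. The function name `scaled_copy_sketch` and its `threshold` parameter are hypothetical stand-ins, and it assumes the threshold is compared against the element count. With OpenMP enabled, the loop in the parallel branch runs in worker threads, so a crash there tends to show the worker's stack rather than the original caller's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: copy n floats from src to dst, multiplying by scale.
   `threshold` stands in for the memcpy_parallel_threshold setting.
   Returns 1 if the parallel branch was taken, 0 for the serial branch. */
int scaled_copy_sketch(float *dst, const float *src, size_t n,
                       float scale, size_t threshold)
{
  assert(dst != NULL && src != NULL);

  if(n > threshold)
  {
    /* Parallel branch: when built with OpenMP, iterations are split
       across worker threads; without OpenMP this is a plain loop. */
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for(size_t i = 0; i < n; i++)
      dst[i] = scale * src[i];
    return 1;
  }

  /* Serial branch: runs entirely in the calling thread, so a crash here
     leaves the caller chain visible in the backtrace. */
  for(size_t i = 0; i < n; i++)
    dst[i] = scale * src[i];
  return 0;
}
```

With a threshold as large as 1000000000, any realistic buffer takes the serial branch in this sketch.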

ralfbrown avatar May 10 '24 12:05 ralfbrown

@kofa73, if you can't reproduce it or can't find the XMP, can you remember whether raster masks were involved?

jenshannoschwalm avatar May 10 '24 15:05 jenshannoschwalm

Yes, almost surely they were in use; my standard diffuse-or-sharpen style now contains a details-based mask on the 1st instance, and a raster mask following it on the 2nd one.

kofa73 avatar May 10 '24 15:05 kofa73

I think one of the crashes came from this one: https://tech.kovacs-telekes.org/files/2024-05-10-dt-issue-16733/

kofa73 avatar May 10 '24 16:05 kofa73

OK, I spotted a possible problem with raster masks. It would be good to check it on something reproducible, or at least against a log from you.

jenshannoschwalm avatar May 10 '24 16:05 jenshannoschwalm

OK, I'll be sure to provide an adequate log if it crashes again. What logs would you need, besides the backtrace? Normally, I run with -d perf -d tiling -d opencl -d common; shall I add something more? I'll add the memcpy_parallel_threshold=1000000000 line to my config. I normally build using --build-type Release, do you need something different? I'm not sure how much editing I'll do today and this weekend; I'll try to find some time.

kofa73 avatar May 10 '24 16:05 kofa73

OK, that didn't take long. console.log darktable_bt_AAMBN2.txt

To reproduce, I took a snapshot of step #26 'exposure', then jumped to the top of the stack, turned on the comparison, zoomed in, enabled 'toggle high quality processing', and started dragging the preview rectangle at random. I'll try to see if there is a more deterministic sequence of actions.

kofa73 avatar May 10 '24 16:05 kofa73

Unfortunately the backtrace is empty, but I found something in the log - the raster mask is being generated at 5122x3662, while the image it's being applied to is 5488x3664. The mask should be going through the distort transformation to match it up to how lens correction is moving the pixels around, but maybe it isn't (or at least not properly).
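To put numbers on that mismatch, here is a small sketch. It assumes single-channel float buffers and a copy that is sized by the target image rather than by the mask; that is a guess at the mechanism, not something confirmed in the code. The helper names `mask_elems` and `overrun_elems` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Number of floats in a single-channel buffer of the given dimensions. */
size_t mask_elems(size_t width, size_t height)
{
  assert(width > 0 && height > 0);
  return width * height;
}

/* How many elements a copy sized by the destination would read past the
   end of the source buffer; 0 if the source is large enough. */
size_t overrun_elems(size_t src_w, size_t src_h, size_t dst_w, size_t dst_h)
{
  const size_t have = mask_elems(src_w, src_h);
  const size_t need = mask_elems(dst_w, dst_h);
  return need > have ? need - have : 0;
}
```

For the sizes in the log, `overrun_elems(5122, 3662, 5488, 3664)` gives 20108032 - 18756764 = 1351268 floats, so under these assumptions the read would run well past the end of the mask buffer.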

ralfbrown avatar May 10 '24 19:05 ralfbrown

The mask should be going through the distort transformation ...

Sure.

  1. There might be something wrong with the general code handling this
  2. A module might not have the required distorting code or might handle that badly
  3. We might test all this in a better way with #16764

jenshannoschwalm avatar May 11 '24 07:05 jenshannoschwalm