elks icon indicating copy to clipboard operation
elks copied to clipboard

Paint development

Open Vutshi opened this issue 1 year ago • 37 comments

This is continuation of Paint discussion started in #2269

Notably, Windows 3.0 in MartyPC performs better in this regard — the cursor is noticeably more responsive, even at 60Hz. A more faithful mouse implementation in the emulator should help, and it will be done eventually. At the same time, this might hint at some suboptimal behavior in ELKS itself, which could be worth exploring and optimizing.

The fun part of this little adventure in computer archaeology is being able to compare MS Paintbrush with ELKS Paint side by side — and I have to say, the former looks pretty pathetic in comparison. :)

Line drawing

https://github.com/user-attachments/assets/cf1a130c-f7e4-4d9f-949d-88ef715edfb3

Circle drawing

https://github.com/user-attachments/assets/e4a86836-1d48-4e25-9ba2-94dcd696674b

Originally posted by @Vutshi in #2269

Vutshi avatar Apr 21 '25 04:04 Vutshi

What does @dbalsom have to say about that?

Yesterday he fixed VGA emulation and at some point he will add 40 Hz mouse.

Vutshi avatar Apr 21 '25 04:04 Vutshi

It is possible that Windows 3.0 has a better internal mouse filter/accelerator. Have you tried Paint with USE_MOUSE_ACCEL both on and off?

I am not sure mouse acceleration has anything to do with it. USE_MOUSE_ACCEL was disabled on both the real hardware and in MartyPC.

Now Paint also runs on Book8088 VGA edition:

Image

P. S. The ‘F’ key test performed before connecting the mouse is interesting — on computer N1, the flood fill starts from the top of the screen!

Image

Vutshi avatar Apr 21 '25 05:04 Vutshi

I have to say, the former looks pretty pathetic in comparison.

Wow, our Paint is definitely superior in terms of line drawing, also using the circular disk instead of square pixel cursor!

It seems Paint is quicker and MSPaint on 8088 also, true?

I do like the thick circle draws and the ability to draw ellipses on MSPaint though.

I have to say, the former looks pretty pathetic in comparison.

Nice! I think the reason for the fill starting in differing spots might have something to do with the initial cursor location, as that's used for the fill start pixel.

ghaerr avatar Apr 21 '25 16:04 ghaerr

It seems Paint is quicker and MSPaint on 8088 also, true?

Yes, our drawing is much faster.

I do like the thick circle draws and the ability to draw ellipses on MSPaint though.

Ellipses are easy to implement, I wonder how to control it. Draw an ellipse by default and constrain to a circle when a control key is pressed?

Nice! I think the reason for the fill starting in differing spots might have something to do with the initial cursor location, as that's used for the fill start pixel.

Initial cursor location should be the same if mouse is untouched/unconnected. But for some reason I saw that initial position can jump between different setups EGA VGA QEMU MartyPC. Maybe it has something to do with the initial position being defined in events.c using hardcoded screen size parameters.

Vutshi avatar Apr 21 '25 16:04 Vutshi

@ghaerr, how can I fully remove remnants of OWC and C86? I’ve already reset the environment and run make clean, but I still can’t get it to fit on a 1.44MB floppy disk.

Vutshi avatar Apr 21 '25 17:04 Vutshi

Remove them from roots_template/bin.

ghaerr avatar Apr 21 '25 17:04 ghaerr

Remove them from roots_template/bin.

Thanks, that worked, especially after I also removed roots_template/usr :)

Vutshi avatar Apr 21 '25 20:04 Vutshi

Circles competition

Image

In the left corner of the ring – Bresenham's circles, and in the right corner, Arithmetical circles.

Arithmetical circles are defined as (r-1/2)^2 <= x^2 + y^2 <= (r+1/2)^2. Bresenham's circle is the thinnest possible circle approximation.

Vutshi avatar Apr 21 '25 20:04 Vutshi

Ellipses are easy to implement, I wonder how to control it. Draw an ellipse by default and constrain to a circle when a control key is pressed?

I would suggest the idea of using the top left corner (just like rectangle) as the anchor point for drawing a circle, if you want to also include the ability to draw ellipses. When the mouse forms a square from the top left, a circle, otherwise when its rectangular, an ellipse is drawn. Using a control key has the problem that it's not user-intuitive and also there's currently no way to get the keyboard shift information from the kernel, so a new API or "super-raw" keyboard termios mode would have to be implemented.

Thanks, that worked, especially after I also removed roots_template/usr :)

I've made a note to remove cpaint, opaint, and /usr during "make clean" from their respective directories.

Initial cursor location should be the same if mouse is untouched/unconnected.

event_open() sets them to SCREENWIDTH/2, SCREENHEIGHT/2, so that should be called after graphics_open(), I'll bet that's the problem.

In the left corner of the ring – Bresenham's circles, and in the right corner, Arithmetical circles.

Interesting - I like the left circles better from normal viewing distance, but the right circles look pretty good up close. I'd say I prefer the Bresenham circles - at least from the screenshot displayed on my laptop.

ghaerr avatar Apr 22 '25 03:04 ghaerr

More colors for the glory of ELKS Paint!

Image

Vutshi avatar Apr 22 '25 13:04 Vutshi

@ghaerr, I found some bug in flood fill, but I have no idea what is going on in this function. If you apply this diff:

diff --git a/elkscmd/gui/app.c b/elkscmd/gui/app.c
index 8cce8022..b02b5440 100644
--- a/elkscmd/gui/app.c
+++ b/elkscmd/gui/app.c
@@ -267,6 +267,20 @@ int main(int argc, char* argv[])
 
     // Highlight Brush Button
     R_HighlightActiveButton();
+    int y00 = 15;
+
+    for (int i = 0; i<16; i++) {
+        for (int j = i; j<16; j++) {
+            int x = y00 + I*30;
+            int y = y00 + j*30;
+            R_DrawCircle(x, y, 12, WHITE);
+            currentMainColor = i;
+            currentAltColor = j;
+        }
+    }
+    int x = 400;
+    int y = 100;
+    R_LineFloodFill(x, y, currentMainColor, readpixel(x, y));
 
     initcursor();

and run paint it will be stuck at this point:

Image

You can still keep drawing after attaching mouse but on exit it is stuck forever.

One can also initiate the flood fill manually, then it is stuck such that you cannot exit at all.

Vutshi avatar Apr 22 '25 14:04 Vutshi

and run paint it will be stuck at this point:

Try increasing FLOOD_FILL_STACK to a larger number. The flood fill function keeps a stack of pixels to check while it looks in each direction from the current pixel. It seems that the FF_Stack_Push function checks for stack overflow by not adding more items to the stack, but perhaps the algorithm still crashes when too many pixel locations are traversed with a 100 deep stack.

ghaerr avatar Apr 22 '25 15:04 ghaerr

Try increasing FLOOD_FILL_STACK to a larger number.

Increasing to 200 helped. What I couldn't achieve is some kind of graceful exit from the function. Paint might work ok after stopping flood fill prematurely but after exit from the program I see strange behaviour ranging from "stack underflow" to weird symbols or no text mode at all.

Vutshi avatar Apr 22 '25 21:04 Vutshi

Increasing to 200 helped.

I think that indicates that the internal stack that R_LineFloodFill keeps is part of, and probably all of, the problem.

after exit from the program I see strange behaviour ranging from "stack underflow" to weird symbols or no text mode at all.

Memory corruption for sure. Try increasing the stack to 400, or way more than that. I also think there's an off-by-one error in the stack push/pop function. Code the below, which fixes off-by-one and also will indicate when stack overflow is present:

static void FF_StackPush(transform2d_t stack[], int x, int y, int* top)
{
    if (*top < FLOOD_FILL_STACK - 1) { // <-- add "- 1" to fix off-by-one
        (*top)++;
        stack[*top].x = x;
        stack[*top].y = y;
    } else __dprintf("stack overflow at %d\n", *top); // <-- display stack overflow to QEMU console
}

Given that you have many circles, all with different x,y points along their circumference, I'm thinking that filling your "More colors for the glory of ELKS Paint!" is actually running the flood fill routine in worst case!

ghaerr avatar Apr 23 '25 02:04 ghaerr

I've spent a fair bit of time exploring various flood fill algorithms. The effort paid off: I achieved a 3× speed improvement.

Benchmark of the original R_LineFloodFill in MartyPC:

time paint
Real    3m6.030s

https://github.com/user-attachments/assets/78a50976-1cce-4dbb-9d46-91a55420669e

To improve performance, I experimented with SeedFill.c from Graphics Gems, which brought the time down to 1m22.03s. Another algorithm from the same source, using a smaller stack, gave 1m25.32s. Eventually, I settled on the Front_Fill, which employs a tricky stack structure and allows for filling entire segments with drawhline.

The new R_FrontFill:

time paint
Real    1m5.650s

it also looks pretty good:

https://github.com/user-attachments/assets/5c02e7a9-f8d9-43d8-967b-3f1721ade2c0

That said, MS Paint remains a flood fill beast, still about 2× faster than my current implementation:

https://github.com/user-attachments/assets/b2ac3d8f-7227-4ffb-8414-eb77e44eece0

At this point, I suspect that readpixel might be the current bottleneck.

Vutshi avatar May 02 '25 13:05 Vutshi

Very nice @Vutshi, you're showing a mastery of some cool graphics algorithms! That's a great improvement 3x the original. Is the FrontFill PDF paper available anywhere? The link you sent wants an account and purchase to read it.

I suspect that readpixel might be the current bottleneck.

Well, readpixel has to loop through each of four planes to get the pixel color, of which I'm not sure we can get around. But looking at vga-ia16.S::readpixel, it seems that lots of time is spent calculating the video buffer offset in BX from the passed X an Y parameters:

        mov     arg1+2(%bp), %ax// AX := y
        mov     arg1(%bp), %bx  // BX := x
        shl     $1, %ax         // AX := [y * 80] (= y*64 + y*16)
        shl     $1, %ax
        shl     $1, %ax
        shl     $1, %ax
        mov     %ax, %dx
        shl     $1, %dx
        shl     $1, %dx
        add     %dx, %ax

        mov     %bl, %cl        // save low order column bits
        shr     $1, %bx         // BX := [x / 8]
        shr     $1, %bx
        shr     $1, %bx
        add     %ax, %bx        // BX := [y * BYTESPERLN] + [x / 8]

That's a lot of code duplicated every readpixel, just to get the video offset in BX. (BTW, drawpixel does the same thing). If we could track the video buffer offset (address) separately in render.c, then both readpixel and drawpixel would get sped up by quite a bit. What we'd want would be something like readpixel_offset(unsigned offset) instead.

This would require some recoding of the flood fill algorithm to keep an offset instead of tracking X, Y separately. What do you think?

Another idea would be to use rewrite readpixel as a macro which would try to get rid of the CALL/PUSH BP/POP BP/RET etc, which would likely speed things up, which would basically insert the whole routine into every call to read/drawpixel. We could call that readpixel_fast etc so as to not have to inline everywhere.

I can write the macros or special functions depending on what you think would be best.

ghaerr avatar May 02 '25 16:05 ghaerr

Is the FrontFill PDF paper available anywhere?

I guess I can put it here.

burtsev1993 An efficient flood-filling algorithm.pdf

Vutshi avatar May 02 '25 17:05 Vutshi

What do you think?

I’ll open a PR later today—once that’s in, we can better assess whether readpixel is the actual performance bottleneck. I don’t have any profiling data; just a strong hint: switching from drawpixel to drawhline alone cut the runtime by around 25 seconds.

Vutshi avatar May 02 '25 17:05 Vutshi

Well, readpixel has to loop through each of four planes to get the pixel color, of which I'm not sure we can get around.

ChatGPT suggests to either keep a “shadow” buffer in system RAM (the fastest read) or to use the VGA hardware “latch” trick to get all four bit‑planes in one byte read. Probably MS Paint does the former. Do you know how to do the latter?

Vutshi avatar May 02 '25 17:05 Vutshi

ChatGPT suggests to either keep a “shadow” buffer in system RAM (the fastest read) or to use the VGA hardware “latch” trick to get all four bit‑planes in one byte read. Probably MS Paint does the former. Do you know how to do the latter?

you can't read the latches from the CPU, they're just for copying stuff around in VRAM.
a shadow buffer is sensible, but a simple optimization to cache 8 pixels at a time when you read a byte from each plane would mean only 1/8 calls to readpixel would be slow assuming they read lineaerly.

dbalsom avatar May 02 '25 17:05 dbalsom

a simple optimization to cache 8 pixels at a time when you read a byte from each plane would mean only 1/8 calls to readpixel would be slow assuming they read lineaerly.

This sounds good. I was thinking of reading a whole row of pixels, but 8 at a time is probably better.

Also I found that apparently there is a hardware feature for color comparison in VGA:

Read Mode 1 is used to perform comparisons against a reference color, specified by the Color Compare field. If a bit is set in the Color Don't Care field then the corresponding color plane is considered for by the comparison, otherwise it is ignored. Each bit in the returned result represents one comparison between the reference color from the Color Compare field, with the bit being set if the comparison is true. This mode is mainly used by flood fill algorithms that fill an area of a specific color, as it requires 1/4 the number of reads to determine the area that needs to be filled in addition to the additional work done by the comparison. Also an efficient "search and replace" operation that replaces one color with another can be performed when this mode is combined with Write Mode 3.

Maybe this is what we need?

Vutshi avatar May 02 '25 19:05 Vutshi

Maybe this is what we need?

considering they mention flood fill it seems promising, but i have no clue how it is actually used.

i implemented things like the color don't care register without really having a clue as to what they were for :D

dbalsom avatar May 02 '25 20:05 dbalsom

@ghaerr

something strange is going on with colors in MS Paint as can be seen in the video above and in the screenshot:

Image

It doesn't distinguish brown and dark red. Maybe our "optimized" VGA palette is confusing for 40 years old software?)

UPDATE: What I am doing here is opening export.bmp generated by ELKS Paint in MS Paint

EXPORT.BMP_from_ELKS_Paint.zip

Vutshi avatar May 02 '25 21:05 Vutshi

UPDATE: What I am doing here is opening export.bmp generated by ELKS Paint in MS Paint

Is this in MartyPC again?

i'd check the colors in 86box before assuming it isn't my fault. Although i'm not sure what would cause that.

dbalsom avatar May 02 '25 22:05 dbalsom

UPDATE: What I am doing here is opening export.bmp generated by ELKS Paint in MS Paint

Is this in MartyPC again?

i'd check the colors in 86box before assuming it isn't my fault. Although i'm not sure what would cause that.

Yes it is MartyPC. I haven’t check 86box, but ELKS Paint in the same MartyPC shows colors as expected

Vutshi avatar May 02 '25 22:05 Vutshi

Yes it is MartyPC. I haven’t check 86box, but ELKS Paint in the same MartyPC shows colors as expected

Can you get a shot with the palette open in the Video Card Viewer from both?

dbalsom avatar May 02 '25 22:05 dbalsom

Can you get a shot with the palette open in the Video Card Viewer from both?

Tomorrow

Vutshi avatar May 02 '25 22:05 Vutshi

@dbalsom

Here are the shots MS Paint

Image

ELKS Paint

Image

Vutshi avatar May 03 '25 08:05 Vutshi

It doesn't distinguish brown and dark red. Maybe our "optimized" VGA palette is confusing for 40 years old software?)

@Vutshi: remember that neither ELKS nor Paint actually rewrites any of the VGA palette registers - we're currently just assuming the indices of the "standard" CGA palette, which according to some links I sent previously, is the palette which the hardware and most emulators use.

We could write RGB colors ourselves to the palette if you think that's needed, but that might complicate the emulator environment depending on how the VGA palette hardware is emulated, and/or the initial defaults when not running Paint.

ghaerr avatar May 03 '25 08:05 ghaerr

Yeah, windows has just remapped the palette.

Before everything became 32-bit RGBA and people stopped having to deal with it, you had to deal with device dependent bitmaps. If you're targeting windows you'd need to try to match the windows palette.

dbalsom avatar May 03 '25 16:05 dbalsom