sf2d_texture_tile32(): Replace software implementation with GPU transfer
I've found that the software implementation of sf2d_texture_tile32() is extremely slow. It's much faster to delegate this task to hardware by performing a display transfer from the source into a buffer, then copying the buffer back to the source. Something roughly along these lines:
u32 *buffer = linearAlloc(surface.tex->data_size);

// Make sure the CPU's writes to the source pixels are visible to the GPU.
GSPGPU_FlushDataCache(surface.tex->data, surface.tex->data_size);

const u32 flags = (GX_TRANSFER_FLIP_VERT(1) | GX_TRANSFER_OUT_TILED(1) | GX_TRANSFER_RAW_COPY(0) |
    GX_TRANSFER_IN_FORMAT(GX_TRANSFER_FMT_RGBA8) | GX_TRANSFER_OUT_FORMAT(GX_TRANSFER_FMT_RGBA8) |
    GX_TRANSFER_SCALING(GX_TRANSFER_SCALE_NO));

// Let the display transfer engine tile the linear pixels into the temporary buffer.
GX_DisplayTransfer(
    surface.tex->data,
    GX_BUFFER_DIM(surface.tex->pow2_w, surface.tex->pow2_h),
    buffer,
    GX_BUFFER_DIM(surface.tex->pow2_w, surface.tex->pow2_h),
    flags
);
gspWaitForPPF();

// Copy the tiled result back over the original texture data.
memcpy(surface.tex->data, buffer, surface.tex->data_size);
linearFree(buffer);
This speed boost is vital if you want to update the texture every frame (e.g. because you're doing software rendering).
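For instance, a per-frame software-rendering loop could look roughly like this (just a sketch: render_frame_into is a placeholder for whatever fills the pixels, and it assumes an RGBA8 texture kept in linear RAM):

sf2d_texture *tex = sf2d_create_texture(400, 240, TEXFMT_RGBA8, SF2D_PLACE_RAM);

while (aptMainLoop()) {
    // Software-render the frame into the texture's linear pixel buffer
    // (note the row stride is pow2_w, not the visible width).
    render_frame_into(tex->data, tex->pow2_w, tex->pow2_h);

    // Tile it with the GPU instead of the old CPU loop.
    sf2d_texture_tile32(tex);

    sf2d_start_frame(GFX_TOP, GFX_LEFT);
    sf2d_draw_texture(tex, 0, 0);
    sf2d_end_frame();
    sf2d_swapbuffers();
}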
Thanks to @WinterMute for getting me on the right track.
Yep, I've been wanting to do this for a while but never found the time to implement and debug it (I've been very busy). I'll try it today if I have some free time, thanks!
It seems like I can't get it to work. I'm trying something like this:
void sf2d_texture_transfer_from(sf2d_texture *texture, const void *data, int w, int h, GX_TRANSFER_FORMAT fmt)
{
    const u32 flags = (GX_TRANSFER_FLIP_VERT(1) | GX_TRANSFER_OUT_TILED(1) | GX_TRANSFER_RAW_COPY(0) |
        GX_TRANSFER_IN_FORMAT(fmt) | GX_TRANSFER_OUT_FORMAT(texture->pixel_format) |
        GX_TRANSFER_SCALING(GX_TRANSFER_SCALE_NO));

    GX_DisplayTransfer(
        (void *)data,
        GX_BUFFER_DIM(w, h),
        texture->data,
        GX_BUFFER_DIM(texture->pow2_w, texture->pow2_h),
        flags
    );
    gspWaitForPPF();

    // GSPGPU_FlushDataCache(texture->data, texture->data_size); not needed, I guess
}
I think you need to flush the data cache for data, not texture->data, and you need to do it before calling GX_DisplayTransfer(). I'm not sure, but the source dimensions probably also need to be powers of two - though that's more a matter for docs and testing. Give that a try.
You'll need to figure out the size of the input data, then call GSPGPU_FlushDataCache(data, size) before GX_DisplayTransfer(), and GSPGPU_InvalidateDataCache(texture->data, texture->data_size) after gspWaitForPPF().
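Roughly, that would turn the snippet above into something like this (same code, just with the cache handling added; the size calculation assumes a 4-bytes-per-pixel input format, and it's untested):

void sf2d_texture_transfer_from(sf2d_texture *texture, const void *data, int w, int h, GX_TRANSFER_FORMAT fmt)
{
    // Size of the untiled input; assumes a 32-bit format such as RGBA8.
    const u32 data_size = w * h * 4;

    const u32 flags = (GX_TRANSFER_FLIP_VERT(1) | GX_TRANSFER_OUT_TILED(1) | GX_TRANSFER_RAW_COPY(0) |
        GX_TRANSFER_IN_FORMAT(fmt) | GX_TRANSFER_OUT_FORMAT(texture->pixel_format) |
        GX_TRANSFER_SCALING(GX_TRANSFER_SCALE_NO));

    // Flush the *source* so the GPU sees the CPU's latest writes.
    GSPGPU_FlushDataCache(data, data_size);

    GX_DisplayTransfer(
        (void *)data,
        GX_BUFFER_DIM(w, h),
        texture->data,
        GX_BUFFER_DIM(texture->pow2_w, texture->pow2_h),
        flags
    );
    gspWaitForPPF();

    // Invalidate the *destination* so later CPU reads don't hit stale cache lines.
    GSPGPU_InvalidateDataCache(texture->data, texture->data_size);
}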
Any news about this?
I'm kinda busy atm - send a pull request and I'll happily merge it.