sf2d_texture_tile32(): Replace software implementation with GPU transfer
I've found that the software implementation of sf2d_texture_tile32() is extremely slow. It's much faster to delegate this task to hardware by performing a display transfer from the source into a buffer, then copying the buffer back to the source. Something roughly along these lines:
u32 *buffer = linearAlloc(surface.tex->data_size);

// Make sure the CPU's writes to the source pixels are visible to the GPU.
GSPGPU_FlushDataCache(surface.tex->data, surface.tex->data_size);

const u32 flags = (GX_TRANSFER_FLIP_VERT(1) | GX_TRANSFER_OUT_TILED(1) | GX_TRANSFER_RAW_COPY(0) |
    GX_TRANSFER_IN_FORMAT(GX_TRANSFER_FMT_RGBA8) | GX_TRANSFER_OUT_FORMAT(GX_TRANSFER_FMT_RGBA8) |
    GX_TRANSFER_SCALING(GX_TRANSFER_SCALE_NO));

// Let the display transfer engine tile the linear pixels into the temporary buffer.
GX_DisplayTransfer(
    surface.tex->data,
    GX_BUFFER_DIM(surface.tex->pow2_w, surface.tex->pow2_h),
    buffer,
    GX_BUFFER_DIM(surface.tex->pow2_w, surface.tex->pow2_h),
    flags
);
gspWaitForPPF();

// Copy the tiled result back over the original texture data.
memcpy(surface.tex->data, buffer, surface.tex->data_size);
linearFree(buffer);
This speed boost is vital if you want to update the texture every frame (e.g. because you're doing software rendering).
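For instance, a per-frame software-rendering loop could look roughly like this (just a sketch: render_frame_into is a placeholder for whatever fills the pixels, and it assumes an RGBA8 texture kept in linear RAM):

sf2d_texture *tex = sf2d_create_texture(400, 240, TEXFMT_RGBA8, SF2D_PLACE_RAM);

while (aptMainLoop()) {
    // Software-render the frame into the texture's linear pixel buffer
    // (note the row stride is pow2_w, not the visible width).
    render_frame_into(tex->data, tex->pow2_w, tex->pow2_h);

    // Tile it with the GPU instead of the old CPU loop.
    sf2d_texture_tile32(tex);

    sf2d_start_frame(GFX_TOP, GFX_LEFT);
    sf2d_draw_texture(tex, 0, 0);
    sf2d_end_frame();
    sf2d_swapbuffers();
}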
Thanks to @WinterMute for getting me on the right track.
Yep, I've been wanting to do this for a while but never found the time to implement and debug it (I've been very busy). I'll try it today if I have some free time, thanks!
It seems like I can't get it to work. I'm trying something like this:
void sf2d_texture_transfer_from(sf2d_texture *texture, const void *data, int w, int h, GX_TRANSFER_FORMAT fmt)
{
    const u32 flags = (GX_TRANSFER_FLIP_VERT(1) | GX_TRANSFER_OUT_TILED(1) | GX_TRANSFER_RAW_COPY(0) |
        GX_TRANSFER_IN_FORMAT(fmt) | GX_TRANSFER_OUT_FORMAT(texture->pixel_format) |
        GX_TRANSFER_SCALING(GX_TRANSFER_SCALE_NO));

    GX_DisplayTransfer(
        (void *)data,
        GX_BUFFER_DIM(w, h),
        texture->data,
        GX_BUFFER_DIM(texture->pow2_w, texture->pow2_h),
        flags
    );
    gspWaitForPPF();

    // GSPGPU_FlushDataCache(texture->data, texture->data_size); not needed, I guess
}
I think you need to flush the data cache for data, not texture->data, and you need to do it before calling GX_DisplayTransfer(). I'm not sure, but the source dimensions probably also need to be powers of two - though that's more a matter for docs and testing. Give that a try.
You'll need to figure out the size of the input data, then call GSPGPU_FlushDataCache(data, size) before GX_DisplayTransfer(), and GSPGPU_InvalidateDataCache(texture->data, texture->data_size) after gspWaitForPPF().
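Roughly, that would turn the snippet above into something like this (same code, just with the cache handling added; the size calculation assumes a 4-bytes-per-pixel input format, and it's untested):

void sf2d_texture_transfer_from(sf2d_texture *texture, const void *data, int w, int h, GX_TRANSFER_FORMAT fmt)
{
    // Size of the untiled input; assumes a 32-bit format such as RGBA8.
    const u32 data_size = w * h * 4;

    const u32 flags = (GX_TRANSFER_FLIP_VERT(1) | GX_TRANSFER_OUT_TILED(1) | GX_TRANSFER_RAW_COPY(0) |
        GX_TRANSFER_IN_FORMAT(fmt) | GX_TRANSFER_OUT_FORMAT(texture->pixel_format) |
        GX_TRANSFER_SCALING(GX_TRANSFER_SCALE_NO));

    // Flush the *source* so the GPU sees the CPU's latest writes.
    GSPGPU_FlushDataCache(data, data_size);

    GX_DisplayTransfer(
        (void *)data,
        GX_BUFFER_DIM(w, h),
        texture->data,
        GX_BUFFER_DIM(texture->pow2_w, texture->pow2_h),
        flags
    );
    gspWaitForPPF();

    // Invalidate the *destination* so later CPU reads don't hit stale cache lines.
    GSPGPU_InvalidateDataCache(texture->data, texture->data_size);
}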
Any news about this?
I'm kinda busy atm - send a pull request and I'll happily merge it.