The execution speed of the ktxTexture2_CompressBasisEx method is quite slow
I set the number of threads to 16, generated mipmaps from a 1.88 MB JPG image, converted it to KTX2, and then compressed it. The ktxTexture2_CompressBasisEx call took 59 seconds to complete. How can I speed up the execution of ktxTexture2_CompressBasisEx?
What is the pixel size of the image? What device are you running on? Can it actually support 16 threads? Are you running a Release configuration build?
I've only experienced such run times when compressing a mipmapped cubemap that is 72 MB uncompressed.
The resolution of the image is 2048 * 2048, and my computer's CPU is an 11th Gen Intel(R) Core(TM) i9-11900KF @ 3.5GHz.
My code:
ktxBasisParams params = { 0 };
params.structSize = sizeof(params);
params.compressionLevel = KTX_ETC1S_DEFAULT_COMPRESSION_LEVEL;
params.uastc = KTX_FALSE;
unsigned int numThreads = 16;
if (numThreads == 0) {
    numThreads = 2;
}
params.threadCount = numThreads;
result = ktxTexture2_CompressBasisEx((ktxTexture2*)texture, &params);
Among them, the texture variable‘s resolution ranges from 2048 * 2048 to 4 * 4.
I'm not sure what is going on. I just used ktx create to encode the file on my M2 MacBook Pro. Here is the output from time:
mark:~ $ time ktx create --format R8G8B8_SRGB --encode basis-lz ~/Downloads/issue910.jpg issue910.ktx2
real 0m0.918s
user 0m2.229s
sys 0m0.036s
This command would have used the same API and default compressionLevel as you and the value of std::thread::hardware_concurrency() as the number of threads. I tried --threads 16 as well. It didn't make a difference.
Are you sure you are running a Release configuration? Are you sure your device supports 16 threads? Did you add mip levels to the ktxTexture2 object before encoding?
Among them, the texture variable‘s resolution ranges from 2048 * 2048 to 4 * 4.
I do not understand what you are saying here. What does "them" refer to? I think some sentences are missing.
Did you add mip levels to the ktxTexture2 object before encoding?
Ahh! Sorry. I see you mentioned adding mipmaps. Here is the timing on my MBP with mipmap generation added to the command:
mark:~ $ time ktx create --format R8G8B8_SRGB --encode basis-lz --generate-mipmap ~/Downloads/issue910.jpg issue910.ktx2
real 0m1.466s
user 0m3.381s
sys 0m0.048s
Thank you very much for your patient response! My computer supports 16 threads, and I use the toktx tool provided by ktx to generate ktx2 files with mipmaps and super compression very quickly, taking about three seconds, similar to your experience. Additionally, I pulled the basis_universal repository, which also offers similar functionality to the toktx tool. However, its run time is also very long, similar to my own code, and they both have the same issue: my CPU usage is not 100% (sometimes it starts at 100%, but then the peak is at most 50%). I would like to know, when you run it, does your CPU usage reach 100%? (It should be noted that my CPU supports up to 16 threads, so I believe that when I set the maximum number of threads to 16, the CPU usage should be 100%). Looking forward to your reply!
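Since the thread count keeps coming up, here is a minimal sketch of one way to pick a value for params.threadCount without asking for more threads than the machine reports. The helper name chooseThreadCount is hypothetical and not part of libktx; the fallback of 2 mirrors the code posted in this thread. It is only an illustration and does not by itself explain the slow run.

```cpp
#include <algorithm>
#include <thread>

// Hypothetical helper (not a libktx function): clamp a requested thread
// count to what the hardware reports.
unsigned int chooseThreadCount(unsigned int requested)
{
    // hardware_concurrency() may return 0 when the value cannot be determined.
    unsigned int hw = std::thread::hardware_concurrency();
    if (hw == 0) {
        hw = 2; // same fallback as the code posted in this thread
    }
    if (requested == 0) {
        return hw; // 0 is treated here as "use everything available"
    }
    return std::min(requested, hw);
}
```

With this sketch, params.threadCount = chooseThreadCount(16) would yield 16 on a machine that reports 16 or more hardware threads and fewer otherwise.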
Alternatively, I wonder if you could test it again using the sample code and images I provided, and observe the run time (by writing code to test it instead of using the tools provided by the ktx repository).
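For anyone wanting to measure the call from their own code rather than with the time command, a small wall-clock timer is enough. The sketch below uses only the C++ standard library; the helper name secondsTaken is hypothetical and is not provided by libktx.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: run `fn` once and return the elapsed wall-clock
// time in seconds, measured with a monotonic clock.
template <typename Fn>
double secondsTaken(Fn&& fn)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

The compression call from this thread could then be wrapped in a lambda, for example secondsTaken([&] { result = ktxTexture2_CompressBasisEx((ktxTexture2*)texture, &params); }), and the returned value logged next to the build configuration in use.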
Both ktx create and toktx use libktx and the ktxTexture2_CompressBasisEx function to do the compression, and ktxTexture2_CompressBasisEx uses the same code as the basisu tool in the basis_universal repository, so I think you need to look at what your own code is doing.
Are you sure you compiled optimized code? Code compiled in debug mode runs much slower. This is why I asked if you were using Release configuration.
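One way to answer the Debug vs. Release question from inside the program is to look at the preprocessor macros the compiler sets. The following is a best-effort sketch, not a definitive test: buildKind is a hypothetical helper, NDEBUG only says that assert() is disabled (CMake's default Release flags define it), _DEBUG is specific to MSVC debug runtimes, and __OPTIMIZE__ is specific to GCC and Clang.

```cpp
#include <string>

// Hypothetical helper: guess how this translation unit was compiled.
// The answer is a heuristic based on common compiler macros.
inline std::string buildKind()
{
#if defined(_DEBUG)
    return "debug";        // MSVC debug runtime selected
#elif defined(__OPTIMIZE__) || defined(NDEBUG)
    return "optimized";    // optimizer on (GCC/Clang) or asserts disabled
#else
    return "unoptimized";  // no sign of a Release-style build
#endif
}
```

Printing buildKind() once at startup makes it obvious when a slow run comes from an unoptimized binary. Note that it only describes the code it is compiled into; libktx itself may have been built with a different configuration.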
Alternatively, I wonder if you could test it again using the sample code and images I provided
Where? I only see a single image.
I will provide you with all my code. I used OSG (OpenSceneGraph) for image processing, and the code is as follows:
`ktxTexture* saveImageToKtx(const osg::Image* image, bool compressed)
{
    ktxTexture* texture = nullptr;
    ktxTextureCreateInfo createInfo;
    int componentCount = image->computeNumComponents(image->getPixelFormat());
    int componentSize = glGetTypeSizeFromType(image->getDataType());
    GLuint max_dim = image->s() > image->t() ?
        image->s() : image->t();
    // Number of mip levels: stop once the larger dimension reaches 4
    // (e.g. 2048 -> 10 levels, the smallest being 4 x 4).
    max_dim = floor(log2(max_dim)) - 1;

    createInfo.glInternalformat = image->getInternalTextureFormat();
    createInfo.vkFormat = glGetVkFormatFromInternalFormat(createInfo.glInternalformat);
    createInfo.baseWidth = image->s();
    createInfo.baseHeight = image->t();
    createInfo.baseDepth = image->r();
    createInfo.numDimensions = (image->r() > 1) ? 3 : ((image->t() > 1) ? 2 : 1);
    createInfo.numLevels = max_dim;
    createInfo.numLayers = 1;
    createInfo.numFaces = 1;
    createInfo.isArray = KTX_FALSE;
    createInfo.generateMipmaps = KTX_FALSE;
    if (createInfo.vkFormat == 0)
    {
        OSG_WARN << "[LoaderKTX] No VkFormat for GL internal format: "
                 << std::hex << createInfo.glInternalformat << std::dec << std::endl;
        return nullptr;
    }

    KTX_error_code result = ktxTexture2_Create(
        &createInfo, KTX_TEXTURE_CREATE_ALLOC_STORAGE, (ktxTexture2**)&texture);
    if (result != KTX_SUCCESS)
    {
        OSG_WARN << "[LoaderKTX] Unable to create KTX for saving" << std::endl;
        return nullptr;
    }

    osg::Image* imgCopy = dynamic_cast<osg::Image*>(image->clone(osg::CopyOp::DEEP_COPY_ALL));
    // Generate each mip level by scaling the image copy down
    for (size_t i = 0; i < createInfo.numLevels; ++i)
    {
        int width = image->s() / pow(2, i);
        int height = image->t() / pow(2, i);
        imgCopy->scaleImage(width, height, imgCopy->r());
        const ktx_uint8_t* src = (ktx_uint8_t*)imgCopy->data();
        const unsigned int level = i;
        result = ktxTexture_SetImageFromMemory(
            ktxTexture(texture), level, 0, 0,
            src, imgCopy->getTotalSizeInBytes());
        if (result != KTX_SUCCESS)
        {
            OSG_WARN << "[LoaderKTX] Unable to save image " << i
                     << " to KTX texture: " << ktxErrorString(result) << std::endl;
            ktxTexture_Destroy(texture);
            return nullptr;
        }
    }

    ktx_uint32_t w = texture->baseWidth, h = texture->baseHeight;
    if (compressed) {
        // Only compress when both dimensions are powers of two
        if (((w > 0) && (w & (w - 1)) == 0) && ((h > 0) && (h & (h - 1)) == 0)) {
            ktxBasisParams params = { 0 };
            params.structSize = sizeof(params);
            params.compressionLevel = KTX_ETC1S_DEFAULT_COMPRESSION_LEVEL;
            unsigned int numThreads = std::thread::hardware_concurrency();
            if (numThreads == 0) {
                numThreads = 2;
            }
            params.threadCount = numThreads;
            result = ktxTexture2_CompressBasisEx((ktxTexture2*)texture, &params);
            if (result != KTX_SUCCESS)
            {
                OSG_WARN << "[LoaderKTX] Failed to compress ktxTexture2: "
                         << ktxErrorString(result) << std::endl;
            }
        }
        else {
            OSG_WARN << "[LoaderKTX] Failed to compress ktxTexture2: "
                     << "image's width or height is not a power of 2" << std::endl;
        }
    }
    return texture;
}`
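The power-of-two test in the function above is easier to read and to check when pulled into a small helper. This is only a sketch of the same bit trick; the name isPowerOfTwo is hypothetical and does not come from libktx or OSG.

```cpp
#include <cstdint>

// Hypothetical helper: true when v has exactly one bit set.
// A power of two minus one flips that bit and sets all lower bits,
// so the bitwise AND of the two values is zero.
constexpr bool isPowerOfTwo(std::uint32_t v)
{
    return v > 0 && (v & (v - 1)) == 0;
}
```

The condition guarding the compression step could then be written as isPowerOfTwo(w) && isPowerOfTwo(h).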
Alternatively, I wonder if you could test it again using the sample code and images I provided
Where? I only see a single image.
There is only one image. I used it to generate a KTX2 file with mipmaps, producing each mip level by scaling the image down. The image itself is 2048 * 2048; halving it gives a 1024 * 1024 image, halving that gives a 512 * 512 image, and so on until the image is 4 * 4.
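The mip chain described here can be sketched as a small function that halves the dimensions until either would drop below a minimum size. The name mipChain and the default minimum of 4 are assumptions chosen to match this description; they are not part of libktx or OSG.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: list the (width, height) of every mip level,
// halving each step and stopping before either side drops below minDim.
std::vector<std::pair<std::uint32_t, std::uint32_t>>
mipChain(std::uint32_t width, std::uint32_t height, std::uint32_t minDim = 4)
{
    std::vector<std::pair<std::uint32_t, std::uint32_t>> levels;
    if (minDim == 0) {
        return levels; // a zero minimum would never terminate
    }
    while (width >= minDim && height >= minDim) {
        levels.emplace_back(width, height);
        width /= 2;
        height /= 2;
    }
    return levels;
}
```

For a 2048 * 2048 image this gives 10 levels, from 2048 * 2048 down to 4 * 4, which matches the level count computed with floor(log2(max_dim)) - 1 in the posted code.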
I forgot to add that even without generating mipmaps, simply converting this image into a ktx2 file with super compression takes a long time, so you can still test without generating mipmaps.
You have still not answered my question about debug vs. optimized code. I need the answer.
Sorry, I didn't understand what you meant before. I tried what you suggested and the running time dropped significantly, to about 3 seconds. It is shocking that there is so much difference between the Debug and Release modes. Thank you very much for your help, and good luck!
I'm glad the problem was just the lack of optimization.