KTX-Software

The execution speed of the ktxTexture2_CompressBasisEx method is quite slow

Open newpeople123 opened this issue 1 year ago • 3 comments

The execution speed of the ktxTexture2_CompressBasisEx method is quite slow. I set the number of threads to 16, generated mipmaps from a 1.88 MB JPEG image, converted it to ktx2, and then compressed it. The ktxTexture2_CompressBasisEx call took 59 seconds to complete. How can I speed up the execution of ktxTexture2_CompressBasisEx?

newpeople123 avatar May 16 '24 08:05 newpeople123

What is the pixel size of the image? What device are you running on? Can it actually support 16 threads? Are you running a Release configuration build?

I've only experienced such run times when compressing a mipmapped cubemap which uncompressed is 72Mbytes.

MarkCallow avatar May 18 '24 02:05 MarkCallow

The resolution of the image is 2048 * 2048, and my computer's CPU is an 11th Gen Intel(R) Core(TM) i9-11900KF @ 3.5GHz. My code:

ktxBasisParams params = { 0 };
params.structSize = sizeof(params);
params.compressionLevel = KTX_ETC1S_DEFAULT_COMPRESSION_LEVEL;
params.uastc = KTX_FALSE;
unsigned int numThreads = 16;
if (numThreads == 0) {
	numThreads = 2;
}
params.threadCount = numThreads;
result = ktxTexture2_CompressBasisEx((ktxTexture2*)texture, &params);

Among them, the texture variable's resolution ranges from 2048 * 2048 to 4 * 4.

newpeople123 avatar May 20 '24 01:05 newpeople123

[Image attachment: 10012]

newpeople123 avatar May 28 '24 08:05 newpeople123

I'm not sure what is going on. I just used ktx create to encode the file on my M2 MacBook Pro. Here is the output from time:

mark:~ $ time ktx create --format R8G8B8_SRGB --encode basis-lz ~/Downloads/issue910.jpg issue910.ktx2

real	0m0.918s
user	0m2.229s
sys	0m0.036s

This command would have used the same API and default compressionLevel as you and the value of std::thread::hardware_concurrency() as the number of threads. I tried --threads 16 as well. It didn't make a difference.
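For illustration, thread-count selection along these lines can be sketched as below. This is only a sketch under assumptions: `chooseThreadCount` is a hypothetical helper, not part of libktx, and clamping the request to the reported hardware value is one possible policy rather than what ktx create does.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical helper (not part of libktx): pick a worker count for an encoder.
// requested == 0 means "use whatever the hardware reports".
unsigned int chooseThreadCount(unsigned int requested) {
    unsigned int hw = std::thread::hardware_concurrency();
    if (hw == 0) {
        hw = 2;  // hardware_concurrency() may return 0 when the value is unknown
    }
    // Assumed policy: never ask for more threads than the hardware reports.
    const unsigned int result = (requested == 0) ? hw : std::min(requested, hw);
    assert(result >= 1);
    return result;
}
```

The returned value could then be assigned to `params.threadCount` before calling ktxTexture2_CompressBasisEx.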

Are you sure you are running a Release configuration? Are you sure your device supports 16 threads? Did you add mip levels to the ktxTexture2 object before encoding?

Among them, the texture variable's resolution ranges from 2048 * 2048 to 4 * 4.

I do not understand what you are saying here. What does "them" refer to? I think some sentences are missing.

MarkCallow avatar May 28 '24 10:05 MarkCallow

Did you add mip levels to the ktxTexture2 object before encoding?

Ahh! Sorry. I see you mentioned adding mipmaps. Here is the timing on my MBP with mipmap generation added to the command:

mark:~ $ time ktx create --format R8G8B8_SRGB --encode basis-lz --generate-mipmap ~/Downloads/issue910.jpg issue910.ktx2

real	0m1.466s
user	0m3.381s
sys	0m0.048s

MarkCallow avatar May 28 '24 10:05 MarkCallow

Thank you very much for your patient response! My computer supports 16 threads, and I use the toktx tool provided by ktx to generate ktx2 files with mipmaps and super compression very quickly, taking about three seconds, similar to your experience. Additionally, I pulled the basis_universal repository, which also offers similar functionality to the toktx tool. However, its run time is also very long, similar to my own code, and they both have the same issue: my CPU usage is not 100% (sometimes it starts at 100%, but then the peak is at most 50%). I would like to know, when you run it, does your CPU usage reach 100%? (It should be noted that my CPU supports up to 16 threads, so I believe that when I set the maximum number of threads to 16, the CPU usage should be 100%). Looking forward to your reply!
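One rough way to judge CPU utilization from timings like those posted in this thread is to divide CPU time (user + sys) by wall-clock time (real). The helper below is a hypothetical sketch, not part of any KTX tool. For the mipmapped run reported earlier (real 1.466 s, user 3.381 s, sys 0.048 s) it gives roughly 2.3 busy cores on average, which suggests that even a fast run does not necessarily keep every core at 100%.

```cpp
#include <cassert>

// Hypothetical helper: average number of busy cores implied by `time` output.
// All arguments are in seconds.
double effectiveParallelism(double realSec, double userSec, double sysSec) {
    assert(realSec > 0.0);
    return (userSec + sysSec) / realSec;
}
```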

newpeople123 avatar May 28 '24 13:05 newpeople123

Alternatively, I wonder if you could test it again using the sample code and images I provided, and observe the run time (by writing code to test it instead of using the tools provided by the ktx repository).
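A minimal sketch of timing a call from code, assuming nothing beyond the C++ standard library. `timeSeconds` is a hypothetical helper name, and the lambda passed to it would wrap whatever call is being measured.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: run a callable once and return the elapsed
// wall-clock time in seconds, measured with a monotonic clock.
template <typename Fn>
double timeSeconds(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

For example, `double s = timeSeconds([&] { result = ktxTexture2_CompressBasisEx((ktxTexture2*)texture, &params); });` would measure only the compression step, separately from image loading and mipmap scaling.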

newpeople123 avatar May 28 '24 13:05 newpeople123

Both ktx create and toktx use libktx and the ktxTexture2_CompressBasisEx function to do the compression, and ktxTexture2_CompressBasisEx uses the same code as the basisu tool in the basis_universal repository, so I think you need to look at what your own code is doing.

Are you sure you compiled optimized code? Code compiled in debug mode runs much slower. This is why I asked if you were using Release configuration.
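A quick way to check is to have the program report how it was compiled. The sketch below relies on common compiler macros: `_DEBUG` is defined by MSVC when a debug runtime is selected, `__OPTIMIZE__` by GCC and Clang when optimization is enabled, and `NDEBUG` conventionally by Release configurations. `buildFlavor` is a hypothetical helper, and it only describes the translation unit it is compiled in. libktx and basis_universal must themselves be built with optimization for the encoder to run fast.

```cpp
#include <string>

// Hypothetical helper: describe how this translation unit was compiled,
// based on commonly available compiler macros.
std::string buildFlavor() {
#if defined(_MSC_VER) && defined(_DEBUG)
    return "debug (MSVC _DEBUG is defined)";
#elif defined(__OPTIMIZE__)
    return "optimized (__OPTIMIZE__ is defined)";
#elif defined(NDEBUG)
    return "NDEBUG is defined; optimization level unknown";
#else
    return "probably an unoptimized debug build";
#endif
}
```

Printing this string once at startup makes it harder to benchmark a Debug build by accident.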

MarkCallow avatar May 28 '24 13:05 MarkCallow

Alternatively, I wonder if you could test it again using the sample code and images I provided

Where? I only see a single image.

MarkCallow avatar May 28 '24 13:05 MarkCallow

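Both ktx create and toktx use libktx and the ktxTexture2_CompressBasisEx function to do the compression and ktxTexture2_CompressBasisEx uses the same code as the basisu tool in the basis_universal repository. So I think you need to look at what your own code is doing.

Are you sure you compiled optimized code? Code compiled in debug mode runs much slower. This is why I asked if you were using Release configuration.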

I will provide you with all my code. I used OSG (OpenSceneGraph) for image processing, and the code is as follows:

`ktxTexture* saveImageToKtx(const osg::Image* image, bool compressed)
{
ktxTexture* texture = nullptr;

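// Zero-initialize so that fields not set below hold defined values
ktxTextureCreateInfo createInfo = {};

int componentCount = image->computeNumComponents(image->getPixelFormat());
int componentSize = glGetTypeSizeFromType(image->getDataType());
GLuint max_dim = image->s() > image->t() ?
	image->s() : image->t();
// Number of mip levels: stop once the larger dimension reaches 4 pixels,
// e.g. 2048 gives floor(log2(2048)) - 1 = 10 levels (2048 down to 4)
max_dim = floor(log2(max_dim)) - 1;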
createInfo.glInternalformat = image->getInternalTextureFormat();

createInfo.vkFormat = glGetVkFormatFromInternalFormat(createInfo.glInternalformat);
createInfo.baseWidth = image->s();
createInfo.baseHeight = image->t();
createInfo.baseDepth = image->r();
createInfo.numDimensions = (image->r() > 1) ? 3 : ((image->t() > 1) ? 2 : 1);
createInfo.numLevels = max_dim;
createInfo.numLayers = 1;
createInfo.numFaces = 1;
createInfo.isArray = KTX_FALSE;
createInfo.generateMipmaps = KTX_FALSE;
if (createInfo.vkFormat == 0)
{
	OSG_WARN << "[LoaderKTX] No VkFormat for GL internal format: "
		<< std::hex << createInfo.glInternalformat << std::dec << std::endl;
	return nullptr;
}

KTX_error_code result = ktxTexture2_Create(
	&createInfo, KTX_TEXTURE_CREATE_ALLOC_STORAGE, (ktxTexture2**)&texture);
if (result != KTX_SUCCESS)
{
	OSG_WARN << "[LoaderKTX] Unable to create KTX for saving" << std::endl;
	return nullptr;
}

// Hold the working copy in a ref_ptr so it is released when the function returns
osg::ref_ptr<osg::Image> imgCopy = dynamic_cast<osg::Image*>(image->clone(osg::CopyOp::DEEP_COPY_ALL));
// Generate each mip level by repeatedly scaling the copy down
for (size_t i = 0; i < createInfo.numLevels; ++i)
{
	// Level i is the base size halved i times
	int width = image->s() >> i;
	int height = image->t() >> i;
	imgCopy->scaleImage(width, height, imgCopy->r());
	const ktx_uint8_t* src = (ktx_uint8_t*)imgCopy->data();
	const unsigned int level = i;
	result = ktxTexture_SetImageFromMemory(
		ktxTexture(texture), level, 0, 0,
		src, imgCopy->getTotalSizeInBytes());
	if (result != KTX_SUCCESS)
	{
		OSG_WARN << "[LoaderKTX] Unable to save image " << i
			<< " to KTX texture: " << ktxErrorString(result) << std::endl;
		ktxTexture_Destroy(texture); return nullptr;
	}
}

ktx_uint32_t w = texture->baseWidth, h = texture->baseHeight;
if (compressed) {
	if (((w > 0) && (w & (w - 1)) == 0) && ((h > 0) && (h & (h - 1)) == 0)) {
		ktxBasisParams params = { 0 };
		params.structSize = sizeof(params);
		params.compressionLevel = KTX_ETC1S_DEFAULT_COMPRESSION_LEVEL;
		unsigned int numThreads = std::thread::hardware_concurrency();
		if (numThreads == 0) {
			numThreads = 2;
		}
		params.threadCount = numThreads;
		result = ktxTexture2_CompressBasisEx((ktxTexture2*)texture, &params);
		if (result != KTX_SUCCESS)
		{
			OSG_WARN << "[LoaderKTX] Failed to compress ktxTexture2: "
				<< ktxErrorString(result) << std::endl;
		}
	}
	else {
		OSG_WARN << "[LoaderKTX] Failed to compress ktxTexture2: "
			<< "image's width or height is not to the nth power of 2" << std::endl;
	}
}
return texture;

}`

newpeople123 avatar May 28 '24 13:05 newpeople123

Alternatively, I wonder if you could test it again using the sample code and images I provided

Where? I only see a single image.

There is only one image. I used that image to generate a ktx2 file with mipmaps, producing the different mip levels by scaling it. The image itself is 2048 * 2048; I halve it to produce a 1024 * 1024 image, then halve it again to produce a 512 * 512 image, and so on until the image is 4 * 4.
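The level sizes described here can be sketched as follows. `mipSizes` is a hypothetical helper, and the 4-pixel floor mirrors the choice made in the code posted above rather than anything libktx requires.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: list (width, height) for each mip level, halving
// until the larger dimension is no bigger than minDim.
std::vector<std::pair<std::uint32_t, std::uint32_t>>
mipSizes(std::uint32_t width, std::uint32_t height, std::uint32_t minDim = 4) {
    assert(width > 0 && height > 0);
    std::vector<std::pair<std::uint32_t, std::uint32_t>> levels;
    while (true) {
        levels.emplace_back(width, height);
        if (std::max(width, height) <= minDim) {
            break;
        }
        // Halve each dimension, never dropping below 1 pixel.
        width = std::max<std::uint32_t>(1, width / 2);
        height = std::max<std::uint32_t>(1, height / 2);
    }
    return levels;
}
```

For a 2048 * 2048 base image this yields 10 levels, from 2048 * 2048 down to 4 * 4, which matches `floor(log2(2048)) - 1`.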

newpeople123 avatar May 28 '24 13:05 newpeople123

I forgot to add that even without generating mipmaps, simply converting this image into a ktx2 file with super compression takes a long time, so you can still test without generating mipmaps.

newpeople123 avatar May 28 '24 13:05 newpeople123

You have still not answered my question about debug vs. optimized code. I need the answer.

MarkCallow avatar May 29 '24 00:05 MarkCallow

You have still not answered my question about debug vs. optimized code. I need the answer.

Sorry, I didn't understand what you meant before. I tried what you suggested, and the running time dropped significantly, to about 3 seconds. It is shocking that there is so much difference between the Debug and Release modes. Thank you very much for your help, good luck!

newpeople123 avatar May 29 '24 02:05 newpeople123

I'm glad the problem was just the lack of optimization.

MarkCallow avatar May 29 '24 07:05 MarkCallow