Concurrency issue when calling ktxTexture2_TranscodeBasis from multiple threads.
Hi,
The ktxTexture2_TranscodeBasis function calls basisu_transcoder_init, which is not thread-safe. When ktxTexture2_TranscodeBasis is called simultaneously from multiple threads, basisu_transcoder_init may run concurrently in several of them, corrupting the initialization of global data structures inside basisu and crashing the process. The offending code snippet is as follows:
```cpp
// Transcoder global initialization. Requires ~9 milliseconds when compiled
// and executed natively on a Core i7 2.2 GHz. If this is too slow, the
// tables it computes can easily be moved to be compiled in.
static bool transcoderInitialized;
if (!transcoderInitialized) {
    basisu_transcoder_init();
    transcoderInitialized = true;
}
```
Would it be possible to add a thread-safe initialization mechanism that guarantees basisu_transcoder_init is called only once in a multi-threaded environment?
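For example, something like std::call_once could be used; a minimal sketch of the idea (illustration only, not a patch):

```cpp
#include <mutex>

// Guards the one-time basisu initialization; every thread that reaches this
// point blocks until the single call to basisu_transcoder_init() has finished.
static std::once_flag transcoderInitFlag;

// Inside ktxTexture2_TranscodeBasis, replacing the unsynchronized flag check:
std::call_once(transcoderInitFlag, basisu_transcoder_init);
```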
Thank you for your report. I will investigate. Please provide a simple application or a gtest test case that demonstrates the problem. I want to add a test case to our gtest test suite, probably in the texturetests application.
The problem has a low probability of occurrence and is difficult to reproduce. I am trying to find ways to increase the probability of reproducing this problem, and I will update you on any progress I make here.
I can reproduce the issue with the following code, although the probability is still fairly low. I tested it on an Android phone. The key is to make ktxTexture2_TranscodeBasis be called concurrently from multiple threads at the same moment; to achieve this I added a barrier to the task:
```cpp
#include <barrier>   // requires C++20
#include <string>
#include <thread>
#include <ktx.h>
// ALOGD/ALOGE are the usual Android logging macros.

void ktxSoftwareTest() {
    std::barrier syncPoint(2);
    auto funcLoad = [&syncPoint](const std::string& filePath) {
        ktxTexture2* texture = nullptr;
        KTX_error_code result = ktxTexture_CreateFromNamedFile(
            filePath.c_str(), KTX_TEXTURE_CREATE_LOAD_IMAGE_DATA_BIT, (ktxTexture**)&texture);
        if (result != KTX_SUCCESS) {
            ALOGE("Could not load the requested image file(%d)!", result);
            return;
        }
        if (ktxTexture2_NeedsTranscoding(texture)) {
            auto targetFormat = KTX_TTF_ASTC_4x4_RGBA;
            // The barrier makes ktxTexture2_TranscodeBasis be called concurrently in multiple threads.
            syncPoint.arrive_and_wait();
            ALOGD("transcoding(%s)...", ktxTranscodeFormatString(targetFormat));
            result = ktxTexture2_TranscodeBasis(texture, targetFormat, 0);
            if (result != KTX_SUCCESS) {
                ALOGE("transcode input texture to the target format(%s) failed(%d): %s",
                      ktxTranscodeFormatString(targetFormat), result, ktxErrorString(result));
            } else {
                ALOGD("texture transcoding finished");
            }
        } else {
            ALOGD("no need to do texture transcoding.");
        }
        ktxTexture2_Destroy(texture);
    };
    std::thread t1([&funcLoad] {
        funcLoad("the/path/of/one.ktx2");
    });
    std::thread t2([&funcLoad] {
        funcLoad("the/path/of/another.ktx2");
    });
    t1.join();
    t2.join();
}
```
Thank you. I have converted the code to a function in our gtest suite. Unfortunately I have not been able to reproduce the problem. Does the size (i.e. time to transcode) make any difference to the probability? Would more threads increase the probability? I want to reproduce it before attempting to add a mutex around the transcoder initialization. That way I can tell if the problem is fixed.
As a workaround until a mutex is added, you can call ktxTexture2_TranscodeBasis and wait for it to return before launching your threads. basisu_transcoder_init() only needs to be called once. A global variable is used to make that happen; it works fine in unthreaded code and, once set, it prevents further calls by the threads.
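A minimal sketch of that warm-up, assuming one of your KTX2 files is available before the worker threads start (the path and target format here are placeholders):

```cpp
#include <ktx.h>

// Transcode one texture on the calling thread first. This runs
// basisu_transcoder_init() and sets the transcoderInitialized flag, so the
// threads launched afterwards never race on the initialization.
static void warmUpTranscoder(const char* path) {
    ktxTexture2* texture = nullptr;
    if (ktxTexture_CreateFromNamedFile(path, KTX_TEXTURE_CREATE_LOAD_IMAGE_DATA_BIT,
                                       (ktxTexture**)&texture) != KTX_SUCCESS)
        return;
    if (ktxTexture2_NeedsTranscoding(texture))
        ktxTexture2_TranscodeBasis(texture, KTX_TTF_ASTC_4x4_RGBA, 0);
    ktxTexture2_Destroy(texture);
}

// e.g. call warmUpTranscoder("the/path/of/one.ktx2") before creating t1 and t2.
```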
> As a workaround until a mutex is added, you can call ktxTexture2_TranscodeBasis and wait for it to return
Shortly after I sent the above, it dawned on me that this very solution was likely preventing me from reproducing the problem. I had added the new test at the end of the test suite and a previous test was calling ktxTexture2_TranscodeBasis. I moved the new test to run first but I still haven't reproduced the failure after 20 attempts. I set breakpoints at the basisu_transcoder_init(); and transcoderInitialized = true; lines and can reliably see the second thread calling basisu_transcoder_init() before the first thread returns and sets transcoderInitialized. Nevertheless the test successfully transcodes the input image. @vmwalker, do you have any idea why it always succeeds for me?
I was able to reproduce the issue about once in every hundred local runs. My conditions for reproduction were that the KTX file was approximately 3 MB. The key to reproducing the problem was having multiple threads start calling ktxTexture2_TranscodeBasis simultaneously; to achieve this I added syncPoint.arrive_and_wait() to my test code. Theoretically, increasing the number of threads should also slightly increase the probability of reproducing the problem. Additionally, using a less powerful device should increase the probability as well (it widens the window for concurrent reallocation of memory in basisu_transcoder_init; on my Android phone, each call to ktxTexture2_TranscodeBasis took approximately 40 ms). The crash call stack from a debug build of the problem I reproduced is as follows:
11-24 20:10:11.621 9968 9968 F DEBUG : Time: 2025-11-24 20:10:11
11-24 20:10:11.621 9968 9968 F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
11-24 20:10:11.621 9968 9968 F DEBUG : Build fingerprint: ****
11-24 20:10:11.621 9968 9968 F DEBUG : Revision: '0'
11-24 20:10:11.621 9968 9968 F DEBUG : ABI: 'arm64'
11-24 20:10:11.621 9968 9968 F DEBUG : Timestamp: 2025-11-24 20:10:11.402682266+0800
11-24 20:10:11.621 9968 9968 F DEBUG : Process uptime: 4s
11-24 20:10:11.621 9968 9968 F DEBUG : Cmdline: com.****.****
11-24 20:10:11.621 9968 9968 F DEBUG : pid: 9905, tid: 9965, name: llvm-worker-1 >>> com.****.**** <<<
11-24 20:10:11.621 9968 9968 F DEBUG : uid: 1000
11-24 20:10:11.621 9968 9968 F DEBUG : tagged_addr_ctrl: 0000000000000001 (PR_TAGGED_ADDR_ENABLE)
11-24 20:10:11.621 9968 9968 F DEBUG : pac_enabled_keys: 000000000000000f (PR_PAC_APIAKEY, PR_PAC_APIBKEY, PR_PAC_APDAKEY, PR_PAC_APDBKEY)
11-24 20:10:11.621 9968 9968 F DEBUG : signal 6 (SIGABRT), code -1 (SI_QUEUE), fault addr --------
11-24 20:10:11.621 9968 9968 F DEBUG : Abort message: '****/ktx-software/external/basisu/transcoder/basisu_containers_impl.h:42: bool basisu::elemental_vector::increase_capacity(size_t, bool, size_t, object_mover, bool): assertion "m_size <= m_capacity" failed'
11-24 20:10:11.621 9968 9968 F DEBUG : x0 0000000000000000 x1 00000000000026ed x2 0000000000000006 x3 00000070029819c0
11-24 20:10:11.621 9968 9968 F DEBUG : x4 6f7164666d68452e x5 6f7164666d68452e x6 6f7164666d68452e x7 7f7f7f7f7f7f7f7f
11-24 20:10:11.621 9968 9968 F DEBUG : x8 00000000000000f0 x9 001ece7c670dd341 x10 000000ff00000020 x11 000000003b9ac9ff
11-24 20:10:11.621 9968 9968 F DEBUG : x12 000000007fffffff x13 00000000001d34ca x14 00000013e3418dd2 x15 0000000034155555
11-24 20:10:11.621 9968 9968 F DEBUG : x16 0000007368dbfe68 x17 0000007368da4f40 x18 0000007000e98000 x19 00000000000026b1
11-24 20:10:11.621 9968 9968 F DEBUG : x20 00000000000026ed x21 00000000ffffffff x22 00000000000026b1 x23 00000000000026cb
11-24 20:10:11.621 9968 9968 F DEBUG : x24 00000070029823c0 x25 0000007002982728 x26 0000000000000058 x27 000000700288a000
11-24 20:10:11.621 9968 9968 F DEBUG : x28 0000007002886000 x29 0000007002981a40
11-24 20:10:11.621 9968 9968 F DEBUG : lr 0000007368d3fcac sp 00000070029819c0 pc 0000007368d3fcd0 pst 0000000000001000
11-24 20:10:11.621 9968 9968 F DEBUG : 36 total frames
11-24 20:10:11.621 9968 9968 F DEBUG : backtrace:
11-24 20:10:11.621 9968 9968 F DEBUG : #00 pc 00000000000bacd0 /apex/com.android.runtime/lib64/bionic/libc.so (abort+160) (BuildId: eecc0eb39ce8d4c720af5e5e592df49c)
11-24 20:10:11.621 9968 9968 F DEBUG : #01 pc 00000000000bb554 /apex/com.android.runtime/lib64/bionic/libc.so (__assert2+40) (BuildId: eecc0eb39ce8d4c720af5e5e592df49c)
11-24 20:10:11.621 9968 9968 F DEBUG : #02 pc 000000000043cb6c /system/app/****/lib/arm64/lib****.so (basisu::elemental_vector::increase_capacity(unsigned long, bool, unsigned long, void (*)(void*, void*, unsigned long), bool)+96) (BuildId: c4bef52fd665d3d03533743d6b06e76d72c67e56)
11-24 20:10:11.621 9968 9968 F DEBUG : #03 pc 0000000000483ce0 /system/app/****/lib/arm64/lib****.so (basisu::vector<unsigned char>::increase_capacity(unsigned long, bool, bool)+60) (BuildId: c4bef52fd665d3d03533743d6b06e76d72c67e56)
11-24 20:10:11.621 9968 9968 F DEBUG : #04 pc 0000000000483c24 /system/app/****/lib/arm64/lib****.so (basisu::vector<unsigned char>::try_resize(unsigned long, bool)+184) (BuildId: c4bef52fd665d3d03533743d6b06e76d72c67e56)
11-24 20:10:11.621 9968 9968 F DEBUG : #05 pc 0000000000483b44 /system/app/****/lib/arm64/lib****.so (basisu::vector<unsigned char>::resize(unsigned long, bool)+40) (BuildId: c4bef52fd665d3d03533743d6b06e76d72c67e56)
11-24 20:10:11.621 9968 9968 F DEBUG : #06 pc 0000000000483aa8 /system/app/****/lib/arm64/lib****.so (astc_helpers::dequant_table::init(bool, unsigned int, bool)+108) (BuildId: c4bef52fd665d3d03533743d6b06e76d72c67e56)
11-24 20:10:11.621 9968 9968 F DEBUG : #07 pc 000000000047ea48 /system/app/****/lib/arm64/lib****.so (astc_helpers::dequant_tables::init(bool)+372) (BuildId: c4bef52fd665d3d03533743d6b06e76d72c67e56)
11-24 20:10:11.621 9968 9968 F DEBUG : #08 pc 00000000004413c0 /system/app/****/lib/arm64/lib****.so (astc_helpers::init_tables(bool)+32) (BuildId: c4bef52fd665d3d03533743d6b06e76d72c67e56)
11-24 20:10:11.621 9968 9968 F DEBUG : #09 pc 0000000000447ec4 /system/app/****/lib/arm64/lib****.so (basist::basisu_transcoder_init()+44) (BuildId: c4bef52fd665d3d03533743d6b06e76d72c67e56)
11-24 20:10:11.621 9968 9968 F DEBUG : #10 pc 00000000004399a8 /system/app/****/lib/arm64/lib****.so (ktxTexture2_TranscodeBasis+1656) (BuildId: c4bef52fd665d3d03533743d6b06e76d72c67e56)
11-24 20:10:11.621 9968 9968 F DEBUG : #11 pc 0000000000357a78 /system/app/****/lib/arm64/lib****.so (ktxSoftwareTest()::$_2::operator()(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char>> const&) const+380) (BuildId: c4bef52fd665d3d03533743d6b06e76d72c67e56)
11-24 20:10:11.621 9968 9968 F DEBUG : #12 pc 0000000000358e84 /system/app/****/lib/arm64/lib****.so (ktxSoftwareTest()::$_1::operator()() const+56) (BuildId: c4bef52fd665d3d03533743d6b06e76d72c67e56)
11-24 20:10:11.621 9968 9968 F DEBUG : #13 pc 0000000000358e3c /system/app/****/lib/arm64/lib****.so (decltype(std::declval<ktxSoftwareTest()::$_1&>()()) std::__ndk1::__invoke[abi:ne190000]<ktxSoftwareTest()::$_1&>(ktxSoftwareTest()::$_1&)+20) (BuildId: c4bef52fd665d3d03533743d6b06e76d72c67e56)
11-24 20:10:11.621 9968 9968 F DEBUG : #14 pc 0000000000358df4 /system/app/****/lib/arm64/lib****.so (void std::__ndk1::__invoke_void_return_wrapper<void, true>::__call[abi:ne190000]<ktxSoftwareTest()::$_1&>(ktxSoftwareTest()::$_1&)+20) (BuildId: c4bef52fd665d3d03533743d6b06e76d72c67e56)
11-24 20:10:11.621 9968 9968 F DEBUG : #15 pc 0000000000358dd0 /system/app/****/lib/arm64/lib****.so (std::__ndk1::__function::__alloc_func<ktxSoftwareTest()::$_1, std::__ndk1::allocator<ktxSoftwareTest()::$_1>, void ()>::operator()[abi:ne190000]()+24) (BuildId: c4bef52fd665d3d03533743d6b06e76d72c67e56)
11-24 20:10:11.621 9968 9968 F DEBUG : #16 pc 00000000003580b8 /system/app/****/lib/arm64/lib****.so (std::__ndk1::__function::__func<ktxSoftwareTest()::$_1, std::__ndk1::allocator<ktxSoftwareTest()::$_1>, void ()>::operator()()+24) (BuildId: c4bef52fd665d3d03533743d6b06e76d72c67e56)
11-24 20:10:11.621 9968 9968 F DEBUG : #17 pc 000000000023e690 /system/app/****/lib/arm64/lib****.so (std::__ndk1::__function::__value_func<void ()>::operator()[abi:ne190000]() const+56) (BuildId: c4bef52fd665d3d03533743d6b06e76d72c67e56)
@vmwalker, thanks for the additional info. Given the structure of the gtest-based unit tests to which I have added your test, and the fact that variables within the test library are not accessible because it is a shared library, I can only try multiple runs manually. I will think about how to package it differently with a script that invokes it multiple times. As an alternative I did try creating 100 threads and ran that a few times without provoking the failure.
I think the mutex is best placed in the basis_universal library so I have opened an issue over there.
@MarkCallow thank you very much for taking the time to discuss this issue. I'll try adding a mutex around my calls to ktxTexture2_TranscodeBasis to work around the problem, although this slightly reduces the efficiency of parsing KTX files (locking is only really needed the first time the method is called; subsequent calls don't need the mutex, and transcoding is the most time-consuming part of the entire KTX file parsing process).
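Something like this minimal sketch is what I have in mind (a hypothetical wrapper in my application code, not part of libktx; the function and mutex names are mine):

```cpp
#include <mutex>
#include <ktx.h>

static std::mutex g_transcodeMutex;  // application-level mutex, illustration only

// Serializes all calls to ktxTexture2_TranscodeBasis so basisu_transcoder_init
// can never run concurrently, at the cost of serializing the transcodes too.
KTX_error_code transcodeLocked(ktxTexture2* texture,
                               ktx_transcode_fmt_e format,
                               ktx_transcode_flags flags) {
    std::lock_guard<std::mutex> lock(g_transcodeMutex);
    return ktxTexture2_TranscodeBasis(texture, format, flags);
}
```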
I'm still working on this. I have created a branch with the fix but it is not yet merged so I have reopened it.
@vmwalker please test the fix as I am still unable to reproduce the issue. You can find the Android package built from the fix branch at https://github.com/KhronosGroup/KTX-Software/actions/runs/19962184363/artifacts/4776300719.
Most of the changes in the branch have to do with my upping the C++ version used in the library tests to C++20 so I can use std::barrier. The fix is the change in lib/src/basis_transcode.cpp.
I made a standalone test that creates five threads and a script to call it from ctest. I removed the putative fix from the library and had ctest call the test 1000 times. I did not reproduce the failure. I will have to rely on you, @vmwalker, testing the fix.
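For reference, a sketch of how such a five-thread test can be structured (illustration only, not the actual test source; the file path is a placeholder):

```cpp
#include <barrier>
#include <string>
#include <thread>
#include <vector>
#include <ktx.h>

// Every thread loads its own copy of the same file, then waits on one barrier
// so all calls to ktxTexture2_TranscodeBasis start at the same moment.
void transcodeStressTest(const std::string& filePath, unsigned numThreads = 5) {
    std::barrier syncPoint(static_cast<std::ptrdiff_t>(numThreads));
    auto worker = [&] {
        ktxTexture2* texture = nullptr;
        if (ktxTexture_CreateFromNamedFile(filePath.c_str(),
                KTX_TEXTURE_CREATE_LOAD_IMAGE_DATA_BIT,
                (ktxTexture**)&texture) != KTX_SUCCESS) {
            syncPoint.arrive_and_drop();  // don't leave the other threads waiting
            return;
        }
        if (ktxTexture2_NeedsTranscoding(texture)) {
            syncPoint.arrive_and_wait();
            ktxTexture2_TranscodeBasis(texture, KTX_TTF_ASTC_4x4_RGBA, 0);
        } else {
            syncPoint.arrive_and_drop();
        }
        ktxTexture2_Destroy(texture);
    };
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < numThreads; ++i)
        threads.emplace_back(worker);
    for (auto& t : threads)
        t.join();
}
```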
Ahh! I had an error in the script so it wasn't looping the number of times I thought. Now it's fixed, I have reproduced the problem once so far. A first run of 100 attempts passed. A second run of 1000 failed somewhere around the 120th attempt. But further runs of the 1000-attempt test did not reproduce it. I then upped the attempts to 10000 and finally got another failure. It took 419 s, which means it was around the 3000th attempt.
I welcome any suggestions you might have on how to improve the probability of failure.
Okay, I'll test it in my environment and report the results later.
@MarkCallow I tested the fix you provided, and the problem did not reappear in more than 1,000 test runs. Before the fix, the problem would reappear within about 300 runs in my test environment. I think there are two areas for improvement:
- To ensure the test code can run correctly in environments older than C++20, I wrote a replacement for std::barrier, implemented as follows:
```cpp
#include <condition_variable>
#include <mutex>

// A reusable barrier for pre-C++20 environments.
class Barrier {
private:
    std::mutex mtx;
    std::condition_variable cv;
    int threshold;
    int count;
    int generation;

public:
    explicit Barrier(int count) : threshold(count), count(count), generation(0) {}

    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(mtx);
        int gen = generation;
        if (--count == 0) {
            // Last arrival: start a new generation and release the waiters.
            generation++;
            count = threshold;
            cv.notify_all();
        } else {
            cv.wait(lock, [this, gen] { return gen != generation; });
        }
    }
};
```
- In the code below, a thread still has to acquire transcoderInitMutex even after the basisu transcoder has been initialized, only to release it again almost immediately because the !transcoderInitialized condition no longer holds. This increases contention between threads on the transcoderInitMutex mutex. In fact, we only need to hold transcoderInitMutex while the basisu transcoder has not yet been initialized.
```cpp
static bool transcoderInitialized;
static std::mutex transcoderInitMutex;

transcoderInitMutex.lock();
if (!transcoderInitialized) {
    // Another thread did not initialize the transcoder while we were waiting.
    basisu_transcoder_init();
    transcoderInitialized = true;
}
transcoderInitMutex.unlock();
```
The improvement is this: before a thread acquires transcoderInitMutex, it first checks whether the basisu transcoder has already been initialized. If it has, there is no need to acquire transcoderInitMutex at all. This completely avoids the contention that otherwise occurs when ktxTexture2_TranscodeBasis is called from multiple threads. Of course, the type of transcoderInitialized also needs to be changed to std::atomic<bool> so that every thread sees the same value once transcoderInitialized has been set. The improved code is below; this version has not encountered any problems in more than 1,000 test runs.
```cpp
static std::atomic<bool> transcoderInitialized(false);
static std::mutex init_mutex;

if (!transcoderInitialized.load(std::memory_order_acquire)) {
    std::lock_guard<std::mutex> lock(init_mutex);
    if (!transcoderInitialized.load(std::memory_order_relaxed)) {
        basisu_transcoder_init();
        transcoderInitialized.store(true, std::memory_order_release);
    }
}
```
@vmwalker, thank you very much for your testing effort and report.
Thanks for the alternate barrier implementation, but std::barrier is only used in the test program and I am happy for that to require C++20, as it is available on all the platforms where we run our tests.
Doesn't changing transcoderInitialized to std::atomic<bool> effectively mean that an internal lock is required to access it? If so, there is no advantage over the original code which is simpler.
Please answer the question I raised about using an atomic.
I am trying to decide whether it is worth including the test I wrote, which evolved from yours, in our test suite. It only very occasionally triggers the bug even when I run it 1000 times. As part of our CI we can't afford a test that takes many minutes to run, so if it is included I'd have to limit it to 100 runs. Even at that level it takes a long time compared to the other tests. What do you think?
@MarkCallow The implementation of std::atomic<bool> is lock-free on common targets such as arm64 and x86-64, so loading it does not take any internal lock; it is just an ordinary load with the requested memory ordering. The atomic is only needed to guarantee that the initialized flag becomes visible to all threads, so the fast path skips the mutex entirely once initialization is done.
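If you want to confirm that there is no hidden lock on a particular target, it can be checked at compile time (C++17 and later):

```cpp
#include <atomic>

// Fails to compile if std::atomic<bool> would fall back to a lock-based
// implementation on this target.
static_assert(std::atomic<bool>::is_always_lock_free,
              "std::atomic<bool> is not lock-free on this target");
```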
This is a low-probability issue that only occurs in a multi-threaded environment. I completely agree that we should avoid spending too much effort on a test case just to cover a particular issue that is about to be fixed; I don't think it's worthwhile.