kotlinx.coroutines

macOS Test fails with unknown error when upgrading to 1.6.x

Open jurmous opened this issue 2 years ago • 2 comments

When I upgrade from coroutines 1.5.2 to 1.6.4 (or any 1.6 release), certain tests in our project fail with an unknown error.

Repository: https://github.com/marykdb/maryk/tree/update_coroutines

Gradle command for the relevant tests: ./gradlew :store-rocksdb:macosArm64Test (or macosX64Test on an Intel machine)

Thrown exception

> Task :store-rocksdb:macosArm64Test FAILED

maryk.datastore.rocksdb.RocksDBDataStoreMigrationTest.testMigrationWithIndex FAILED
    Unknown

2 tests completed, 1 failed
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':store-rocksdb:macosArm64Test'.
> Test running process exited unexpectedly.
  Current test: testMigrationWithIndex

• If the test is run standalone, it succeeds.
• It works on JVM.
• If the project is downgraded to 1.5.2, it succeeds (See master: https://github.com/marykdb/maryk/tree/update_coroutines)
• If the test is commented out, the next test crashes instead.
• If printlns are added, the crash location seems to differ between runs.

So this unknown crash likely points to a lower-level Coroutines issue in the Native/macOS layer.

jurmous avatar Jul 14 '22 19:07 jurmous

Thanks, we'll take a look at some point closer to the next release. It would be really great if you could pinpoint the issue, at least partially, to speed up the process for us, but it's definitely not necessary.

qwwdfsad avatar Jul 15 '22 10:07 qwwdfsad

Given the issues mentioned above (the tests do not fail when run standalone, the unknown error gives no useful feedback, and the failure point moves when debugging), I unfortunately cannot pinpoint the issue any further. If there is a way to get more feedback from the test runner, that would be great. Thanks already for looking into it closer to the release!

jurmous avatar Jul 16 '22 11:07 jurmous

It seems to be https://youtrack.jetbrains.com/issue/KT-53243/Native-data-race-in-lazy-initialization, which is fixed in Kotlin 1.7.20. Please update to 1.7.20 when it's released (or 1.7.20-RC if that works for you) and the issue should be gone.

qwwdfsad avatar Sep 13 '22 10:09 qwwdfsad

@qwwdfsad I tested with the 1.7.20 release and I still get the same unknown error/segmentation fault:

[ RUN      ] maryk.datastore.rocksdb.RocksDBDataStoreMigrationTest.testMigrationWithIndex
[1]    74229 segmentation fault  ./build/bin/macosArm64/debugTest/test.kexe

So I am afraid this issue needs to be reopened.

jurmous avatar Oct 01 '22 15:10 jurmous

Hi!

I had a quick look at your segfault using lldb, and it shows a crash inside your Objective-C code, not inside Kotlin code.

  * frame #0: 0x0000000100f7b6f9 test.kexe`std::__1::__shared_ptr_pointer<rocksdb::TablePropertiesCollectorFactory*, std::__1::shared_ptr<rocksdb::TablePropertiesCollectorFactory>::__shared_ptr_default_delete<rocksdb::TablePropertiesCollectorFactory, rocksdb::TablePropertiesCollectorFactory>, std::__1::allocator<rocksdb::TablePropertiesCollectorFactory> >::__on_zero_shared() [inlined] std::__1::default_delete<rocksdb::TablePropertiesCollectorFactory>::operator(this=<unavailable>, __ptr=0x00007b10000075c0)(rocksdb::TablePropertiesCollectorFactory*) const at unique_ptr.h:57:5 [opt]
    frame #1: 0x0000000100f7b6f0 test.kexe`std::__1::__shared_ptr_pointer<rocksdb::TablePropertiesCollectorFactory*, std::__1::shared_ptr<rocksdb::TablePropertiesCollectorFactory>::__shared_ptr_default_delete<rocksdb::TablePropertiesCollectorFactory, rocksdb::TablePropertiesCollectorFactory>, std::__1::allocator<rocksdb::TablePropertiesCollectorFactory> >::__on_zero_shared(this=<unavailable>) at shared_ptr.h:267:5 [opt]
    frame #2: 0x0000000100f91783 test.kexe`std::__1::shared_ptr<rocksdb::PersistentCache>::~shared_ptr() [inlined] std::__1::__shared_count::__release_shared(this=0x00007b0800009a00) at shared_ptr.h:177:9 [opt]
    frame #3: 0x0000000100f9177a test.kexe`std::__1::shared_ptr<rocksdb::PersistentCache>::~shared_ptr() [inlined] std::__1::__shared_weak_count::__release_shared(this=0x00007b0800009a00) at shared_ptr.h:219:27 [opt]
    frame #4: 0x0000000100f9177a test.kexe`std::__1::shared_ptr<rocksdb::PersistentCache>::~shared_ptr(this=0x00007b9800000ad0) at shared_ptr.h:959:19 [opt]
    frame #5: 0x00000001011c1b55 test.kexe`rocksdb::ColumnFamilyOptions::~ColumnFamilyOptions() [inlined] std::__1::shared_ptr<rocksdb::SliceTransform const>::~shared_ptr(this=<unavailable>) at shared_ptr.h:957:1 [opt]
    frame #6: 0x00000001011c1b50 test.kexe`rocksdb::ColumnFamilyOptions::~ColumnFamilyOptions(this=0x00007b9800000888) at options.h:65:8 [opt]
    frame #7: 0x0000000101041ffa test.kexe`void std::__1::allocator_traits<std::__1::allocator<rocksdb::ColumnFamilyDescriptor> >::destroy<rocksdb::ColumnFamilyDescriptor, void>(std::__1::allocator<rocksdb::ColumnFamilyDescriptor>&, rocksdb::ColumnFamilyDescriptor*) [inlined] rocksdb::ColumnFamilyOptions::~ColumnFamilyOptions(this=<unavailable>) at options.h:65:8 [opt]
    frame #8: 0x0000000101041ff5 test.kexe`void std::__1::allocator_traits<std::__1::allocator<rocksdb::ColumnFamilyDescriptor> >::destroy<rocksdb::ColumnFamilyDescriptor, void>(std::__1::allocator<rocksdb::ColumnFamilyDescriptor>&, rocksdb::ColumnFamilyDescriptor*) [inlined] rocksdb::ColumnFamilyDescriptor::~ColumnFamilyDescriptor(this=0x00007b9800000870) at db.h:72:8 [opt]
    frame #9: 0x0000000101041ff1 test.kexe`void std::__1::allocator_traits<std::__1::allocator<rocksdb::ColumnFamilyDescriptor> >::destroy<rocksdb::ColumnFamilyDescriptor, void>(std::__1::allocator<rocksdb::ColumnFamilyDescriptor>&, rocksdb::ColumnFamilyDescriptor*) [inlined] rocksdb::ColumnFamilyDescriptor::~ColumnFamilyDescriptor(this=0x00007b9800000870) at db.h:72:8 [opt]
    frame #10: 0x0000000101041ff1 test.kexe`void std::__1::allocator_traits<std::__1::allocator<rocksdb::ColumnFamilyDescriptor> >::destroy<rocksdb::ColumnFamilyDescriptor, void>(std::__1::allocator<rocksdb::ColumnFamilyDescriptor>&, rocksdb::ColumnFamilyDescriptor*) [inlined] std::__1::allocator<rocksdb::ColumnFamilyDescriptor>::destroy(this=<unavailable>, __p=0x00007b9800000870) at allocator.h:159:15 [opt]
    frame #11: 0x0000000101041ff1 test.kexe`void std::__1::allocator_traits<std::__1::allocator<rocksdb::ColumnFamilyDescriptor> >::destroy<rocksdb::ColumnFamilyDescriptor, void>(__a=<unavailable>, __p=0x00007b9800000870) at allocator_traits.h:309:13 [opt]
    frame #12: 0x0000000101041c8a test.kexe`-[RocksDBColumnFamilyDescriptor dealloc] [inlined] std::__1::__vector_base<rocksdb::ColumnFamilyDescriptor, std::__1::allocator<rocksdb::ColumnFamilyDescriptor> >::__destruct_at_end(this=0x00007b0800009360, __new_last=0x00007b9800000000) at vector:450:9 [opt]
    frame #13: 0x0000000101041c72 test.kexe`-[RocksDBColumnFamilyDescriptor dealloc] [inlined] std::__1::__vector_base<rocksdb::ColumnFamilyDescriptor, std::__1::allocator<rocksdb::ColumnFamilyDescriptor> >::clear(this=0x00007b0800009360) at vector:374:29 [opt]
    frame #14: 0x0000000101041c6f test.kexe`-[RocksDBColumnFamilyDescriptor dealloc] [inlined] std::__1::vector<rocksdb::ColumnFamilyDescriptor, std::__1::allocator<rocksdb::ColumnFamilyDescriptor> >::clear(this=0x00007b0800009360 size=9) at vector:796:17 [opt]
    frame #15: 0x0000000101041c6f test.kexe`-[RocksDBColumnFamilyDescriptor dealloc](self=<unavailable>, _cmd=<unavailable>) at RocksDBColumnFamilyDescriptor.mm:44:21 [opt]
    frame #16: 0x0000000100f7713d test.kexe`-[NSObject(self=0x00007b04000002b0, _cmd=<unavailable>, mode=<unavailable>) releaseAsAssociatedObject:] at ObjCExportClasses.mm:183:3 [opt]

This happens when the garbage collector decides that the object is no longer needed by Kotlin code. You can check releaseAsAssociatedObject (ObjCExportClasses.mm:183 in frame #16 of the trace), but in fact it is just a release call.

One difference from older versions is that with the old garbage collector, release was always called on the same thread, while with the new one it can be called on a different thread. Could this cause problems with C++/ObjC libraries? Maybe in that case there is a data race in one of the destructors above?
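
To make the concern concrete, here is a minimal C++ sketch. It is not taken from the maryk, RocksDB, or Kotlin/Native sources, and `NativeHandle`, `release_on_same_thread`, and `release_on_other_thread` are hypothetical names made up for illustration. The handle only records which thread created it and reports whether its destructor ran on a different thread, which is the situation the new garbage collector can produce when it releases an associated object.

```cpp
#include <memory>
#include <thread>
#include <utility>

// Hypothetical stand-in for a native resource owned by a wrapper object.
// It remembers the creating thread and, when destroyed, writes whether the
// destructor ran on a different thread into the flag it was given.
class NativeHandle {
public:
    explicit NativeHandle(bool* released_elsewhere)
        : owner_(std::this_thread::get_id()),
          released_elsewhere_(released_elsewhere) {}

    ~NativeHandle() {
        *released_elsewhere_ = (std::this_thread::get_id() != owner_);
    }

private:
    std::thread::id owner_;
    bool* released_elsewhere_;
};

// Models the old behaviour described above: the object is released on the
// thread that created it. Returns false (destructor ran on the owner thread).
bool release_on_same_thread() {
    bool elsewhere = false;
    auto handle = std::make_unique<NativeHandle>(&elsewhere);
    handle.reset();  // destructor runs here, on the creating thread
    return elsewhere;
}

// Models the new behaviour: ownership is handed to another thread, which
// performs the release. Returns true (destructor ran off the owner thread).
bool release_on_other_thread() {
    bool elsewhere = false;
    auto handle = std::make_unique<NativeHandle>(&elsewhere);
    std::thread finalizer([h = std::move(handle)]() mutable { h.reset(); });
    finalizer.join();  // join() makes the write to `elsewhere` visible here
    return elsewhere;
}
```

Running a destructor on another thread is not a bug by itself. It only becomes one if the destructor touches state that other threads also use without synchronization, or state that is tied to the creating thread, so that is what would need checking in the destructors from the trace.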

kunyavskiy avatar Oct 01 '22 21:10 kunyavskiy