Hitting an assert for version mismatch in ddsi_sertype.c

Open matte1 opened this issue 2 years ago • 3 comments

On 0.10.4 I'm running into an assert on line 172 of src/core/ddsi/src/ddsi_sertype.c. Publishing/subscribing has been working fine and I'm unable to figure out what the difference is between this new subscriber that I'm adding and other ones.

After modifying the source code to print the type name and both values just before the asserts:

  printf("%s\n", type_name);
  printf("%ld %ld\n", sertype_ops->version, ddsi_sertype_v0);
  assert (sertype_ops->version == ddsi_sertype_v0);
  assert ((flags & ~(uint32_t)DDSI_SERTYPE_FLAG_MASK) == 0);

I get the following.

foo::bar_idl::Message
140027296467537 140027134458989
python3: external/cyclonedds_c/src/core/ddsi/src/ddsi_sertype.c:174: ddsi_sertype_init_flags: Assertion `sertype_ops->version == ddsi_sertype_v0' failed.

matte1, Aug 30 '23 03:08

That's bizarre!

I cannot think of anything that changed in the code base that could reasonably explain it, but it does remind me of a problem that I once encountered at a customer. There, Cyclone was built as a static library, that static library was then linked into N > 1 shared libraries containing application code using DDS, then those N shared libraries were linked together into a single application process.

This situation resulted in multiple copies of some variables/functions and it all somehow magically somewhat worked, except not completely. The manifestation of the error when looking in a debugger was that some global variable had value A when in one place in the code, but value B when in another place. Maybe, just maybe, you're seeing something similar.
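To make the "value A in one place, value B in another" effect concrete, here is a minimal Python model of it. It is only an analogy, not Cyclone code: it assumes the `version` field holds the address of `ddsi_sertype_v0` (the two large numbers printed above look like addresses), and `load_copy`, `version_matches` and the library names are hypothetical.

```python
from types import SimpleNamespace


def load_copy(name):
    """Model one shared library that embeds its own copy of the static library."""

    def ddsi_sertype_v0():
        """Stand-in for the marker function; each copy is a distinct object."""

    def version_matches(ops):
        # Mirrors: assert (sertype_ops->version == ddsi_sertype_v0);
        return ops.version is ddsi_sertype_v0

    # An ops table built inside this copy points at this copy's marker.
    ops = SimpleNamespace(version=ddsi_sertype_v0)
    return SimpleNamespace(name=name, ops=ops, version_matches=version_matches)


lib_a = load_copy("libapp_a.so")  # hypothetical library names
lib_b = load_copy("libapp_b.so")

print(lib_a.version_matches(lib_a.ops))  # ops checked by the copy that built it
print(lib_b.version_matches(lib_a.ops))  # ops handed to the other copy
```

The first check passes and the second fails, even though both copies come from identical source. As long as each copy only sees its own data, everything appears to work; the check only trips once an object created by one copy is passed to the other.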

The problem with this thought is that it seems it should have occurred before the update as well. But perhaps you can have a look at how you build things and how it all fits together?

One thing you can do is look at what the run-time linker is doing. On Linux, that is ld.so and it has a bunch of debugging features that you control via the environment. In this case, it'd be interesting to look at the symbol binding. The easiest is to do

LD_DEBUG_OUTPUT=ld-debug.txt LD_DEBUG=all python3 ...

and look for anything involving libddsc and ddsi_sertype_v0, but beware that dumping all output will probably yield a gigantic file.
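If the file does get too large to read by hand, a small script can pull out the relevant lines. The sketch below assumes glibc's usual `binding file A [0] to B [0]: normal symbol `name'` line format; `BINDING_RE`, `providers` and `scan` are hypothetical helper names. With glibc, `LD_DEBUG=bindings` limits the output to symbol bindings, and the process ID is appended to the `LD_DEBUG_OUTPUT` file name.

```python
import glob
import re
from collections import defaultdict

# Matches glibc LD_DEBUG binding lines (assumed format), e.g.
#   1234: binding file ./a.so [0] to ./b.so [0]: normal symbol `foo'
BINDING_RE = re.compile(
    r"binding file (?P<src>.+?) \[\d+\] to (?P<dst>.+?) \[\d+\]: "
    r".*symbol `(?P<sym>[^']+)'"
)


def providers(lines, symbol="ddsi_sertype_v0"):
    """Map each object that supplies `symbol` to the objects bound to it."""
    found = defaultdict(set)
    for line in lines:
        match = BINDING_RE.search(line)
        if match and match.group("sym") == symbol:
            found[match.group("dst")].add(match.group("src"))
    return dict(found)


def scan(pattern="ld-debug.txt.*", symbol="ddsi_sertype_v0"):
    """Run providers() over every debug file matching `pattern`."""
    lines = []
    for path in glob.glob(pattern):
        with open(path, errors="replace") as handle:
            lines.extend(handle)
    return providers(lines, symbol)
```

Calling `scan()` in the directory holding the debug output returns a dictionary; more than one key would mean the symbol is being resolved to more than one definition. An empty result does not rule the problem out, since references that are resolved inside a shared library at link time may never show up as run-time bindings.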

eboasson, Aug 30 '23 11:08

Thanks eboasson! This actually might be the issue. The comment about 0.10.4 was misleading: this happens on both 10.1 and 10.4, so it's not related to the upgrade. I only noticed this when adding a new class that contained its own subscribers to another class. So maybe it is a linking issue.

I don't quite understand how that all "magically works" but then has different global variables, but I will look into it and report back :)

matte1, Aug 30 '23 12:08

The other notable thing about our setup is that we have a Python wrapper with Cyclone DDS pub/subs that calls C++ modules, which also contain C++ versions of the pub/subs. This has worked for us for a long time, so I don't suspect it, but I wanted to let you know.

matte1, Aug 30 '23 12:08