dragonfly icon indicating copy to clipboard operation
dragonfly copied to clipboard

corruption in the stream code

Open vyavdoshenko opened this issue 7 months ago • 5 comments

The original idea: The idea is based on the fact that the stream code writes somehow lp size using lpSetTotalBytes into already freed block. Before lpSetTotalBytes corrupts the memory, it should have low 32 bits of the "next" field and based on the examples we have it should be pretty large number or 0 (in case block->next is null).

in any case we read first with lpGetTotalBytes and if it is say 0 - it's a bug (we crash) because lp can not have 0 bytes (it has at least 7 bytes meta data) and if it's higher 16MB (highly unlikely) we call zmalloc_usable_size and check that the usable size is equal or higher than lpGetTotalBytes (consistency). and if not we abort (crash)

The PR: https://github.com/dragonflydb/dragonfly/pull/5252

The original crash log:

@     0xaaaaab3b2f18        192  boost::context::detail::fiber_entry<>() |  
-- | --
  |   | @     0xaaaaab3b291c        576  util::ListenerInterface::RunSingleConnection() |  
  |   | @     0xaaaaaafd9a64        368  facade::Connection::HandleRequests() |  
  |   | @     0xaaaaaafd8b18        272  facade::Connection::ConnectionFlow() |  
  |   | @     0xaaaaaafd85ac       1680  facade::Connection::IoLoop() |  
  |   | @     0xaaaaaafd76bc        304  facade::Connection::ParseRedis() |  
  |   | @     0xaaaaaafd716c        464  facade::Connection::DispatchSingle() |  
  |   | @     0xaaaaaac18914        672  dfly::Service::DispatchCommand() |  
  |   | @     0xaaaaaac12aa4        208  dfly::Service::InvokeCmd() |  
  |   | @     0xaaaaaaeea4d4        672  dfly::CommandId::Invoke() |  
  |   | @     0xaaaaaace88d8         32  dfly::StreamFamily::XAdd() |  
  |   | @     0xaaaaaaf09fe0         64  dfly::Transaction::ScheduleSingleHop() |  
  |   | @     0xaaaaaaf09f48        416  dfly::Transaction::Execute() |  
  |   | @     0xaaaaaaf09684        592  dfly::Transaction::ScheduleInternal() |  
  |   | @     0xaaaaaaf082f0        288  dfly::Transaction::ScheduleInShard() |  
  |   | @     0xaaaaaaf07b08        208  dfly::Transaction::RunCallback() |  
  |   | @     0xaaaaaacefebc       1344  absl::lts_20240722::functional_internal::InvokeObject<>() |  
  |   | @     0xaaaaaacef0b8         32  dfly::(anonymous namespace)::OpAdd() |  
  |   | @     0xaaaaab00866c         32  lpNew |  
  |   | @     0xfffff7ffb8f8       4960  (unknown) |  
  |   | @     0xaaaaab5ec214        480  absl::lts_20240722::AbslFailureSignalHandler() |  
  |   | PC: @     0xaaaaab01f298  (unknown)  _mi_page_malloc_zero |  
  |   | *** SIGSEGV received at time=1749307602 on cpu 0 ***

vyavdoshenko avatar Jun 09 '25 08:06 vyavdoshenko

I can be wrong, but it seems to me that https://github.com/dragonflydb/dragonfly/issues/5202 may be relevant to this one.

vyavdoshenko avatar Jun 12 '25 14:06 vyavdoshenko

I see that you merged this PR: #5252. Is this bug fixed? What is the status of this issue?

BagritsevichStepan avatar Jun 15 '25 15:06 BagritsevichStepan

I can be wrong, but it seems to me that #5202 may be relevant to this one.

Why do you think so?

BagritsevichStepan avatar Jun 15 '25 15:06 BagritsevichStepan

@BagritsevichStepan This is my opinion based on testing. I couldn't reproduce this particular issue with fuzzing and script testing. But https://github.com/dragonflydb/dragonfly/issues/5202 causes so many times for the stream family. This is my assumption that they can be relevant.

vyavdoshenko avatar Jun 15 '25 16:06 vyavdoshenko

#5202 is issue about memory usage tracking. Memory usage tracking should not touch other logic. We never use the stream size in stream_family.cc. And we do not free memory based on memory usage values. I don't see how this bug can be related. It should be another issue

BagritsevichStepan avatar Jun 15 '25 16:06 BagritsevichStepan