Writer can't acquire write lock

Open realSinged opened this issue 1 year ago • 3 comments

Describe the bug

We send video packets (tens to hundreds of KB) through cyber with one writer and two readers, at a frequency of about 400 Hz. The writer can't acquire the write lock.

Run information

  • cyber version: 8.0.0
  • run in docker
  • ubuntu18.04
  • hardware information image

To Reproduce

Sorry, it is random, but it always happens when the writer and readers run for a long enough period (one day, for example).

image
  1. Log of the call to writer.Write in business code.
  2. cyber log of opening shm.
  3. cyber log of opening shm.
  4. Repeated log for "can't acquire write lock".

From the log above, we located the code here:

https://github.com/ApolloAuto/apollo/blob/r8.0.0/cyber/transport/shm/block.cc#L38 image

It seems that every block holds a read or write lock that is never released, but we don't know why or how this happened. Can anybody help? Really appreciated.
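To make the symptom concrete, here is a minimal sketch of a counter-based reader/writer try-lock of the kind described above. It is not Apollo's code: `SketchBlock`, `FindWritableBlock`, and the constants are hypothetical names, and the assumption is that 0 means free, -1 means write-locked, and a positive value counts readers. If every block is left at a non-zero count, the writer's scan can never succeed.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Illustrative stand-in for a shared-memory block's lock; not Apollo's class.
constexpr int32_t kFree = 0;             // no holder
constexpr int32_t kWriteExclusive = -1;  // one writer holds the block

struct SketchBlock {
  std::atomic<int32_t> lock_num{kFree};

  // Writer: succeeds only if nobody holds the block (0 -> -1).
  bool TryLockForWrite() {
    int32_t expected = kFree;
    return lock_num.compare_exchange_strong(expected, kWriteExclusive);
  }

  // Reader: succeeds unless a writer holds the block (n -> n + 1).
  bool TryLockForRead() {
    int32_t cur = lock_num.load();
    while (cur >= kFree) {
      // On failure, cur is refreshed with the current value and we retry.
      if (lock_num.compare_exchange_weak(cur, cur + 1)) return true;
    }
    return false;
  }

  void ReleaseWriteLock() {
    int32_t prev = lock_num.exchange(kFree);
    assert(prev == kWriteExclusive);
    (void)prev;
  }

  void ReleaseReadLock() {
    int32_t prev = lock_num.fetch_sub(1);
    assert(prev > kFree);
    (void)prev;
  }
};

// Scans all blocks once; returns the index locked for writing, or -1.
int FindWritableBlock(SketchBlock* blocks, int count) {
  for (int i = 0; i < count; ++i) {
    if (blocks[i].TryLockForWrite()) return i;
  }
  return -1;
}
```

With two blocks, one holding an unreleased read lock and the other an unreleased write lock, `FindWritableBlock` returns -1 on every call until one of the holders releases.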

realSinged avatar Mar 25 '24 08:03 realSinged

Could you provide a demo that reproduces this issue? I can take some time to solve it.

linmianhao avatar Mar 28 '24 06:03 linmianhao

Hi, the snippet is in our business code, but I already know how to reproduce it.

  1. Create a writer that writes to a channel at high frequency.
  2. Create a reader that reads from this channel.
  3. Make the writer and reader exit abnormally (for example, triggered from another thread).
  4. Make sure the reader and writer always restart automatically.
  5. Run for a while.

Then you will find that some blocks' lock num stays at -1 or 1 forever.

image

The reader may not release its lock if the process quits while the reader thread is executing between AcquireBlockToRead and ReleaseReadBlock.

image

The same applies to the writer.
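The leak can be sketched in isolation under some assumptions: a POSIX system, an anonymous shared mapping standing in for cyber's shm segment, and a bare atomic counter standing in for the block lock. The function name `LockNumAfterHolderDies` is hypothetical. A child process takes the lock and exits abnormally before releasing it, and the parent then observes the counter stuck at 1 (reader) or -1 (writer).

```cpp
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#include <atomic>
#include <cstdint>
#include <new>

// Returns the lock counter seen by the parent after a child process took the
// lock and exited without releasing it. Returns INT32_MIN on setup failure.
int32_t LockNumAfterHolderDies(bool as_writer) {
  using Counter = std::atomic<int32_t>;

  // Memory shared between parent and child, like a shm segment.
  void* mem = mmap(nullptr, sizeof(Counter), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED) return INT32_MIN;
  Counter* lock_num = new (mem) Counter(0);  // 0 means free

  pid_t pid = fork();
  if (pid < 0) {
    munmap(mem, sizeof(Counter));
    return INT32_MIN;
  }
  if (pid == 0) {
    if (as_writer) {
      int32_t expected = 0;
      lock_num->compare_exchange_strong(expected, -1);  // acquire for write
    } else {
      lock_num->fetch_add(1);  // acquire for read
    }
    // Abnormal exit between acquire and release: nothing undoes the lock.
    _exit(1);
  }

  int status = 0;
  waitpid(pid, &status, 0);

  int32_t leaked = lock_num->load();
  munmap(mem, sizeof(Counter));
  return leaked;
}
```

Because the counter lives in shared memory and not in the dead process, a restarted reader or writer inherits the stale value. Any recovery would have to come from outside the lock itself, for example by resetting counters when the segment's owner detects that a holder is gone.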

realSinged avatar Apr 01 '24 02:04 realSinged

@daohu527

realSinged avatar Apr 01 '24 03:04 realSinged