cameleon
cameleon copied to clipboard
Crash due to TRAILER is unexpectedly appended to the end of PAYLOAD buffer
I'm testing Chinese machine vision camera with a stream example. Cameleon expects strictly one LEADER packet, followed by TRAILER packets. An unrecoverable issue is that Cameleon receives TRAILER packet instead of LEADER packet from this camera, which is not designed to happen.
What I know for sure is: the same camera has no issue streaming with Aravis.
I wish to collect more context, if you guide me how. I'm not a Rust programmer, and GDB debugger support for Rust is just awful: it's easier to hack into the memory with bare hands than to understand how to print out anything meaningful from the Rust variables...
I'm not a Rust programmer, and GDB debugger support for Rust is just awful: it's easier to hack into the memory with bare hands than to understand how to print out anything meaningful from the Rust variables...
Have you tried the rust-gdb wrapper? It should make debugging in GDB much easier. To get it it should be sufficient to install rust toolchain using rustup.
Oh yes, I use it, and I'm quite unhappy, because it still does not support containers evaluation. The things is most of the Rust data items are containers, like C++ STL. For example, I may often wish to print a C++ std::vector, but it cannot be printed directly, it's not an array. GDB has special treatment of std::vector to print it, though. That's what we still badly miss from rust-gdb, and what ruins the debugging experience.
Thread 3 "stream" hit Breakpoint 2, cameleon_device::u3v::protocol::stream::Trailer::parse_prefix (cursor=0x7ffff71fd188)
at device/src/u3v/protocol/stream.rs:451
451 Err(Error::InvalidPacket("invalid prefix magic".into()))
(gdb) bt
#0 cameleon_device::u3v::protocol::stream::Trailer::parse_prefix (cursor=0x7ffff71fd188) at device/src/u3v/protocol/stream.rs:451
#1 0x000055555563095d in cameleon_device::u3v::protocol::stream::Trailer::parse<alloc::vec::Vec<u8, alloc::alloc::Global>> (buf=0x7ffff71fd7d8)
at device/src/u3v/protocol/stream.rs:397
#2 0x00005555556084b2 in cameleon::u3v::stream_handle::StreamingLoop::run (self=...) at cameleon/src/u3v/stream_handle.rs:261
#3 0x0000555555610d6c in cameleon::u3v::stream_handle::{impl#1}::start_streaming_loop::{closure#1} () at cameleon/src/u3v/stream_handle.rs:113
#4 0x000055555565ab76 in std::sys::backtrace::__rust_begin_short_backtrace<cameleon::u3v::stream_handle::{impl#1}::start_streaming_loop::{closure_env#1}, ()>
(f=...) at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/sys/backtrace.rs:154
#5 0x0000555555634c1d in std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure#0}<cameleon::u3v::stream_handle::{impl#1}::start_streaming_loop::{closure_env#1}, ()> () at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/thread/mod.rs:522
#6 0x000055555566a0c4 in core::panic::unwind_safe::{impl#23}::call_once<(), std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<cameleon::u3v::stream_handle::{impl#1}::start_streaming_loop::{closure_env#1}, ()>> (self=...)
at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/panic/unwind_safe.rs:272
#7 0x000055555565af54 in std::panicking::try::do_call<core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<cameleon::u3v::stream_handle::{impl#1}::start_streaming_loop::{closure_env#1}, ()>>, ()> (data=0x7ffff71ffc30)
at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs:554
#8 0x0000555555634ccb in __rust_try ()
#9 0x0000555555634819 in std::panicking::try<(), core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<cameleon::u3v::stream_handle::{impl#1}::start_streaming_loop::{closure_env#1}, ()>>> (f=...)
at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs:518
#10 std::panic::catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<cameleon::u3v::stream_handle::{impl#1}::start_streaming_loop::{closure_env#1}, ()>>, ()> (f=...)
at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panic.rs:345
#11 std::thread::{impl#0}::spawn_unchecked_::{closure#1}<cameleon::u3v::stream_handle::{impl#1}::start_streaming_loop::{closure_env#1}, ()> ()
at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/thread/mod.rs:521
#12 0x0000555555637aef in core::ops::function::FnOnce::call_once<std::thread::{impl#0}::spawn_unchecked_::{closure_env#1}<cameleon::u3v::stream_handle::{impl#1}::start_streaming_loop::{closure_env#1}, ()>, ()> () at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/ops/function.rs:250
#13 0x000055555577298b in alloc::boxed::{impl#48}::call_once<(), dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global> ()
at alloc/src/boxed.rs:2231
#14 alloc::boxed::{impl#48}::call_once<(), alloc::boxed::Box<dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global>, alloc::alloc::Global> ()
at alloc/src/boxed.rs:2231
#15 std::sys::pal::unix::thread::{impl#2}::new::thread_start () at std/src/sys/pal/unix/thread.rs:105
#16 0x00007ffff7d47144 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#17 0x00007ffff7dc77dc in ?? () from /lib/x86_64-linux-gnu/libc.so.6
This comment is wrong btw, should be "Parse bytes as Trailer":
impl<'a> Trailer<'a> {
const TRAILER_MAGIC: u32 = 0x5456_3355;
/// Parse bytes as Leader.
But the main problem is that the leader magic is given instead of the expected trailer magic:
(gdb) li
446 fn parse_prefix(cursor: &mut Cursor<&[u8]>) -> Result<()> {
447 let magic: u32 = cursor.read_bytes_le()?;
448 if magic == Self::TRAILER_MAGIC {
449 Ok(())
450 } else {
451 Err(Error::InvalidPacket("invalid prefix magic".into()))
452 }
453 }
454 }
455
(gdb) p/x magic
$1 = 0x4c563355
Btw, here is an example of how rust-gdb handles Rust-aware debugging:
(gdb) p/x Self::TRAILER_MAGIC
No symbol 'Self::TRAILER_MAGIC' in current context
(gdb) p/x Self::LEADER_MAGIC
No symbol 'Self::LEADER_MAGIC' in current context
So we have to go to the source code and see that the p/x magic above indeed prints the leader magic:
const TRAILER_MAGIC: u32 = 0x5456_3355;
const LEADER_MAGIC: u32 = 0x4C56_3355;
The panic happens because "trailer must follow the leader" logic is hardcoded in Cameleon streaming:
(gdb) li 252
247 // We received the data from the bulk transfers, try to parse stuff now.
248 let leader = match u3v_stream::Leader::parse(&leader_buf)
249 .map_err(|e| StreamError::InvalidPayload(format!("{}", e).into()))
250 {
251 Ok(leader) => leader,
252 Err(err) => {
253 warn!(?err);
254 // Reuse `payload_buf`.
255 payload_buf_opt = Some(payload_buf);
256 self.sender.try_send(Err(err)).ok();
(gdb)
257 continue;
258 }
259 };
260
261 let trailer = match u3v_stream::Trailer::parse(&trailer_buf)
262 .map_err(|e| StreamError::InvalidPayload(format!("invalid trailer: {}", e).into()))
263 {
264 Ok(trailer) => trailer,
265 Err(err) => {
266 warn!(?err);
(gdb)
267 // Reuse `payload_buf`.
268 payload_buf_opt = Some(payload_buf);
269 self.sender.try_send(Err(err)).ok();
270 continue;
271 }
272 };
See: the only expected packet after LEADER is TRAILER.
I have no clue why it is incorrect, so I take Aravis as a reference. And it appears that in Aravis camera streaming loop LEADER is followed by DATA, not by TRAILER:
Looking for the first available camera
vendor name = Machine Vision
model name = MV-A5031MU815
device id = (null)
image width = 640
image height = 480
horizontal binning = 0
vertical binning = 0
payload = 307200 bytes
exposure = 1000 µs
gain = 0 dB
ARV_UVSP_PACKET_TYPE_LEADER
ARV_UVSP_PACKET_TYPE_DATA
ARV_UVSP_PACKET_TYPE_TRAILER
ARV_UVSP_PACKET_TYPE_LEADER
ARV_UVSP_PACKET_TYPE_DATA
ARV_UVSP_PACKET_TYPE_TRAILER
Cameleon actually has the same data reading sequence as Aravis:
if let Err(err) = read_leader(&mut async_pool, &self.params, &mut leader_buf) {
// Report and send error if the error is fatal.
if matches!(err, StreamError::Io(..) | StreamError::Disconnected) {
error!(?err);
self.sender.try_send(Err(err)).ok();
}
payload_buf_opt = Some(payload_buf);
continue;
};
if let Err(err) = read_payload(&mut async_pool, &self.params, &mut payload_buf) {
warn!(?err);
// Reuse `payload_buf`.
payload_buf_opt = Some(payload_buf);
self.sender.try_send(Err(err)).ok();
continue;
};
if let Err(err) = read_trailer(&mut async_pool, &self.params, &mut trailer_buf) {
warn!(?err);
// Reuse `payload_buf`.
payload_buf_opt = Some(payload_buf);
self.sender.try_send(Err(err)).ok();
continue;
};
This exactly corresponds to this sequence in Aravis:
ARV_UVSP_PACKET_TYPE_LEADER
ARV_UVSP_PACKET_TYPE_DATA
ARV_UVSP_PACKET_TYPE_TRAILER
So my guess is that read_payload() does not properly stop reading at the end of the DATA packet, and takes over the TRAILER as well. This way the next packet start, which is supposed to be TRAILER becomes LEADER.
Actually, it's even more wrong than my guess:
leader_bufstarts with (85, 51, 86, 84), which isTRAILER_MAGIC: u32 = 0x5456_3355payload_bufstarts with (85, 51, 86, 76), which isLEADER_MAGIC: u32 = 0x4C56_3355trailer_bufstarts with (85, 51, 86, 84), which isTRAILER_MAGIC: u32 = 0x5456_3355
It's all messed up, see the GDB below:
Thread 3 "stream" hit Breakpoint 2, cameleon::u3v::stream_handle::StreamingLoop::run (self=...) at cameleon/src/u3v/stream_handle.rs:245
245 let payload_len = payload_len - last_buf_len.unwrap();
(gdb) p leader_buf
$4 = Vec(size=52) = {85, 51, 86, 84, 0, 0, 32, 0, 145, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 176, 4, 0, 0, 0, 0, 0, 224, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
(gdb) p payload_buf
$5 = Vec(size=614400) = {85, 51, 86, 76, 0, 0, 52, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 136, 102, 114, 231, 19, 3, 0, 0, 1, 0, 8, 1, 128, 2, 0, 0, 224, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0...}
(gdb) p trailer_buf
$6 = Vec(size=52) = {85, 51, 86, 84, 0, 0, 32, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 176, 4, 0, 0, 0, 0, 0, 224, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
(gdb) p 0x4C
$7 = 76
(gdb) p 0x56
$8 = 86
(gdb) p 0x33
$10 = 51
(gdb) p 0x55
$11 = 85
It seems that I'm able to fix this issue with the following 2 changes:
- I revert the patch, which puts leader, payload and trailer transfers into the same async buffer:
commit a371076bdbdffa25a654d320fe5e1bbf958b2637
Author: Brian Schwind <[email protected]>
Date: Thu May 11 22:49:45 2023 +0900
Use async transfers for the leader and trailer buffers
@bschwind , in your a371076bdbdffa25a654d320fe5e1bbf958b2637 I think you are trying to make the USB bulk transfers as async as possible, for good. That seems to work, but there is one side effect: the bulk transfers may come out shuffled: LEADER buffer is not necessary the first in the sequence of captured buffers, and so on. Perhaps, you were lucky that shuffling did not happen on your system, and the LEADER-DATA-TRAILER order is preserved in the async pool. I propose to revert your patch an re-apply it again with a proper identification of the buffer kind after async polling is finished. What do you think?
- In
read_payload()I disablepayload_final2_size, which exceeds the image size and provokes a failure, becauseparams.payload_final1_sizeis already the size of the image:
- if params.payload_final2_size != 0 {
- async_pool.submit(&mut buf[cursor..cursor + params.payload_final2_size])?;
+ //if params.payload_final2_size != 0 {
+ // async_pool.submit(&mut buf[cursor..cursor + params.payload_final2_size])?;
+ //}
I have no idea what are those payload_final2_size1 and payload_final2_size2 are. But it seems that if we set the payload size to a larger value than the frames size, the trailer packet is also appended to the payload transfer, and we have no way to determine how many bytes of data belong to the image frame and how many belong to the trailer package. But setting max transfer size to frame size (which is effectively the value of payload_final2_size1) makes the trailer to be received in another separate bulk transfer.
Correction: the first point above about async bulk transfers has no negative effect. Only the second point about the payload size is enough to fix this issue!
I've drafted #197 to discuss a possible fix with you.
I'll give this a look - it's definitely important to have the bulk transfers pipelined as much as possible to attain the throughput you need for higher frame rates. I'll have to double check the current code and its possibilities for shuffling or unnecessary transfers. It's also possible one of our cameras isn't following the standard, but yeah give me some time to read up again on these things.
Thanks, @bschwind ! You can safely ignore the whole investigation here and proceed to #197, which is the only issue to fix.