clover icon indicating copy to clipboard operation
clover copied to clipboard

docs: Add note about long running callbacks

Open sfalexrog opened this issue 5 years ago • 6 comments

Having a long running callback will result in messages being handled with a significant delay, regardless of the queue_size option. This is especially noticeable for image processing. We should at the very least document that.

This P/R adds a note about using separate threads for image processing, as well as some minimal sample code.

sfalexrog avatar Feb 20 '20 14:02 sfalexrog

Okay, I believe I see your point.

Using a Queue definitely sounds like overkill, although I do like it being a container - that way we don't need two entities for synchronization.

Regarding your example: it does look simplier, but I still have some issues with it:

  • latest_img is only created in the callback. This makes it much less discoverable (I don't think a lot of our users know that global keyword can create global variables, not simply reference them; I don't believe this is something that is taught in the introductory Python course);
  • there is no explicit thread creation; this makes the code seem simplier (and I've thought about doing the same, since the callbacks already arrive in a separate thread), but, well, "explicit is better than implicit".

As for your last point, well, the "best" way would be to do something similar to how OpenCV does that. Most people doing CV-related stuff have already learned some patterns, and this is where your example comes close: having a "main loop" with all image processing is nice (although that would probably lead to mixing image processing code and flight code, which I believe should be discouraged).

Actually, your last point brings up our docs' overall deficiency: we don't have a preferred way of writing complex flight code.

sfalexrog avatar Feb 21 '20 12:02 sfalexrog

К слову: у нас больше проблем с производительностью обработки картинок вызывал не threading, а то, что на питоне не работает image_transport (http://wiki.ros.org/image_transport#Python_Usage), что вызывает задержку до 0.1 сек при прокачке кадра через рос топик. Спасало прямое использование OpenCv VideoCapture... Возможно, ещё какие способы решения этой проблемы известны?...

copterspace avatar Feb 21 '20 13:02 copterspace

I investigated the issue a little bit more. The "problem" is actually in this code: https://github.com/ros/ros_comm/blob/noetic-devel/clients/rospy/src/rospy/impl/tcpros_base.py#L795.

The client code does correct thing cutting the latest messages from the buffer (https://github.com/ros/ros_comm/blob/noetic-devel/clients/rospy/src/rospy/msg.py#L225) to process.

But since reading the socket and calling callbacks perform in the same thread (at least for one TCPROSTransport, which apparently means one subscriber+publisher pair), callback function blocks the thread from reading the socket for the certain time. It's OK for small messages, but for large messages (images) it simply overflows the internal OS TCP socket buffer (https://stackoverflow.com/questions/3205953/how-does-a-linux-socket-buffer-overflow). Therefore, the latest packets get dropped by the kernel.

By the way, Linux TCP buffer size can be inspected in /proc/sys/net/ipv4/tcp_rmem, and also can be changed via setsockopt.

okalachev avatar Feb 26 '20 00:02 okalachev

@copterspace, сомневаюсь, что дело именно в большом времени передачи изображения, а не в той проблеме, которую мы обсуждаем.

Когда я получаю изображения с камеры с помощью воркэраунда выше, то измеряемое время передачи на порядки меньше, чем 0.1с.

okalachev avatar Feb 26 '20 00:02 okalachev

Actually, I would call that a bug, which can be simply avoided using separate threads for reading the socket and invoking the callbacks. It deserves issuing and fixing in rospy.

okalachev avatar Feb 26 '20 00:02 okalachev

Opened an issue in ros_comm: https://github.com/ros/ros_comm/issues/1901 Test setup is available at https://github.com/sfalexrog/staleness_test

sfalexrog avatar Feb 27 '20 13:02 sfalexrog