python-can
python-can copied to clipboard
Socketcan uses incorrect flag for direction
The socketcan implementation is using MSG_DONTROUTE flag instead of MSG_CONFIRM flag to set the is_rx message field. This breaks applications where virtual can interface is used, because all messages originate from the same host and thus is_rx is always false. IMO the correct behavior should be to use MSG_CONFIRM flag instead, but this could break some existing applications if they use two instances of the same socket simultaneously. Opinions?
MSG_DONTROUTE and MSG_CONFIRM docs (section 4.1.7): https://www.kernel.org/doc/Documentation/networking/can.txt
The socketcan implementation is using
MSG_DONTROUTEflag instead ofMSG_CONFIRMflag to set theis_rxmessage field. This breaks applications where virtual can interface is used, because all messages originate from the same host and thusis_rxis always false.
But this is the correct status. The difference is
- is the CAN frame created in that host
- is the CAN frame received from the outside (e.g. received by a real CAN interface)
As vcan traffic can only occur when it is created on the host, vcan traffic is always "TX" type traffic.
IMO the correct behavior should be to use
MSG_CONFIRMflag instead, but this could break some existing applications if they use two instances of the same socket simultaneously. Opinions?
You correctly point out the problem. MSG_CONFIRM is set when the tx traffic is received by exactly the socket it has been sent on. When you have a second socket (e.g. for candump) reading that traffic the MSG_CONFIRM flag is never set on this 'receiving-only' socket.
Kernel code:
*pflags = 0;
if (oskb->sk) // <= only sbks generated by a socket have this sk value set
*pflags |= MSG_DONTROUTE; // => this is local generated traffic
if (oskb->sk == sk) // <= check if it was exactly me who generated that skb
*pflags |= MSG_CONFIRM; // => yes. My outgoing skb has been echo'ed to me
MSG_DONTROUTEandMSG_CONFIRMdocs (section 4.1.7): https://www.kernel.org/doc/Documentation/networking/can.txt
Edit: To sum it up: There is no bug IMO.
I agree with what you are saying, but I can't think of a use case where the current behavior is desired.
Most of the applications use the is_rx = false messages to verify that the message was transmitted successfully and/or to get a precise transmission timestamp. So it makes sense that is_rx would only be false for messages transmitted on the current socket. I don't think anyone has encountered this problem yet, because when using a single socket with a single CAN adapter (the common case) there is no difference between MSG_CONFIRM and MSG_DONTROUTE. However, there is a problem when virtual can interface is used for unit tests as it is impossible to simulate outside traffic, all messages are is_rx = false.
I don't see any use case where the current behavior can be useful, but please tell me if there is something I don't know.
Any traffic which comes from a virtual CAN interface has to be sent to this virtual CAN interface. And this can only be done from the local host.
The problem is, that the sk value (see code in my last post) is used as an identifier for this and removing this tag would break almost everything.
But if having this tag is_rx = true is that important for your unit test, I can probably provide a solution using the can-gw:
Run cangw -A -s vcan0 -d vcan1 -e (as root)
When running cansend vcan0 123#1122 in a separate terminal you can see this with candump -x -td any:
$ candump -x -td any
(000.000000) vcan0 TX - - 123 [2] 11 22
(000.000070) vcan1 RX - - 123 [2] 11 22
(000.588690) vcan0 TX - - 123 [2] 11 22
(000.000026) vcan1 RX - - 123 [2] 11 22
(000.647955) vcan0 TX - - 123 [2] 11 22
(000.000026) vcan1 RX - - 123 [2] 11 22
(000.499139) vcan0 TX - - 123 [2] 11 22
(000.000073) vcan1 RX - - 123 [2] 11 22
When you then want to 'replay' a logfile to the virtual CAN that creates the desired flags:
- send TX frames to
vcan1 - send RX frames to
vcan0
Thanks for the workaround, I will keep that as an option for unit tests.
However, I think that we still have some kind of miscommunication. What I'm proposing is: change is_rx = not bool(msg_flags & socket.MSG_DONTROUTE) to is_rx = not bool(msg_flags & socket.MSG_CONFIRM) at https://github.com/hardbyte/python-can/blob/3b47d42f38c7ad5adea5c78bc482df9f6c106dac/can/interfaces/socketcan/socketcan.py#L565
The current behavior: is_rx indicates that message arrived from a different machine.
The suggested behavior: is_rx indicates that message arrived from a different machine or socket (application).
Or in other words, is_rx = false should be a confirmation of correct transmission for the application, but instead it is currently some weird information about kernel-level message routing.
The problem is, that the
skvalue (see code in my last post) is used as an identifier for this and removing this tag would break almost everything.
Could you elaborate? I know that my suggestion is a breaking change, but so far I haven't seen a use case for current behavior. Moreover, the API has a receive_own_messages parameter for socketcan bus, but there is no way to distinguish these messages, because MSG_CONFIRM flag is not exposed. It would be an option to add additional is_confirm field to Message object, but if the current MSG_DONTROUTE flag has no use, then changing is_rx behavior would keep the API more concise.
And this is not only a problem with vcan. If two applications are using one physical CAN interface, they cannot distinguish if is_rx = false message was sent by this or the other application. Sure you could check if messages are equal, but this is not robust, because they do not have any unique ID and collisions are possible.
However, I think that we still have some kind of miscommunication. What I'm proposing is: change
is_rx = not bool(msg_flags & socket.MSG_DONTROUTE)tois_rx = not bool(msg_flags & socket.MSG_CONFIRM)athttps://github.com/hardbyte/python-can/blob/3b47d42f38c7ad5adea5c78bc482df9f6c106dac/can/interfaces/socketcan/socketcan.py#L565
The current behavior:
is_rxindicates that message arrived from a different machine. The suggested behavior:is_rxindicates that message arrived from a different machine or socket (application).
Ah, now I got it. So that every CAN frame that is not sent by your own socket/application get's an "RX" tagging, right?
So this would move the rx/tx status from the host perspective to the application perspective - as long as you use exactly one socket in your application. This change should be feasible but I'm not sure about the expectations of other python-can users on this topic.
Or in other words,
is_rx = falseshould be a confirmation of correct transmission for the application, but instead it is currently some weird information about kernel-level message routing.
The question is whether we should introduce something new that makes it clear, that we have an confirmation frame (aka echo'ed frame) here.
The problem is, that the
skvalue (see code in my last post) is used as an identifier for this and removing this tag would break almost everything.Could you elaborate? I know that my suggestion is a breaking change, but so far I haven't seen a use case for current behavior.
Maybe there's a miscommunication in the other direction too.
The reason for the message flags is described here: https://www.kernel.org/doc/html/latest/networking/can.html#raw-socket-option-can-raw-recv-own-msgs
And there were good reasons and use-cases to do it this way - including weird use-cases from people that control cranes with it. The kernel API is written into stone.
Moreover, the API has a
receive_own_messagesparameter for socketcan bus, but there is no way to distinguish these messages, becauseMSG_CONFIRMflag is not exposed. It would be an option to add additionalis_confirmfield toMessageobject, but if the currentMSG_DONTROUTEflag has no use, then changingis_rxbehavior would keep the API more concise.
Yes, as suggested above adding something like is_confirm could make sense - especially due to the fact that the phython-can API provides such a receive_own_messages parameter.
Yes, as suggested above adding something like
is_confirmcould make sense - especially due to the fact that the phython-can API provides such areceive_own_messagesparameter.
@chemicstry would you like to provide a patch for such an improvement?
I could do that.
However, since python-can is supposed to be interface-agnostic, it would be wise to firstly define what is_rx and is_confirm mean at the API level for all interfaces. Since is_rx and is_confirm have inverted logic, it is easier to define is_tx = !is_rx and specify them as:
is_tx: Message was sent by any instance (socket)is_confirm: Message was sent by this instance (socket)
For interfaces that do not support multiple simultaneous access (most physical adapters without socketcan), it is always true that is_tx == is_confirm. Note that I'm using is_tx just to simplify the definition, we can leave is_rx as is in the API.
If this seems OK, I will make a PR, hopefully sometime next week.
For interfaces that do not support multiple simultaneous access (most physical adapters without socketcan), it is always true that
is_tx == is_confirm.
I don't think so.
The feature whether a CAN interfaces 'echoes' the sent CAN frame or not does not necessarily stick on socketcan.
E.g. you are currently working on the GS_USB interface - and IIRC that interface supports the 'echo' of sent CAN frames on the CAN interface itself.
So the receive_own_messages switch is not limited to socketcan IMO.
And only when receive_own_messages is switched on, the is_confirm CAN frames might appear, right?
Yeah, so if loopback/echo frames are enabled, is interface-specific. Some interfaces don't support it at all, some have it enabled by default (gs_usb), some have parameters to enable it (socketcan).
However, regardless if they are enabled, for non-shared interfaces is_confirm == !is_rx invariant always holds. For example, gs_usb interface would have is_confirm = frame.echo_id != GS_USB_NONE_ECHO_ID and is_rx = not is_confirm. If you disable echo frames (functionality not exposed for gs_usb yet), then is_confirm is always false and is_rx is always true.
Only shared interfaces (i.e. socketcan) could have is_rx == false and is_confirm == false, when on the same machine frame is sent from one socket and observed on the other.
EDIT: To complete the example, socketcan would have:
is_rx = not bool(msg_flags & socket.MSG_DONTROUTE)is_confirm = bool(msg_flags & socket.MSG_CONFIRM)
Yeah, so if loopback/echo frames are enabled, is interface-specific. Some interfaces don't support it at all, some have it enabled by default (gs_usb), some have parameters to enable it (socketcan).
Is this interface to switch the receive_own_messages setting commonly known?
So that it is fixed off for most CAN interfaces, fixed on for GS_USB and configurable for socketcan?
IMO this has to be clear to the user as they could get different content.
However, regardless if they are enabled, for non-shared interfaces
is_confirm == !is_rxinvariant always holds.
Really? is_confirm == !is_rx means is_confirm == is_tx so MSG_DONTROUTE == MSG_CONFIRM ?
The term 'non-shared interfaces' is misleading IMO. We should define the interface capabilities by how receive_own_messages is controlled or fixed.
EDIT: To complete the example, socketcan would have:
* `is_rx = not bool(msg_flags & socket.MSG_DONTROUTE)` * `is_confirm = bool(msg_flags & socket.MSG_CONFIRM)`
Yes - this is what I would expect as implementation in socketcan.py. :-)
Hey, sorry for the long silence.
I tried making a PR for this, but there's just too much inconsistency between interfaces and fixing that would be a major breaking change. So firstly, I'm not sure if such changes would be welcome and secondly, I don't feel qualified enough to drive the API changes of the whole library.
As for the issue, I'm not sure if it's better to keep this open, in case someone takes up the job, or just close it.
no need to open "receive_own_messages ", change bus.send ,haha...
old_send = bus.send
def new_send(_self, msg: Message, timeout: Optional[float] = None) -> None:
msg.timestamp = datetime.datetime.now().timestamp()
old_send(msg=msg, timeout=timeout)
self.logger(msg)
bus.send = types.MethodType(new_send, bus)