python-can icon indicating copy to clipboard operation
python-can copied to clipboard

Segmentation fault when calling ics.close_device()

Open robingergg opened this issue 1 year ago • 5 comments

Describe the bug

Calling this function of can.interfaces.ics_neovi.neovi_bus causes Segmentation Fault: def shutdown(self): super().shutdown() ics.close_device(self.dev)

To Reproduce

Having a neoVI Fire 2 device transmitting CAN signals I have the next TC: `*** Settings *** Resource ${CURDIR}/../ResourceFile.resource

Test Setup Set CAN ${INTERFACE} ${CHANNEL} ${BITRATE} ${DB_FILE} ${TEST_NAME} ${recv_own_msg} Test Teardown Shut Down Bus

*** Test Cases *** Test Reciveing Message [Tags] Test-1 Run Keyword And Continue On Failure Step 1

*** Keywords *** Step 1 Check The Reception Of ${MESSAGE_NAME} TimeOut ${TIMEOUT} Seconds Sending To ${TARGET_DEVICE} ` This is a simple test case which uses this robot-framework library: https://github.com/Openwide-Ingenierie/robotframework-can-uds-library

The problem occurs when the code reaches the Test Teardown. Inside it this method will be called from the above mentioned robot-framework lib.: def stop_bus(self): """Stop the CAN BUS""" self.bus.shutdown()

At the end it produces Segmentation Fault.

Expected behavior

The expected behavior is to shut down the previously opened bus without a problem.

Additional context

OS and version: Windows, 10 Python version: Python 3.10.12 python-can version: python-can==4.2.2 python-can interface/s (if applicable): Interfaces is involved

When using XCP - pyxcp, the Segmentation Fault can be avoided by sleeping 5 seconds before the attempt of closing the device. However with the provided test case above, it occurs no matter what.

Traceback and logs
def func():
    return "hello, world!"

robingergg avatar Feb 12 '24 16:02 robingergg

Hi @8Klaro8, can you please share the version of python-ics that you are using and run your code with the PYTHONFAULTHANDLER environment variable set (https://docs.python.org/3/library/faulthandler.html) and share the stack trace?

pierreluctg avatar Feb 12 '24 18:02 pierreluctg

Hi @pierreluctg ! Of course. Below are the requested infos: NOTE: I ran 3 times and thus there are 3 logs of faulthandler. They are separated with 1.) run, 2.) run etc..

python-ics version:

  • python-ics==912.4.post1

faulthandlers log

Run 1

Windows fatal exception: access violation

Current thread 0x00002d80 (most recent call first):
  File \MYAPTH\Python\Python311\site-packages\can\interfaces\ics_neovi\neovi_bus.py", line 321 in _process_msg_queue
  File \MYAPTH\Python\Python311\site-packages\can\interfaces\ics_neovi\neovi_bus.py", line 408 in _recv_internal
  File \MYAPTH\Python\Python311\site-packages\can\bus.py", line 121 in recv
  File \MYAPTH\Python\Python311\site-packages\can\notifier.py", line 124 in _rx_thread
Windows fatal exception:   File access violation

Run 2

Windows fatal exception: access violationWindows fatal exception: access violation

Thread 0x00003718 (most recent call first):
  File \MYAPTH\Python\Python311\site-packages\can\interfaces\ics_neovi\neovi_bus.py", line 321 in _process_msg_queue
  File \MYAPTH\Python\Python311\site-packages\can\interfaces\ics_neovi\neovi_bus.py"

Windows fatal exception: access violation

Run 3

Windows fatal exception: access violationCurrent thread 0x00002f98

 (most recent call first):
  File \MYAPTH\Python\Python311\site-packages\can\interfaces\ics_neovi\neovi_bus.py", line 321 in _process_msg_queue
  File \MYAPTH\Python\Python311\site-packages\can\interfaces\ics_neovi\neovi_bus.py", line 408 in _recv_internal
  File "

`

robingergg avatar Feb 13 '24 11:02 robingergg

@pierreluctg my conclusion so far is that the device(Neovi in this case) has been already closed by the time can.interface.ics_neovi.neovi_bus wants to close it using its shutdown() and thus it gives the Segmentation fault.

Uncommenting the "ics.close_device()" in neovi_bus.py avoids the Segmentation fault.

Reading the documentation of ics says: ". Also since python is a object oriented language the module utilizes this and auto cleans up device handles when going out of scope so there is usually no need to call ics.close_device()."

This may explain, why it is causing Segfault.

robingergg avatar Feb 15 '24 09:02 robingergg

@8Klaro8 looks like you may have a variable scoping issue in your code.

Are you able to share some simple (few line) code that can reproduce the issue?

pierreluctg avatar Feb 16 '24 18:02 pierreluctg

@pierreluctg with the provided library above: https://github.com/Openwide-Ingenierie/robotframework-can-uds-library

And the provided robot code produces this error. Just by simply setting up a robot test where we check a certain CAN signal thru neoVI device. The configuration looks as follows when setting up the CAN bus in this line: Set CAN ${INTERFACE} ${CHANNEL} ${BITRATE} ${DB_FILE} ${TEST_NAME} ${recv_own_msg} Config:: interface: neovi channel: HSCAN2 bitrate: 500000 db_file: test.dbc test_name: mytest recv_own_msg: False

robingergg avatar Feb 19 '24 09:02 robingergg

Hi @8Klaro8, I still suspect that you have a variable (the can bus object) scope issue in your Robot Framework lib. Unfortunately, debugging your Robot Framework library is out of the scope of this project. If you are able to write a simple Python script that expose the issue(s), I will be happy to have a look.

pierreluctg avatar Mar 28 '24 15:03 pierreluctg

Hi @pierreluctg, I did not confirm it, but I assume the problem cause by the neovi bus implementation by python-can. Based on the docs of the neovi bus, the neovi device shall be closed "automatically" yet the implementatino of neovi bus closes the bus when shutting down the can bus, therefore it gave segmentation fault.

robingergg avatar Apr 15 '24 07:04 robingergg

@8Klaro8, yes the neovi device handle get automatically closed when getting out of scope. This do not mean that closing the device handle ( via ics.close_device) is not allowed and will cause a segmentation fault.

pierreluctg avatar Apr 15 '24 18:04 pierreluctg

@pierreluctg I see. Interesting.

I monkeypatched the shutdown method of neovi_bus.py and the segfault dissapeared (I am skipping the calling of ics.close_device()) and the bus closes without a problen (automatically).

robingergg avatar Apr 15 '24 18:04 robingergg

The only thing this is demonstrating is if your code avoids calling the bus shutdown method, you do not get in that issue...

pierreluctg avatar Apr 15 '24 18:04 pierreluctg

This can reproduce what you are seeing...

import time

import can

with can.interface.Bus(interface="neovi", channel=1) as bus:
    notifier = can.Notifier(bus, [can.Printer()])
# The bus is now closed
time.sleep(10)

pierreluctg avatar Apr 15 '24 18:04 pierreluctg

You can also try with, for example, the virtual interface. You still get an issue with your code I suspect.

Instead of a segmentation fault you will get can.exceptions.CanOperationError: Cannot operate on a closed bus.

I agree that getting an exception is better than a segmentation fault, we can fix that in the neovi interface. However your issue remain the same.

pierreluctg avatar Apr 15 '24 19:04 pierreluctg

@8Klaro8 maybe your code is not closing the notifier (notifier.close()) prior to closing the bus?

pierreluctg avatar Apr 16 '24 13:04 pierreluctg

Added fix in #1765 to avoid the segmentation fault

pierreluctg avatar Apr 16 '24 15:04 pierreluctg

Sorry to be late to the party.

I can see it making sense to raise an error if someone tries to send a message and the CAN bus has been closed. But if it's just processing of an incoming message from off the bus, then maybe an exception need not be raised.

One of the files wrapped with the new wrapper function is _process_msg_queue, which looks like it has underlying design principle of politely returning None, False if the processing is unsuccessful. Perhaps we could follow suit, and politely return None, False if self._is_shutdown is True?

One thing that I wonder about is that by introducing a new exception to an internal function, we now have (as a side effect) introduced the possibility of that exception to all of the consumers of that internal function. This could potentially have downstream effect upon the documentation for all of those consumers?

Anyway, just a thought...

grant-allan-ctct avatar Apr 17 '24 14:04 grant-allan-ctct