anaconda icon indicating copy to clipboard operation
anaconda copied to clipboard

network:fix segmentfault within NMClient in multithreading(#1931389)

Open t-feng opened this issue 2 years ago • 15 comments

Anaconda got segmentfault during PXE-ks install, and this is a low probability problem. With my own test, it will show up about 1/100 PXE-ks install tests. There is a draft PR(https://github.com/rhinstaller/anaconda/pull/3207), but this can not solve probelm. According to Thomas's suggestions(https://bugzilla.redhat.com/show_bug.cgi?id=1931389#c11), and based on rvykydal's PR, I did a little changes and fix this problem:

  1. anaconda started multithreading task for network module, and we should create new nmclient for every thread. (also, we can only use nmclient in single thread,but it is difficult for anaconda with multithreading mechanism )
  2. create new main-context for nmclient and push it thread default.
  3. iterating the main-context for all of the async actions.

fix: #1931389

t-feng avatar Jun 25 '22 02:06 t-feng

Hello @t-feng! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 28:1: E402 module level import not at top of file Line 37:1: E402 module level import not at top of file

Comment last updated at 2022-07-09 08:24:16 UTC

pep8speaks avatar Jun 25 '22 02:06 pep8speaks

I have tested in anaconda-33.19 with 1000 times, everything is ok. I think this problem still exist in anaconda-mainline. @rvykydal hello, it will be greate for me if you could review this PR. because this PR is based on your work. :)

t-feng avatar Jun 25 '22 02:06 t-feng

/kickstart-test --testtype smoke

t-feng avatar Jun 27 '22 06:06 t-feng

/kickstart-test --testtype smoke

rvykydal avatar Jun 27 '22 12:06 rvykydal

@t-feng thank you for the patch, it is greatly appreciated! We will review it as soon as possible. Also I am going to run our kickstart tests on it extensively to try to confirm the issue is gone.

rvykydal avatar Jun 28 '22 06:06 rvykydal

Thank you! This part of unit tests probably needs fixing, although that's just cosmetics:

W0611(unused-import):/tmp/anaconda/pyanaconda/modules/network/network.py:26,0: : Unused SystemBus imported from pyanaconda.core.dbus

On the other hand, pylint is now disabled because of Python 3.11 beta, so maybe it's not critical to fix here, we will surely accumulate more until pylint can be turned on again.

Your call :)

VladimirSlavik avatar Jun 28 '22 12:06 VladimirSlavik

OK,I will fix this :)

t-feng avatar Jun 29 '22 07:06 t-feng

The PR looks good to me.

Based on suggestions by Thomas Haller in a recent issue: https://bugzilla.redhat.com/show_bug.cgi?id=2100883#c11 I'd also like to get rid of the sync Queue completely (its purpose was synchronization which would be handled by Glib loop now) A draft of it coud be: https://github.com/rhinstaller/anaconda/compare/master...rvykydal:anaconda:t-feng-no-queue but it is missing re-implementation of timeouts, probably something along this: https://github.com/NetworkManager/NetworkManager/blob/af447c493c67df2e4a81b60d4819fea9cd6c6853/examples/python/gi/ovs-external-ids.py#L84 )

I am leaving for two weeks of PTO so I'd continue to work on it when I am back. In the meantime my team colleagues may review or handle your PR. Also I'll ask @thom311 for review.

rvykydal avatar Jun 30 '22 10:06 rvykydal

/kickstart-test --testtype smoke

VladimirSlavik avatar Jun 30 '22 11:06 VladimirSlavik

FYI. I added an example script here in the hope it could be useful.

thom311 avatar Jul 05 '22 16:07 thom311

FYI. I added an example script here in the hope it could be useful.

It's greate for me, I will update the PR. thx :)

t-feng avatar Jul 06 '22 09:07 t-feng

For the record, I tested the original PR applied on rhel-9 branch - run 4422 tests with patched and un-patched anaconda. Without the patch there were 6 cases of segfault, with the patch the segfault didn't occur at all.

rvykydal avatar Jul 22 '22 10:07 rvykydal

/kickstart-test --testtype smoke

rvykydal avatar Jul 26 '22 08:07 rvykydal

There are some issues caught by pylint in our validation tests (https://github.com/rhinstaller/anaconda/runs/7502614984?check_suite_focus=true)

(It should be possible to run the tests locally as described here: https://github.com/rhinstaller/anaconda/tree/master/tests#testing-anaconda)

************* Module pyanaconda.modules.network.nm_client
W0703(broad-except):/tmp/anaconda/pyanaconda/modules/network/nm_client.py:590,15: add_connection_sync.finish_callback: Catching too general exception Exception
W0703(broad-except):/tmp/anaconda/pyanaconda/modules/network/nm_client.py:1068,15: commit_changes_with_autoconnection_blocked.finish_callback: Catching too general exception Exception
W0703(broad-except):/tmp/anaconda/pyanaconda/modules/network/nm_client.py:1110,15: activate_connection_sync.finish_callback: Catching too general exception Exception
W0703(broad-except):/tmp/anaconda/pyanaconda/modules/network/nm_client.py:1153,15: clone_connection_sync.finish_callback: Catching too general exception Exception

Maybe we should catch GLib.GError here? (If we are not sure about the Error we should catch here, we are disabling the check at specific places by # pylint: disable=broad-except comment in the code.)

W0611(unused-import):/tmp/anaconda/pyanaconda/modules/network/nm_client.py:26,0: : Unused Queue imported from queue
W0611(unused-import):/tmp/anaconda/pyanaconda/modules/network/nm_client.py:26,0: : Unused Empty imported from queue

The import line needs to be removed from nm_client.py

************* Module pyanaconda.core.glib
E0603(undefined-all-variable):/tmp/anaconda/pyanaconda/core/glib.py:38,53: : Undefined variable name 'mainloop_runmarkup_escape_text' in __all__
W0611(unused-import):/tmp/anaconda/pyanaconda/core/glib.py:28,0: : Unused markup_escape_text imported from gi.repository.GLib

This is already remarked in one of my comments (https://github.com/rhinstaller/anaconda/pull/4212#discussion_r928982400 - missing ",")

rvykydal avatar Jul 26 '22 08:07 rvykydal

Other than the remarks above the patch looks good to me and I think it is ready for merging. Thank you a lot for picking this issue up and fixing it. I'll perhaps run another battery of kickstart tests on the final version. I'd also ask @thom311 for final review. Also we can fix some issues (ideas for improvement) as a follow-up pull requests.

rvykydal avatar Jul 26 '22 09:07 rvykydal

@t-feng hello, do you think you will be able to work on the PR in close future? I would like to finish it before forgetting what it was all about / losing the context again. If you prefer I can take it over and open another PR based on yours with the remarks addressed.

rvykydal avatar Aug 22 '22 11:08 rvykydal

@t-feng hello, do you think you will be able to work on the PR in close future? I would like to finish it before forgetting what it was all about / losing the context again. If you prefer I can take it over and open another PR based on yours with the remarks addressed.

Sorry about the delay. I was busy with another work and forgot to update the PR. @rvykydal hello, you can take it over, thks.

t-feng avatar Aug 23 '22 07:08 t-feng

@t-feng hello, do you think you will be able to work on the PR in close future? I would like to finish it before forgetting what it was all about / losing the context again. If you prefer I can take it over and open another PR based on yours with the remarks addressed.

Sorry about the delay. I was busy with another work and forgot to update the PR. @rvykydal hello, you can take it over, thks.

Finishing in https://github.com/rhinstaller/anaconda/pull/4322

rvykydal avatar Sep 08 '22 12:09 rvykydal