anaconda
anaconda copied to clipboard
network:fix segmentfault within NMClient in multithreading(#1931389)
Anaconda got segmentfault during PXE-ks install, and this is a low probability problem. With my own test, it will show up about 1/100 PXE-ks install tests. There is a draft PR(https://github.com/rhinstaller/anaconda/pull/3207), but this can not solve probelm. According to Thomas's suggestions(https://bugzilla.redhat.com/show_bug.cgi?id=1931389#c11), and based on rvykydal's PR, I did a little changes and fix this problem:
- anaconda started multithreading task for network module, and we should create new nmclient for every thread. (also, we can only use nmclient in single thread,but it is difficult for anaconda with multithreading mechanism )
- create new main-context for nmclient and push it thread default.
- iterating the main-context for all of the async actions.
fix: #1931389
Hello @t-feng! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
- In the file
pyanaconda/modules/network/nm_client.py
:
Line 28:1: E402 module level import not at top of file Line 37:1: E402 module level import not at top of file
Comment last updated at 2022-07-09 08:24:16 UTC
I have tested in anaconda-33.19 with 1000 times, everything is ok. I think this problem still exist in anaconda-mainline. @rvykydal hello, it will be greate for me if you could review this PR. because this PR is based on your work. :)
/kickstart-test --testtype smoke
/kickstart-test --testtype smoke
@t-feng thank you for the patch, it is greatly appreciated! We will review it as soon as possible. Also I am going to run our kickstart tests on it extensively to try to confirm the issue is gone.
Thank you! This part of unit tests probably needs fixing, although that's just cosmetics:
W0611(unused-import):/tmp/anaconda/pyanaconda/modules/network/network.py:26,0: : Unused SystemBus imported from pyanaconda.core.dbus
On the other hand, pylint is now disabled because of Python 3.11 beta, so maybe it's not critical to fix here, we will surely accumulate more until pylint can be turned on again.
Your call :)
OK,I will fix this :)
The PR looks good to me.
Based on suggestions by Thomas Haller in a recent issue: https://bugzilla.redhat.com/show_bug.cgi?id=2100883#c11 I'd also like to get rid of the sync Queue completely (its purpose was synchronization which would be handled by Glib loop now) A draft of it coud be: https://github.com/rhinstaller/anaconda/compare/master...rvykydal:anaconda:t-feng-no-queue but it is missing re-implementation of timeouts, probably something along this: https://github.com/NetworkManager/NetworkManager/blob/af447c493c67df2e4a81b60d4819fea9cd6c6853/examples/python/gi/ovs-external-ids.py#L84 )
I am leaving for two weeks of PTO so I'd continue to work on it when I am back. In the meantime my team colleagues may review or handle your PR. Also I'll ask @thom311 for review.
/kickstart-test --testtype smoke
FYI. I added an example script here in the hope it could be useful.
FYI. I added an example script here in the hope it could be useful.
It's greate for me, I will update the PR. thx :)
For the record, I tested the original PR applied on rhel-9 branch - run 4422 tests with patched and un-patched anaconda. Without the patch there were 6 cases of segfault, with the patch the segfault didn't occur at all.
/kickstart-test --testtype smoke
There are some issues caught by pylint in our validation tests (https://github.com/rhinstaller/anaconda/runs/7502614984?check_suite_focus=true)
(It should be possible to run the tests locally as described here: https://github.com/rhinstaller/anaconda/tree/master/tests#testing-anaconda)
************* Module pyanaconda.modules.network.nm_client
W0703(broad-except):/tmp/anaconda/pyanaconda/modules/network/nm_client.py:590,15: add_connection_sync.finish_callback: Catching too general exception Exception
W0703(broad-except):/tmp/anaconda/pyanaconda/modules/network/nm_client.py:1068,15: commit_changes_with_autoconnection_blocked.finish_callback: Catching too general exception Exception
W0703(broad-except):/tmp/anaconda/pyanaconda/modules/network/nm_client.py:1110,15: activate_connection_sync.finish_callback: Catching too general exception Exception
W0703(broad-except):/tmp/anaconda/pyanaconda/modules/network/nm_client.py:1153,15: clone_connection_sync.finish_callback: Catching too general exception Exception
Maybe we should catch GLib.GError
here?
(If we are not sure about the Error we should catch here, we are disabling the check at specific places by # pylint: disable=broad-except
comment in the code.)
W0611(unused-import):/tmp/anaconda/pyanaconda/modules/network/nm_client.py:26,0: : Unused Queue imported from queue
W0611(unused-import):/tmp/anaconda/pyanaconda/modules/network/nm_client.py:26,0: : Unused Empty imported from queue
The import line needs to be removed from nm_client.py
************* Module pyanaconda.core.glib
E0603(undefined-all-variable):/tmp/anaconda/pyanaconda/core/glib.py:38,53: : Undefined variable name 'mainloop_runmarkup_escape_text' in __all__
W0611(unused-import):/tmp/anaconda/pyanaconda/core/glib.py:28,0: : Unused markup_escape_text imported from gi.repository.GLib
This is already remarked in one of my comments (https://github.com/rhinstaller/anaconda/pull/4212#discussion_r928982400 - missing ",")
Other than the remarks above the patch looks good to me and I think it is ready for merging. Thank you a lot for picking this issue up and fixing it. I'll perhaps run another battery of kickstart tests on the final version. I'd also ask @thom311 for final review. Also we can fix some issues (ideas for improvement) as a follow-up pull requests.
@t-feng hello, do you think you will be able to work on the PR in close future? I would like to finish it before forgetting what it was all about / losing the context again. If you prefer I can take it over and open another PR based on yours with the remarks addressed.
@t-feng hello, do you think you will be able to work on the PR in close future? I would like to finish it before forgetting what it was all about / losing the context again. If you prefer I can take it over and open another PR based on yours with the remarks addressed.
Sorry about the delay. I was busy with another work and forgot to update the PR. @rvykydal hello, you can take it over, thks.
@t-feng hello, do you think you will be able to work on the PR in close future? I would like to finish it before forgetting what it was all about / losing the context again. If you prefer I can take it over and open another PR based on yours with the remarks addressed.
Sorry about the delay. I was busy with another work and forgot to update the PR. @rvykydal hello, you can take it over, thks.
Finishing in https://github.com/rhinstaller/anaconda/pull/4322