bandwidthd: partially broken in 22.03 and master

Open padre-lacroix opened this issue 2 years ago • 7 comments

Maintainer: @padre-lacroix
Environment: arch MediaTek MT7621 ver:1 eco:3, model Ubiquiti EdgeRouter X
OpenWrt version: OpenWrt 22.03.0 r19685-512e76967f / LuCI openwrt-22.03 branch git-22.245.77528-487e58a

Also on an Archer C7 v2, same OpenWrt version (OpenWrt 22.03.0 r19685-512e76967f / LuCI openwrt-22.03 branch git-22.245.77528-487e58a).

Description: from issue #19352 (opened by https://github.com/frankgrant, now closed): Getting kernel error messages: `do_page_fault(): sending SIGSEGV to bandwidthd for invalid read access from 00000018 epc = 77e76534 in libc.so[77dea000+a9000] ra = 77e76410 in libc.so[77dea000+a9000]`
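A note on reading that message: the bracketed `libc.so[77dea000+a9000]` gives the load address and size of the mapping, so subtracting the base from `epc` and `ra` yields offsets inside libc.so that can be looked up in an unstripped copy of the library. The sketch below only does that arithmetic; `lib_offset` is a hypothetical helper name, not something from bandwidthd or the kernel. A read from an address as low as `00000018` is commonly the sign of a field being accessed through a NULL pointer, although the log alone does not prove it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: compute the offset of a faulting address inside a
 * mapped library. Returns 0 and stores the offset in *off when addr lies
 * in [base, base + size); returns -1 when addr is outside the mapping. */
int lib_offset(uint32_t addr, uint32_t base, uint32_t size, uint32_t *off)
{
    assert(off != 0);
    if (addr < base || addr - base >= size)
        return -1;
    *off = addr - base;
    return 0;
}
```

With the values from the log, `epc = 77e76534` maps to offset `0x8c534` and `ra = 77e76410` maps to offset `0x8c410`, both inside the `a9000`-byte libc.so mapping.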

No graphs are being produced. It worked fine with OpenWrt 21.02.

I (@padre-lacroix) have the same error on an Archer C7 v2.

A pull request has been made by @neheb (https://github.com/openwrt/packages/pull/19392) but did not fix the issue!

https://github.com/frankgrant reported on 2022-09-29 that the problem still exists and this was confirmed by @padre-lacroix on the Archer C7 v2.

I looked at the changes between 21.02 and 22.03, and maybe musl 1.2.x, which changes the time_t type from 32 to 64 bits, is the cause, but I am really not sure: I need help here! The error happens whether bandwidthd or bandwidthd-sqlite is used. bandwidthd-pgsql has not been tested, but it should behave the same way as bandwidthd-sqlite.

It happens when trying to make the graph at 192.168.1.1/bandwidthd: every time you ask the web page to make the graph the error occurs. Maybe related to graph.c (I have not thoroughly checked).

I investigated whether bandwidthd-sqlite could produce graphs from the data in sqlite through PHP, and that works fine. However, installing bandwidthd-sqlite was a bit problematic: a conflict between libgd and libgd-full prevented its installation. I had to install libgd-full first, and only then was I able to install bandwidthd-sqlite. It works properly, except that asking 192.168.1.1/bandwidthd to make the graph still triggers the error. The PHP graphs at 192.168.1.1/phphtdocs work properly.

The conflict between libgd and libgd-full has been reported (see https://forum.openwrt.org/t/build-system-libgd-vs-libgd-full-anomaly/138199). See also #19372 .

I am a little bit at a loss here and I am not sure what to do:

1. The SIGSEGV issue is very hard to investigate and I do not know where to start.
2. For libgd vs libgd-full: should I put libgd-full in the Makefile of bandwidthd-sqlite, or is it OK to leave it at libgd, on the assumption that the libgd vs libgd-full conflict will be resolved in the near future and will no longer be a problem?

I need help to fix this!

padre-lacroix avatar Oct 02 '22 23:10 padre-lacroix

you should probably build with gdb and debug symbols so you can backtrace the crash.

edit: musl 1.2x has a new allocator as well as some pthread changes. 64-bit time_t as well.

I will note there's a more up to date fork of bandwidthd available: https://codeberg.org/post-factum/bandwidthd

Unfortunately a bunch of stuff was removed.

neheb avatar Oct 02 '22 23:10 neheb

The conflict between libgd and libgd-full has been reported (see https://forum.openwrt.org/t/build-system-libgd-vs-libgd-full-anomaly/138199). See also https://github.com/openwrt/packages/pull/19372 .

19372 has now been merged. FWIW, I just patched the code from the pull and all other packages that used libgd built properly when specifying libgd-full

ruralroots avatar Oct 06 '22 13:10 ruralroots

@ruralroots Thanks for the correction. I am currently trying to build bandwidthd in 22.03 with debugging, but I am hitting a build problem with the package kernel/gpio-button-hotplug: the build reports that there is no rule to build it.

@neheb I will look at the bandwidthd in post-factum, but the source of bandwidthd that is in OpenWrt is quite different from other sources. Many forks are available, because the original bandwidthd has not been updated in the last 10 years. I looked at many of them 5-6 years ago and chose the current one.

padre-lacroix avatar Oct 06 '22 13:10 padre-lacroix

Just a small update on this.

Yes, the problem between libgd and libgd-full has been resolved. Thanks @ruralroots.

Now for the SIGSEGV problem: I am using GDB and have narrowed down where the problem is. It occurs when executing GraphIP at line 146 of bandwidthd.c. I will continue the investigation, but I do not know when I will be done and when I will have a pull request for this. It may still take a couple of weeks, as I can only work on it on weekends.

padre-lacroix avatar Oct 09 '22 20:10 padre-lacroix

Good news: I found the problem and was able to fix it. I now have a router (Archer C7 v2) with 22.03 that creates graphs with the bandwidthd package. I still have to check that it also works with the -pgsql and -sqlite packages: it should, as this is the same code, but I would rather check that it is working.

It was indeed related to the change in time_t (32 bits to 64 bits on a 32 bit system) with musl.
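The thread does not show the actual patch, so the following is only a generic sketch of this class of bug, not the fix that went into bandwidthd. With a 64-bit time_t on a target where long is 32 bits, passing a time_t to a `%ld` or `%lu` conversion (or reading one with `scanf`) is a width mismatch, which is undefined behavior and can disturb the arguments that follow. One common way to avoid it is to widen explicitly to `intmax_t` and use the `%jd` conversion. The function names `format_timestamp` and `parse_timestamp` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Format a timestamp so that the conversion always matches the argument,
 * whatever sizeof(time_t) is on the target. Returns snprintf's result. */
int format_timestamp(char *buf, size_t len, time_t t)
{
    assert(buf != NULL);
    return snprintf(buf, len, "%jd", (intmax_t)t);
}

/* Parse a decimal timestamp back into a time_t through an intmax_t
 * temporary. Returns 0 on success, -1 if no number could be read. */
int parse_timestamp(const char *s, time_t *out)
{
    intmax_t v;

    assert(s != NULL && out != NULL);
    if (sscanf(s, "%jd", &v) != 1)
        return -1;
    *out = (time_t)v;
    return 0;
}
```

A caller would format into a small stack buffer, for example `char buf[32]; format_timestamp(buf, sizeof buf, time(NULL));`, and the same code then behaves identically on 32-bit and 64-bit time_t builds.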

I will now clean up the patch a bit (I had to add a few extra variables for debugging purposes, and I need to remove them) and create pull requests for both the master and 22.03 branches.

I have not done a pull request in a couple of years, so I am rusty on that side. I will try to get it done over the weekend.

padre-lacroix avatar Oct 18 '22 23:10 padre-lacroix

It was indeed related to the change in time_t (32 bits to 64 bits on a 32 bit system) with musl.

No surprise there.

neheb avatar Oct 19 '22 00:10 neheb

Now works fine on Ubiquiti EdgeRouter X. Thanks people for your great work.

frankgrant avatar Oct 30 '22 05:10 frankgrant

You are quite welcome! Happy that it works and that this is useful.

padre-lacroix avatar Oct 30 '22 12:10 padre-lacroix

Looks like a fix was pushed.

neheb avatar Jun 09 '24 00:06 neheb