svxlink
Problem with new version
Hi Tobias, a new problem occurs with a newer version of SvxLink. I cannot say exactly in which version it first appeared; I guess after the last release. The problem occurs after the local squelch has closed and the TCL event has been executed. The CPU load increases up to 90%, local audio is no longer transferred from RX to TX, and the PTT hangs. No unusual output can be found in the log file. SvxLink must be killed with kill -9 in this case; Ctrl-C on the command line has no effect. The issue was confirmed by Frank/DL7ATA too.
73s de Adi / DL1HRC
In addition to Adi's report, I'll try to describe it myself: Since I compiled v1.6.99.3 (I use "89b60d0 commit b171ffadc04bde2163c618f2ec73ff45c67d02b9") on 4 different BananaPi distros linked with svxreflector, we have had big problems on 2 busy machines, especially on incoming EchoLink connections. One machine has only one SimplexLogic, the other (mine) runs with two SimplexLogics. What happens: the svxlink process suddenly gets stuck, "top" shows that it has 100% CPU load, and the log stops without any further entries. It is reproducible at any time on my machine by activating the EchoLink module or by responding to an incoming EchoLink connection on the logic where EchoLink is configured. When I respond to an incoming EchoLink connection on the second logic (where no EchoLink is configured), I can talk for a few minutes before the system freezes. Hope it helps :) 73 de Frank/DL7ATA
The problem appeared between 380e533 and 3aaa17f. Will continue...
@sm0svx: it's your commit 0711a85; 380e533 is working.
I confirm. Just switched all 4 stations back to 380e533.
Ouch! Thanks for pinpointing the commit that the buggy code was introduced in. I'll have to go through that darn AudioProcessor once again. It's surprisingly hard to get it right.
It's strange that I have not seen a trace of this problem on my own nodes. It would help a lot if you could give me a minimal config that still has this problem.
I'm going to do the fix on the maint branch so if you could use that when deriving the minimal config it would be great.
There was no special configuration, just a RepeaterLogic or (on DL7ATA's nodes) a SimplexLogic. The error could be confirmed on all types of squelch detectors and with a PTT_TYPE=NONE as well. If you want I can give you access to a node and you can make tests with the PTY-squelch.
I can provide you with my working "minimal" configuration - it contains 176 lines. Is that helpful? I am ready to try the maint branch; let me know when it's time.
The purpose of a minimal config is to eliminate as much functionality as possible to pinpoint where the problem is. Does it still occur if modules are removed? SvxReflector removed? Logic linking removed? Other features removed? Any config reproducing the problem helps of course, but the larger the config is, the harder it is to pinpoint the problem.
Even if the specific commit that you provided points to the AudioProcessor, the problem may be somewhere else, with the change to the AudioProcessor merely triggering another bug. In that case it is especially important to trim down the config to zoom in on where in the code the problem is.
Adi, I'd prefer to create a local setup that reproduces the problem, but if I can't do that I'll try to remote debug it on your system.
Hm, it seems to be a hardware- or platform-specific (or compiler?) issue. It occurs on this system: 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:14 UTC 2018 i686 i686 i686 GNU/Linux gcc (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609
On my system everything is ok (3.16.7-53-default #1 SMP Fri Dec 2 13:19:28 UTC 2016 (7b4a1f9) x86_64 x86_64 x86_64 GNU/Linux), gcc (SUSE Linux) 4.8.3 20140627 [gcc-4_8-branch revision 212064].
Here is the config of the problem node. I simulate the SQL open/close with echo "O">/tmp/sql. SvxLink hangs when the SQL has been closed.
[GLOBAL]
#MODULE_PATH=/usr/lib/i386-linux-gnu/svxlink
LOGICS=RepeaterLogic
CFG_DIR=svxlink.d
TIMESTAMP_FORMAT="%c"
CARD_SAMPLE_RATE=48000
CARD_CHANNELS=1

[RepeaterLogic]
TYPE=Repeater
RX=Rx1
TX=Tx1
#MODULES=ModuleHelp,ModuleParrot,ModuleMetarInfo,ModuleEchoLink
CALLSIGN=DB0ERZ
SHORT_IDENT_INTERVAL=30
LONG_IDENT_INTERVAL=60
EVENT_HANDLER=/usr/share/svxlink/events.tcl
DEFAULT_LANG=de_DE
RGR_SOUND_DELAY=50
REPORT_CTCSS=71.9
TX_CTCSS=SQL_OPEN,ANNOUNCEMENT
MACROS=Macros
FX_GAIN_NORMAL=-8
FX_GAIN_LOW=-12
IDLE_TIMEOUT=5
#OPEN_ON_1750=1000
OPEN_ON_CTCSS=71.9:650
OPEN_ON_DTMF=*
OPEN_ON_SQL=500
#OPEN_ON_SEL5=08151
OPEN_SQL_FLANK=OPEN
ACTIVATE_MODULE_ON_LONG_CMD=4:EchoLink
ONLINE_CMD=501871
#STATE_PTY=/tmp/state
DTMF_CTRL_PTY=/tmp/repeater_dtmf_ctrl

[Rx1]
TYPE=Local
AUDIO_DEV=alsa:plughw:1
AUDIO_CHANNEL=0
SQL_DET=PTY
SQL_START_DELAY=20
SQL_DELAY=10
SQL_HANGTIME=150
VOX_FILTER_DEPTH=20
VOX_THRESH=1000
#CTCSS_MODE=2
CTCSS_FQ=71.9
SERIAL_PORT=/dev/ttyUSB1
SERIAL_PIN=CTS
PTY_PATH=/tmp/sql
SIGLEV_SLOPE=1
SIGLEV_OFFSET=0
SIGLEV_OPEN_THRESH=30
SIGLEV_CLOSE_THRESH=10
DEEMPHASIS=1
SQL_TAIL_ELIM=190
DTMF_DEC_TYPE=INTERNAL
DTMF_MUTING=1
DTMF_HANGTIME=40
DTMF_SERIAL=/dev/ttyS0

[Tx1]
TYPE=Local
AUDIO_DEV=alsa:plughw:1
AUDIO_CHANNEL=0
###PTT_TYPE=SerialPin
PTT_TYPE=NONE
###PTT_PTY=/tmp/pty
PTT_PORT=/dev/ttyUSB1
PTT_PIN=DTR
TIMEOUT=600
CTCSS_FQ=71.9
CTCSS_LEVEL=8
PREEMPHASIS=1
DTMF_TONE_LENGTH=100
DTMF_TONE_SPACING=50
DTMF_DIGIT_PWR=-15
So with that config, your node will get into the bad state just by opening and closing the squelch? Nothing else? I just tried it here on my laptop but it runs fine. I have a simple bash loop writing alternating O/Z to /tmp/sql, and the squelch opens and closes nicely. I'll try it on my Pi as well.
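A loop like the one described above could look roughly like this sketch. The function name `toggle_squelch`, its arguments, and the default delay are made up for illustration; only the O/Z writes to the PTY path (/tmp/sql in the config above) come from this thread.

```shell
# Hypothetical helper: toggle a PTY squelch open ("O") and closed ("Z").
# $1 = PTY path (PTY_PATH in the Rx section)
# $2 = number of open/close cycles (default 10, an assumed value)
# $3 = seconds to wait after each write (default 2, an assumed value)
toggle_squelch() {
    pty=$1
    cycles=${2:-10}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$cycles" ]; do
        echo "O" > "$pty"   # squelch open
        sleep "$delay"
        echo "Z" > "$pty"   # squelch closed
        sleep "$delay"
        i=$((i + 1))
    done
}
```

With SvxLink running, something like `toggle_squelch /tmp/sql 20 3` would then open and close the squelch 20 times, which should be enough to see whether the node hangs after a squelch close.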
Do you have any TCL modifications that could cause this behavior? Can the de_DE langpack do anything strange? I'm using the default en_US (heather) voice here.
I really don't have any idea why it happens here. It's the 100th or more node that I have set up in the last years. I've just tried it out: with SvxLink v1.5.99.17 everything works fine, with the latest trunk (1.6.99.3) version the problem occurs, with the same language pack and the same TCL environment. I will install the trunk on my laptop and on other platforms; it may take some time. @dl7ata: could you provide your gcc version and platform-specific information please?
Sure: gcc version 4.9.2 (Debian 4.9.2-10) Linux banana26 3.4.113-bananian #9 SMP PREEMPT Sat May 6 12:20:11 UTC 2017 armv7l GNU/Linux
I can reproduce the problem now when testing on the RPi. However, it does not happen with the latest release (maint branch) but it does happen on master. Can you verify that the problem does not exist in the maint branch?
The commit you referred to above exists in both master and maint, so that commit cannot be the whole story of why this is happening. I'll see if I can find something else causing the problem.
I tried some basic functions in the maint branch on my own station and it seems to work, no freezing so far. I'll let you know if problems occur on that branch.
I think I found what's wrong. Please recheck that you have the correct TCL files matching the latest master and that you have no TCL modifications, especially Logic::send_rgr_sound. That function changed after the merge of the afsk_com branch and the old implementation seems to cause a fatal problem when used with the new code.
The change is that the send_rgr_sound function now expects to be given a receiver ID in the form of a single character, while the old implementation of that function expected a receiver order number. The default implementation sends the receiver ID using morse code on squelch close unless it's set to '?', which is the default. Set the receiver ID using the new RX_ID RX-section configuration variable.
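To check a node for leftover overrides of that function, a search like the following sketch may help. The helper name `find_proc_overrides` is made up for illustration, and the directory to search is an assumption: pass whatever directory holds your local TCL modifications on your installation.

```shell
# Hypothetical helper: list files below a directory that define a given TCL proc.
# $1 = directory to search (installation-specific, not taken from this thread)
# $2 = proc name to look for (default: send_rgr_sound)
find_proc_overrides() {
    dir=$1
    name=${2:-send_rgr_sound}
    # -r: recurse into subdirectories, -l: print matching file names only,
    # -E: extended regular expressions
    grep -rlE "proc[[:space:]]+${name}[[:space:]]" "$dir"
}
```

Any file it prints outside the stock event scripts is worth comparing against the current master implementation.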
This is my Logic::send_rgr_sound, last modified in Nov 2017:
`proc send_rgr_sound {} {
  variable sql_rx_id;
  global logic_name;
  if {$logic_name == "RemoteLogic"} {
    playTone 330 250 85;
    playTone 495 150 35;
  } else {
    if {$logic_name == "NetLogic"} {
      playTone 2450 250 45;
      playTone 2495 150 25;
    } else {
      playTone 320 250 85;
      playTone 440 100 35;
      for {set i 0} {$i < $sql_rx_id} {incr i 1} {
        playTone 2450 500 50;
        playSilence 50;
      }
    }
    playSilence 50;
  }
}`
I think this should work ... ? I am just switching back to master and playing around with original TCL. I'll report about my experience.
I deleted all my .local stuff above (send_rgr_sound) and replaced it with the original TCL. The first test is positive, I can't reproduce the failure anymore. I'll do deeper tests today on the master branch.
Seems to work without any error until now. Now I am switching 3 more systems to master, using default "send_rgr_sound"-TCL.
The problem is this TCL-code:
for {set i 0} {$i < $sql_rx_id} {incr i 1} {
  playTone 2450 500 50;
  playSilence 50;
}
When sql_rx_id is set to ASCII 63 '?', which is the default in the new code, the code above will try to send 63 beeps. That seems to trigger a behavior that throws SvxLink into live-lock. Even though the TCL code is wrong, the best thing would of course be that SvxLink could handle it without locking up.
If you want to keep using a modified version of send_rgr_sound, you should use the new implementation as a base.
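The character code mentioned above is easy to verify from a shell. This is only a small sketch; the helper name `char_code` is made up for illustration.

```shell
# Hypothetical helper: print the numeric character code of the first
# character of $1.
char_code() {
    # A leading single quote in the argument makes printf output the
    # code of the character that follows it.
    printf '%d\n' "'$1"
}
```

`char_code '?'` prints 63, the value the old loop ends up comparing its counter against when RX_ID is left at its default.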
If you compile a new version from Git, the new TCL files are sometimes not updated; I had to copy them myself. @sm0svx, maybe as an enhancement the install script could check the versions of the TCL scripts and replace them if there are newer ones.
I never had any trouble with TCL as long as I strictly followed the principle of the "local" path. But inside local, you have to be careful when new code arrives.
I also use the local path for custom scripts... but the main scripts weren't updated after the upgrade of SvxLink.
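Until something like that exists, stale files can be spotted by hand. The following is a rough sketch, not part of SvxLink: the helper name `stale_tcl_files` is made up, and both directories are placeholders you have to fill in for your own checkout and installation.

```shell
# Hypothetical helper: list .tcl files from a source tree that are missing
# from, or differ from, the installed copy.
# $1 = directory with the TCL files from the Git checkout (placeholder)
# $2 = directory with the installed TCL files (placeholder)
stale_tcl_files() {
    src=$1
    dst=$2
    # List the .tcl files relative to the source directory, then compare
    # each one byte by byte with its installed counterpart.
    ( cd "$src" && find . -name '*.tcl' -type f ) | while IFS= read -r f; do
        if ! cmp -s "$src/$f" "$dst/$f" 2>/dev/null; then
            echo "${f#./}"
        fi
    done
}
```

An empty result means every TCL file from the checkout is installed unchanged; any name that is printed is either missing or out of date on the installed side.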
Long issue... to me it looks like it was resolved? @dl1hrc Adi, can we close it?