icecast-kh
icecast-kh copied to clipboard
KH20 listener_remove not always happend
I am using <authentification type="url">
and <option name="listener_add" ...>
as well ``<option name="listener_remove" ...>to save the access of listeners and the duration. I do it in two steps, first create a MySQL entry for the client with the duration
nulland fianally store the duration, if the
listener_removecall happens. But now I figured out, that not allways a
listener_remove` call happens. So my programm is still showimg some onnections as still active, but which are allready closed.
These is still working with the 32-bit 2.4.x Version (which I use in parallel) and was working without any problems with a version icecast-2.4.0-kh1x. I cannot remember which version it was, but I know that was the last version, after a long time no new versions was released.
Another thing I don't understood, the client connections sometimes increasing and sometimes there is a big gap. Is these usually?
I have made additional investigations.
It seems, that if the listener opens 2 connections (same ip and agent) , but with 2 different client numbers, icecast send a listener_remove
POST request, if the first connection will be closed, but no POST, if the second connection will be closed.
I also got warnings: WARN auth_url/url_remove_listener auth to server http://www.xyz.eu/Ice.php?mode=remove&mount=/128.mp3 (/128.mp3) failed with "Operation timed out after 5000 milliseconds with 0 bytes received"
There are 2 problems!
- if a 'icecast_remove' will send, icecast exspect an answer! This is not the case with 2.4.4! But I send a php header, but icecast doesn't recognize this header.
- Sometimes icecast is not creating a POST call. A client made 4 connections as the same time. 2 with NSplayer /8.0.0.2801 and 2 with FSL IR2.11. 3 of them got closed immediatly, but one NSPlayer connection was used. At the time, where also the last connection where closed, the icecast error.log send an Info, that one listener less. But no 'icecast_remove' POST. But in the access.log there is a GET message, which indicates, that the connection is finished.
So something is not working in the right way!
It#s more and more confusing for me!
I have switched on the debuggung mode under kh15 and found out:
[2023-05-04 09:14:22] DBUG stats/stats_global_calc update global connections (74314) [2023-05-04 09:14:30] DBUG auth/auth_add_listener adding client #74314 for authentication [2023-05-04 09:14:30] DBUG source/listener_change_worker moving listener 74314 on /128.mp3 from 0000000001336d30 to 0000000001336630 [2023-05-04 09:23:01] DBUG stats/stats_global_calc update global client_connections (74314) [2023-05-04 09:58:27] DBUG source/source_listener_release Listener 74314 leaving /128.mp3 [2023-05-04 09:58:27] DBUG auth/auth_remove_listener client #74314 completed
My understanding of this log messages is, that everything was working fine. Alos icecast has recieved a responce, so that remote_listener
POST could be closed.
But at this time, there was not any POST recieved by the server!!!
It seems to me, that there is something wrong with the HTTP(S) interface.
first thing to identify, are you doing any fallback with this stream?
second if there is a timeout on the request then (ie 5 seconds, 0 bytes received) then subsequent operation may be affected. In the listener remove case though, the client will still drop and be logged.
Is the multiple connection always causing the problem? there is no association between the 2 connections within icecast so one won't just let another through. There is a quirk with range requests which may be a factor here
karl
No there is no fallback.
Here one example for client # 98222:
[2023-05-05 15:19:04] DBUG source/source_listener_release Listener 98222 leaving /128.mp3 [2023-05-05 15:19:04] DBUG auth/queue_auth_client starting auth thread 0 [2023-05-05 15:19:04] DBUG auth/auth_run_thread Authentication thread 54 started for /128.mp3 [2023-05-05 15:19:04] DBUG auth/queue_auth_client auth on /128.mp3 has 1 pending [2023-05-05 15:19:04] DBUG auth/auth_run_thread 1 client(s) pending on /128.mp3 [2023-05-05 15:19:04] DBUG auth/auth_remove_listener client #98222 completed [2023-05-05 15:19:04] DBUG auth/auth_run_thread Authenication thread 54 shutting down
It seems everything fine, but there was no listener_remove
POST recieved.
I save all POST requests, but for 98222 there is no one to found.
If you have any additional questions, please let me know.
No, there are also single connections, which creates problems.
I am presuming that you had a timeout and a url delay was triggered. On the listener add I have a message to indicate a delay, not on the remove, I can add that in. The default is 60 seconds but you can change that with the on_error_wait setting in the options. In such cases there is a loss of data but if you are having problems with 5 seconds then you need to see what the bottleneck is.
karl
I don't believe that there is an bottleneck in the system, because with the icecasr 2.4.4 32-bit version it is working without any problem.
During the analysis of error.log files I find out a problem.
There is often a 5 s timeout, but recieve the remove_listener
and on the binning I always send a header("HTTP/1.1 200 OK");
, but i seems, that it will be not recognized.
So, do I have I to send a different header message?
Another question, if a thread is starting with an add_listener
POST request, is this thread nummer valid until the client connection is closed or is the thread always closing after an add_listener
is finished?
Karl, thanks for your help.
I created a program to analyse the error.logs from Icecast. The resukts quite complex so I created a pdf file withthe result and error descriptions. Analysis IceCastKH.pdf
will have to go through what you have posted to see what there is, but to answer the queries above.
A timeout is actually triggered by libcurl when no content has been received within that period, my message is just a reaction to that. So if you have actually sent (ie the packets) all the response headers and then the response is probably getting lost on the way. The header call does not flush the output. Note that just sending the HTTP line is not enough, only when the blank line is sent and flushed onto the network will it been deemed sent.
The HTTP 200 line is actually just a confirmation of the request receipt not an acceptance like the icecast-auth-user header but for a remove listener request, no acceptance response is required. An immediate response is fine if you were to do post processing.
The auth threads are created as per load, it's not tied to listeners. They are tied to the auth block but they are dynamically started to handle the requests whether add or remove. we really could do with caching the POST on failure case so that a resumption means that the those can be pushed out if there is some issue on the connection to the backend.
Don't forget, there is a bit more work in this build for auth than in the xiph tree, so there can certainly be a problem that has crept in, so needs analysis on the causes and reactions in certain cases.
Do you see the problem stemming from a time-out situation?
karl
@karlheyes To optimize my analyze program for the error.log files, I have to understand the messages a little bit better. If a add_listener happend, a handler number will created. I think that this handler number is the same, until the client is shutting down. Ths is sometimes called handler and sometimes authentication thread, but I think it is the same number, until the client is shutting down. Is this right?
You said, there is no acceptance response is required! But in which case will be appear a request complete message in the error.log?
As I have figured out, that a time-out only happens, when several threads are active at the same time or a lot of other activities. In the other cases, the request complete will be written to the error.log file immediatly! There are 4 cores available for Icecast service. Is Icecast able to use these cores in parallel?
And what could be the reason, that Icecast is not sending an url_remove_Listener ... sending request?
Karl, I have checked several issues, where Icecast is not generating an url_remove_listener sending request. In all the cases I have checked, the add_listener request was happend inside a url_remove sending request with a time-out.
Outside of any coding bugs. there are a few cases where requests are not pushed through, some of which are dependent on the request. The main one being an error response from libcurl (usually timeout in such cases) causing the processing to stop for all requests. I can separate those out so that an add request that fails does not stop a remove url from working. It's probably a good idea as urls can be referring to different servers either via name or load sharing.
There really needs to be no timeout on urls really, a several second timeout is just not going to fly for any of the request types but obviously the add/remove listener is the common one. I've not seen a situation where a timeout occurs because of the icecast side of things is at fault. Usually it has been down to a stall in maybe name lookup or the scripts are not optimal in there lookups or any other activity they do, very often something as simple a share resource like DNS or some other DB gets swamped, maybe by multiple apps.
I'll add the separate timeouts for the next update and check the log messaging parts as some may need expanding on. Add listeners are typically more involved operations, so may have more issues with timeouts but removes we should not punish removes for that. The queuing of posts requests is more involved and only helps in failure cases.
Just for reference, you can increase the range of handlers processing auth requests by setting that in the options. Whether that is a factor here is hard to day as increasing the handlers would increase the backend work. Usually it is a tradeoff between extra work in icecast or the backend scripts. The extra threads will allow for scaling across cores but most of the time they are delayed waiting on IO but may be useful for load sharing.
karl.
Karl,
thank you for your investigations. So I have to wait, for the next update and the advanced log messages, which make it maybe easer to run an analyser for the error.log messages.
You have mentioned several times, that I can change something in the options. But up to now, I was not able to found a documentation for that.
Hans-Georg
Karl, these looks strange to me. Analysis IceCastKH2.pdf
There is also a line
DBUG auth/queue_auth_client max authentication handlers allocated
.
What does it mean?
I have uploaded kh21.3. I've gone through a few of the messages, reports handler and client messages. The key thing though is the separation timing on failure cases with different requests. Previously a failed add url would stop a remove from being tried (failed being a timeout type of thing not a rejection of listener).
The max handlers message is just notifying you that all the allocated threads for authenticating are filled, so you have a queue of requests being processed by "handlers". I have expanded some of the messaging to include the handler id, which is a unique one across the server.
Each handler takes a queued request performs the url and processes the result. Each handler is tied to an auth block and are dynamically started/stopped when work is to be done. If you want a large pool of the auth threads just define the auth option handers to specify the number, it allows up to 100, which should be plenty.
karl
Karl,
thank you very much. But kh23.1 is only a source code. How can I get or generate the Win-64-bit code?
You tare talking about to define the auth option headers. Sorry, but I have no idea, how to do it.
Cheers, Georg
I've now attached a windows build (had to fix a naming thing). For setting the number of handlers to run in parallel, just add
option name="handlers" value="10"
inside the authentication block, like you do with the listener_add. This example allows for up to 10 auth requests to occur at the same time for this auth.
karl
Karl,
great!!!
Karl,
I have installed the new versions. These is new, auth_url/url_remove_listener auth on /128.mp3 disabled
[2023-05-18 21:20:20] DBUG source/source_listener_release Listener 409 leaving /128.mp3
I have also seen time-outs again.
Georg [2023-05-18 21:20:20] DBUG auth/queue_auth_client auth on /128.mp3 has 1 pending [2023-05-18 21:20:20] INFO source/source_read listener count on /128.mp3 now 5 [2023-05-18 21:20:20] DBUG stats/modify_node_event update "/128.mp3" listeners (5) [2023-05-18 21:20:20] DBUG client/worker 00000296c80c3be0 now has 8 clients [2023-05-18 21:20:20] DBUG auth/auth_run_thread Authentication thread 0 started for /128.mp3 [2023-05-18 21:20:20] WARN auth_url/url_remove_listener auth on /128.mp3 disabled (client 409, handler 0) [2023-05-18 21:20:20] DBUG auth/auth_remove_listener client #409 completed
in the above snippet, client 409 was being tried and triggered a disabling for probably 30 seconds which may be from a timeout on the listener remove url request. The timeout is about 6 seconds so you should see 409 around :20:14. That will trigger a disable on the listener remove requests for 30 seconds, at which point they resume, but listener add should still go ahead.
Now why the listener remove url takes so long, I don't know. It could be network outage or maybe whatever that url does has an overall stall of at least 6 seconds. No way to tell what the cause is from icecast point of view. It just knows that it has stalled and it backs off. What we don't do yet is re-queue those to try again, but that can only ease a throughput of requests. If there is some other stalling issue then it can re-occur.
you could check a selection of client id, and see when they start the request and end, see if there is a growing delay or whether it is subsecond all the time except for the timeout You can now show sub seconds in the icecast log by adding subsec to the level setting eg level subsec,4. If there is a constant significant delay under load then that would tend to indicate the scaling of the script is not there, but if it's more of a one-off then something is tripping it to do a big delay.
karll
Karl, I hve to modify my program for the new messages. Today I am on the road, so I will do it on the weekend. I seems to me, that there are a lot of opportunities to add, lieke subsec. Do you can provide me a list, what all is possible?
Georg
The subsec thing is quite recent kh21, and in general not that useful for people but can be helpful in debug as sometimes you want something more fine grained than a second.
There are a series of auth options which you can set, the main ones are in the doc but I have added others for setting certain limits like the following
handlers max number of concurrent auth threads for this auth timeout the allowed duration to elapse before pausing the auth on_error_wait the number of seconds to keep the auth disabled.
karl
To clear up a few things you raised. . Clients number are allocated when needed, a simple incremental thing so gaps may be down to short lived clients like metadata updates, and they are server-wide, not mount specific. . the timeout case is because nothing received in 6 seconds so, icecast is assuming there is some problem and disabling the auth link (in this case the remove) for 30 seconds. This is why none are attempted. eg
[2023-05-19 03:21:21] DBUG auth/auth_add_listener adding client #4449 for authentication
[2023-05-19 03:21:21] DBUG auth_url/url_add_listener handler 0 (/128.mp3) sending request (client 4449)
[2023-05-19 03:21:21] DBUG auth_url/url_add_listener handler 0 (/128.mp3) request finished (client 4449)
...
[2023-05-19 03:34:24] DBUG auth_url/url_remove_listener ...handler 0 (/128.mp3) sending request (client 4591)
[2023-05-19 03:34:30] WARN auth_url/url_remove_listener auth on /128.mp3 failed with "Operation timed out after 6001 milliseconds with 0 bytes received", listener remove on 104.40.229.71
[2023-05-19 03:34:30] DBUG auth/auth_remove_listener client #4591 completed
...
[2023-05-19 03:34:32] DBUG client/client_send_bytes Client 4449 connection on /128.mp3 from 182.232.44.213 died
[2023-05-19 03:34:32] DBUG source/source_listener_release Listener 4449 leaving /128.mp3
[2023-05-19 03:34:32] WARN auth_url/url_remove_listener auth on /128.mp3 disabled (client 4449, handler 0)
[2023-05-19 03:34:32] DBUG auth/auth_remove_listener client #4449 completed
As you can see, client 4591 left before 4449 at 03:34:24 and tried the remove script which timed out 6 seconds later. That meant the auth became disabled, so when 4449 left 2 seconds after the disable then that did not do the post. Like I said, there is no re-queuing for the failed case currently.
Now, we do not know why it took over 6 seconds, might be network outage or the script hung or there was some heavy contention which caused a huge delay. Either way, icecast assumes there is a problem, maybe temporary, and shuts down the auth for 30s and then re-enables.
it might be possible to re-queue the attempt but a couple of things need to line up first. freeing up resources, avoid indefinite retries and when to trigger them for example. The remove script may need checking to see what it could be blocking on.
One odd case seems to be client 283
[2023-05-18 21:12:34] DBUG client/client_send_bytes Client 283 connection on /128.mp3 from 37.76.47.76 died
[2023-05-18 21:12:34] DBUG source/source_listener_release Listener 283 leaving /128.mp3
[2023-05-18 21:12:34] DBUG auth_url/url_remove_listener ...handler 0 (/128.mp3) sending request (client 283)
....
[2023-05-18 21:12:39] DBUG auth/auth_run_thread Authenication thread (handler 0) shutting down
Could be a copy paste type bug
karl
Karl,
now I understood that the problem of the missing auth_url/url_remove_listener auth
is comming from the time-out before.
But here I have still a problem to understand.
What is Icecast expecting?
It saying 0 bytes recieved.
So what I have to do inside my PHP script?
It cannot be a network problem, because Icecast and the script is runnung on the same server!
icecast (libcurl part of it) expects the headers up to the blank line. Note that just because you do a header in the php, it does not mean it has actually sent it as it can replace them on subsequent calls. So there is a cache of headers before a flush across the connection is done. Now why there is a delay I don't know. You will have to do some sort of timing check, see where is stalls. It may be something with the php starting up or you connect to DB and that stalls things, could be a DNS query. You get a client id in the POST, so you know what listener number to match against in the error log. So try reporting the time, a client id and what stage you are in the script, append those to a log.
If you do a line for the start of the script then it should match the time in the error log for each attempt. If some part in the script stalls, then you need to narrow the case down by reporting subsequent lines.
00:00:00 4449 start script 00:00:00 4449 open DB 00:00:06 4449 send HTTP OK
you will know which client id is failing, so search in your created php log to see what it was doing and how long it took when compared with the icecast log entry time. While you may thinking everything is local, it maybe unintentionally using something external and that is the delay. It really does depend on what your script does.
karl
Karl,
what does three messages mean?
DBUG stats/stats_global_calc update global listener_connections (422)
The sum of all listener connection since the start of Icecast?
'DBUG stats/stats_global_calc update global client_connections (18144)'
What are client connections?
DBUG stats/stats_global_calc update global connections (18164)
What are global connections?
If I search in the error.log file for DBUG connection/_handle_get_request start with
I will founf strange things, not only my mounts.
e.g admin/metadata
, admin/listclients
, 128.mp3/;
, /statusbar.html
and a lot of more and that several times (14 and maybe more) in a second.
In the error.log
file I can not see any ip.
Could be this the difference between DBUG stats/stats_global_calc update global listener_connections (422)
and 'DBUG stats/stats_global_calc update global client_connections (18144)'?
And could be this accesses create the time-outs?
I have since a while the feeling that someone is trying to create problems on my server! Is there a way to monitor the ip adresses?
Thanks, Georg
the _connections stats are accumulative counts, the listeners* being for streaming clients. clients for web requests are different to those. admin metadata is used for metadata updates via the source clients, the ones with mounts followed by /; are oddballs. you can used alias for those or block the IP if it is a spammer (use the access log for checking those)