snapclient
Playback syncing with other clients
First of all thx for this great work so far 👍
From what I tested, there are still some issues with the time syncing in your code (at the commit I used), so clients don't have synchronized playheads. Do you see the same behavior?
Do you know where to look in the code to fix/add it, or why it's not working (#not-implemented-yet 😄)? I'd be happy to have a look into it, but I need a hint to get started.
Hi Wolf
Nope - it depends on how you define "not in sync". First, you need an ESP32 with PSRAM to break past the ~140 ms buffer you get from internal ESP32 memory alone. ESP32 audio buffer and I2S pin config -> enable PSRAM in menuconfig.
Snapcast server defaults to 1000 ms buffer, min setting is 400 ms.
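For reference, the PSRAM switch Jørgen mentions lives in menuconfig; in sdkconfig terms it is roughly the following (option names vary between IDF versions, so treat this as a hint rather than an exact recipe):

```
# sdkconfig fragment (IDF 4.x names; newer IDF uses CONFIG_SPIRAM=y)
CONFIG_ESP32_SPIRAM_SUPPORT=y
CONFIG_SPIRAM_USE_MALLOC=y   # let large allocations (audio buffers) land in PSRAM
```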
There is no hard time sync in the code yet, and I have been trying out various strategies to get to where the code is today.
From here I think a good, deep code/architecture cleanup will be needed to take it to the next level.
For that next level we need to embed/use the time stamping and do some playback speed regulation, e.g. by fine-tuning the audio PLL.
What is done now is just a static estimation of the buffer size / delay needed to match the snapcast server's buffer size.
Regards Jørgen
Hey guys, I am trying to get this going using a Lyrat V4.3 as a development platform. I deleted lots of stuff specific to Jorgen's platform which I didn't need from the original source code, and currently I am trying to get syncing to work. Lots of TODOs in my code, but I think I am on the right track. Not sure how to proceed to get a perfect sync without skipping chunks at the moment, though. Maybe someone following development can have a look and give me some pointers (https://github.com/CarlosDerSeher/snapclient). Regards Karl
I haven't tried your code (I only have a WROOM module, not a Lyrat), but I think there is no other option than to drop samples. @badaix has described this process in the readme.md:
> Time deviations are corrected by playing faster/slower, which is done by removing/duplicating single samples (a sample at 48kHz has a duration of ~0.02ms). Typically the deviation is below 0.2ms.
Following this approach showed some good improvement in my investigations concerning synchronization, but it only reached about 70% of the quality of a Raspberry Pi snapcast client (buffer underflows, noise).
Alright, I think I have to rethink my concept regarding synchronization. At the moment I am trying to keep undecoded chunks (20 ms of FLAC data) in sync. Probably not the best way to do it, now that I think about it. A better approach would be to write something like my own audio element to link in between the decoder and the I2S driver. Not sure how to get the time information of the chunks there; ESP-ADF uses ring buffers for its audio pipeline... Are you willing to share your code so I can have a look at your concept? How are you collecting time sync messages from the server to calculate the median on them to get the server's "now"?
Regards Karl
I took portions of the main.c and added a var time_diff_to_comp:

```c
case SNAPCAST_MESSAGE_TIME:
    result = time_message_deserialize(&time_message, start, size);
    if (result) {
        ESP_LOGI(TAG, "Failed to deserialize time message");
        return;
    }

    tv1.tv_sec = base_message.received.sec;
    tv1.tv_usec = base_message.received.usec;

    // server-reported latency and locally measured sent->received delta, in ms
    double lat = time_message.latency.sec * 1000 + time_message.latency.usec / 1000;
    double time2 = (base_message.received.sec - base_message.sent.sec) * 1000 +
                   (base_message.received.usec - base_message.sent.usec) / 1000;
    time_diff_to_comp = time2 + lat;
```
This var I provided to write_ringbuf as an additional parameter:

```c
write_ringbuf_comp(audio, frame_size * 2 * sizeof(uint16_t), &time_diff_to_comp);
```
This is how it looks internally:

```c
#define maxDiffThreshold 0.3
// for now hardcoded: 16 bit, left and right channel; should be provided by the server
#define bytesToCompensate (2 * 2)
#define maxCatchUpByteSpeed 32

size_t write_ringbuf(const uint8_t *data, size_t size, double *time_diff)
{
    BaseType_t done;
    if (-(*time_diff) > maxDiffThreshold) {
        // playhead is ahead: the commented line would re-send the tail frame
        done = xRingbufferSend(s_ringbuf_i2s, (void *)data, size,
                               (portTickType)portMAX_DELAY);
        // done = xRingbufferSend(s_ringbuf_i2s,
        //                        (void *)(data + (size - bytesToCompensate - 1)),
        //                        bytesToCompensate, (portTickType)portMAX_DELAY);
    } else if (-(*time_diff) < -maxDiffThreshold) {
        // playhead is behind: drop frames, more aggressively the larger the lag
        int factor = (int)(*time_diff / 0.1);
        if (factor > maxCatchUpByteSpeed) factor = maxCatchUpByteSpeed;
        factor = (factor / 2) * 2;  // keep it frame-aligned
        printf("f: %d (%f)\n", factor, *time_diff);
        // trim data
        int correction_bytes = bytesToCompensate * factor;
        // convert dropped frames (not bytes) to ms at 48 kHz
        *time_diff -= (correction_bytes / bytesToCompensate) / 48000.0f * 1000;
        done = xRingbufferSend(s_ringbuf_i2s, (void *)data, size - correction_bytes,
                               (portTickType)portMAX_DELAY);
    } else {
        done = xRingbufferSend(s_ringbuf_i2s, (void *)data, size,
                               (portTickType)portMAX_DELAY);
    }
    return (done) ? size : 0;
}
```
So far it only catches up when the playhead is behind. If it's ahead, the commented line could help by duplicating some samples. The factor in the catch-up block gets higher the more it's lagging behind (this leads to non-linear distortion in the sound when many samples are dropped).
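For the "ahead" case, the idea behind the commented line (re-sending the tail of the chunk to duplicate samples) could look roughly like this; a plain-buffer sketch, where pad_with_last_frame and FRAME_BYTES are illustrative names, not code from the project:

```c
#include <string.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_BYTES 4  /* 16-bit stereo: 2 bytes x 2 channels */

/* Pad the chunk by repeating its last frame `extra_frames` times, the
 * counterpart to trimming when the playhead is ahead of the server.
 * `out` must hold size + extra_frames * FRAME_BYTES bytes; returns the
 * new size. (A plain buffer stands in for xRingbufferSend here.) */
static size_t pad_with_last_frame(const uint8_t *data, size_t size,
                                  int extra_frames, uint8_t *out)
{
    memcpy(out, data, size);
    const uint8_t *last = data + size - FRAME_BYTES;
    for (int i = 0; i < extra_frames; i++)
        memcpy(out + size + (size_t)i * FRAME_BYTES, last, FRAME_BYTES);
    return size + (size_t)extra_frames * FRAME_BYTES;
}
```

Duplicating only whole frames keeps left and right channels aligned, which is why the compensation unit is 4 bytes and not 1.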
Usually the protocol sends enough info to synchronize; the documentation is a bit thin on how clients should handle unsynchronized moments. Messing with the ring buffer should be sufficient, as far as I understood so far.
Hi
I reviewed your code but did not try to run it - it's been months since I have had my ESP32 / amps on my night shift :-(
I got myself a pick-and-place machine and am hacking on hardware for that :-)
I can get rather good results on my setup, but I also have a large buffer, 600-800 ms. I think your challenge is still the pace at which your packages come in and through the decoder, while at that point you try to keep the length of the ring buffer matched to the required network latency.
I think the correct way is to throw the time-stamped chunks of samples onto the ring buffer and then, at the other end of the ring buffer, evaluate whether the chunk of data needs to be played, stalled or stretched. This will allow for network jitter due to packet loss and re-transmission. Also, the stall/stretch can be done by tuning the ESP32 APLL with a simple locked loop.
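The consumer-side evaluation described here could be sketched like this (function name, action set and tolerance are illustrative, not project code):

```c
#include <stdint.h>

typedef enum { CHUNK_PLAY, CHUNK_STALL, CHUNK_DROP } chunk_action_t;

/* At the I2S end of the ring buffer, compare each chunk's server
 * timestamp (converted to the local, server-synced clock) against "now".
 * Coarse errors are handled by stalling or dropping; the remaining fine
 * error inside the tolerance band is what the APLL loop would absorb. */
static chunk_action_t evaluate_chunk(int64_t due_us, int64_t now_us,
                                     int64_t tolerance_us)
{
    int64_t diff = due_us - now_us;
    if (diff > tolerance_us)  return CHUNK_STALL; /* too early: wait      */
    if (diff < -tolerance_us) return CHUNK_DROP;  /* too late: skip chunk */
    return CHUNK_PLAY;                            /* on time: play it     */
}
```

The nice property of this split is that packet loss and re-transmission only ever produce stall/drop decisions at the output, instead of destabilizing the fine-grained rate control.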
This will be my next move on the sync stuff.
/J
> Usually the protocol sends enough info to synchronize; the documentation is a bit thin on how clients should handle unsynchronized moments. Messing with the ring buffer should be sufficient, as far as I understood so far.
I also don't find this question easy to answer just from reading the documentation of the protocol. I've added an issue for this. Feel free to vote for it and discuss there :-)
Hi All
I have implemented the sync concept as planned and the system behaves very stably on my bench. I can stop the stream or get large packet drops without the ESP32 rebooting as before. Now the backend (dsp_process) just flushes the ring buffer and waits to get back in sync if the audio flow from the front end (time-stamped decoder output) detects big dropouts or stops.
Sync to the server is based on the time message packet latency and the time stamps between two unsynced clocks (one on the master and one on the client) - it took me forever to find out that there was no reference to epoch involved.
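For reference, the trick with the two unsynced clocks: snapcast's time messages give the client a client-to-server latency measurement (echoed by the server) and a server-to-client one (measured locally), and under a symmetric-network assumption half their difference is the clock offset, no epoch needed. A minimal sketch (the helper name is made up):

```c
#include <stdint.h>

/* latency_c2s = server_recv - client_sent   (from the TIME message)
 * latency_s2c = client_recv - server_sent   (measured by the client)
 * Both contain the same one-way network delay plus/minus the clock
 * offset, so assuming symmetric delay the offset falls out as half
 * the difference (all values in microseconds). */
static int64_t clock_offset_us(int64_t latency_c2s_us, int64_t latency_s2c_us)
{
    return (latency_c2s_us - latency_s2c_us) / 2;
}
```

Example: with a 5 ms clock offset and 1 ms network delay each way, the client measures c2s = 6000 µs and s2c = -4000 µs, and the formula recovers the 5000 µs offset.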
The code needs cleanup and must be put into its own task.
I have no plans to support ESP-ADF for now, as my sync concept fits very badly with its audio pipeline setup. For now at least.
/Jørgen
> I have no plans to support ESP-ADF for now, as my sync concept fits very badly with its audio pipeline setup. For now at least. /Jørgen
Yeah honestly, with the mess caused by codec source code not being published, I'd say it's a sane move for now...
Also, have you received your ESP32-S2? Does your snapclient work fine on it for you? (edit: just saw your answer on #15 where you mention this)
Still haven't powered up my S2 - trying to keep focus ... too little time.
:-) Since I now have an ESP32, I'm also focusing on this target, trying to make it easier to use with other configs (aka non-MA12xx-based DACs).
Note that I've been playing with your latest latency control code, and it works fine for a while, but then it falls behind other snapcast clients fed by the same snapcast server. Have you experienced the same issue? Do you want me to create a dedicated issue to discuss this?
I've also tested your code and found sync is only good for a short time for me. On my table, WiFi reception is kind of ... well, let's say it is OK, so I find the ESP client gets out of sync with a client running on an Ubuntu machine pretty fast; both are connected via WiFi. The server is a Raspberry Pi 3B connected via Ethernet. The problem seems worse when the WiFi signal is low.
I have also been busy with this and uploaded my code just now. I am using a hardware timer to get the timing right. The code needs a lot of polishing, but it works well, except that for some reason it will refuse to resume playback at some point. This appears after a long time and I am not sure why it is happening; I am still working on it. When I forked and started development, it felt like a good idea to support ADF, so I gave it a go and started using pipelines and their codec, mostly because I wanted to use FLAC and it seemed easy to just use their library. The problem is probably around time stamping the decoded FLAC chunks; this part has become a bit of a complicated and complex solution on my end. Hopefully I can integrate the FLAC decoder at some point and compile it myself.
Oh, now that I look at the code: there is currently no code to keep the client synchronized; it synchronizes at startup, but once synchronized it will not do anything. The code for these situations (playing with APLL adjustments) is commented out.
It remains highly unclear to me how snapcast's synchronization is supposed to work; let's dig a bit there...
Shouldn't the RTOS take most of this work off our shoulders? Dedicating CPU1 completely to I2S and syncing, and using hardware timers to hit the deadlines, I can do almost perfectly synced playback most of the time. It breaks from time to time and gets out of sync, though, mostly related to packet loss and low WiFi signal, I believe. I have to look into that some time. I have been doing synced playback for over an hour now :)
Hi @CarlosDerSeher I've been playing with your implementation and it looks to work nicely! I'm currently using FLAC as the codec, since I've never been able to make ADF's Opus codec work (because of the missing Ogg container encapsulation, I guess).
Since it's using ADF pipelines, it's a very different approach from @jorgenkraghjakobsen's original one. I'll put my version of your branch, adapted to the current master, in a branch on my fork. At some point it would be nice to have everything merged back into a single clean repo; I'm not sure yet how to mix @jorgenkraghjakobsen's non-ADF approach with yours and keep the code clean and readable.
But first I need to understand your version better!
Glad to read it isn't just working for me. Though I am still having issues if I cover my Lyrat with my hand to simulate packet loss; at the moment it seems the code won't find a good initial sync again. I am working on that and will upload an update tonight. Initial sync seems critical; afterwards, soft syncing using the APLL should be easy. I still need to understand how badaix does statistics on chunk/sample ages. From what I see, he uses 3 sets of buffers on which medians are generated; based on that, the APLL change is calculated...
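A minimal version of that kind of statistic is a median over the last N measured diffs, which rejects the outliers WiFi jitter produces; a sketch (illustrative only, not badaix's layered implementation):

```c
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

static int cmp_i64(const void *a, const void *b)
{
    int64_t x = *(const int64_t *)a, y = *(const int64_t *)b;
    return (x > y) - (x < y);
}

/* Median of up to 64 clock-diff measurements (microseconds). A single
 * delayed packet produces one wild diff value; the median ignores it,
 * whereas a mean would drag the APLL correction off target. */
static int64_t median_i64(const int64_t *vals, size_t n)
{
    int64_t tmp[64]; /* n is assumed <= 64 for this sketch */
    memcpy(tmp, vals, n * sizeof *tmp);
    qsort(tmp, n, sizeof *tmp, cmp_i64);
    return (n % 2) ? tmp[n / 2] : (tmp[n / 2 - 1] + tmp[n / 2]) / 2;
}
```

Feeding the smoothed diff, rather than each raw measurement, into the APLL adjustment is what keeps the rate control from chasing network noise.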
Merging shouldn't be so hard, because I use two pipelines, one for decoding and one for I2S playback. The first one could easily be dropped and exchanged for any codec. I will probably do that at some point, but first I want to get syncing right.
Hi @douardda @jorgenkraghjakobsen @constant-flow I proudly present to you my working version of snapcast :) After a lot of effort and quite a few short nights I managed to finally get synced playback to work. The biggest mistake ever was trying to use ADF pipelines. I dropped most of the ADF pipeline stuff because it introduces a lot of unpredictable delay and jitter at the I2S end; I wasted so much time with that. After writing to I2S directly and fine-tuning the I2S DMA buffer lengths and counts, it all started to fall into place. I compared the offset against a Raspberry Pi running the latest version of snapclient using a scope and a test signal, and it seems to be almost the same, around +-1 ms or less. I could provide a video of the test signal on the scope if you'd like to see it.
On the decoding side I am still using ADF pipelines with a write callback which provides the decoded data to the PCM data queue used to buffer audio samples. Hopefully with this it will be easy to exchange codecs?!
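The write-callback-into-queue idea can be sketched in plain C like this (a fixed array stands in for the FreeRTOS queue so the sketch is self-contained; all names here are made up, not the project's API):

```c
#include <string.h>
#include <stddef.h>

#define PCM_FIFO_BYTES 4096

/* Stand-in for the PCM data queue between decoder and player task. */
typedef struct {
    unsigned char buf[PCM_FIFO_BYTES];
    size_t used;
} pcm_fifo_t;

/* Shaped like a stream write callback: the decoder pipeline hands over
 * `len` bytes of decoded PCM, and the function returns how many bytes
 * were accepted. Anything the FIFO can't hold is dropped in this sketch;
 * a real FreeRTOS queue would block instead. */
static int pcm_write_cb(pcm_fifo_t *fifo, const char *data, int len)
{
    if (fifo->used + (size_t)len > PCM_FIFO_BYTES)
        len = (int)(PCM_FIFO_BYTES - fifo->used);
    memcpy(fifo->buf + fifo->used, data, (size_t)len);
    fifo->used += (size_t)len;
    return len;
}
```

Because the player only ever sees raw PCM pulled from this queue, the codec in front of it really can be swapped without touching the timing code.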
At the moment it only supports the FLAC decoder at 16 bit / 48 kHz, because lots of things are hardcoded. I use a Lyrat; you'll probably have to adapt the I2S pin config to your hardware.
Have a nice weekend testing :)
> At the moment it only supports the FLAC decoder at 16 bit / 48 kHz, because lots of things are hardcoded. I use a Lyrat; you'll probably have to adapt the I2S pin config to your hardware.
Newbie question: is FLAC used only as a transport vehicle between the SnapCast server and SnapCast clients? In other words: the input audio stream on the SnapCast server can be e.g. MP3, which the server live-transcodes into FLAC!? Or does it mean that the audio source has to be FLAC already (think of Internet radio stations, which are usually MP3 streams)?
> FLAC is used only as a transport vehicle between SnapCast Server and SnapCast clients
@Wookbert
Absolutely. The server encodes whatever audio stream comes in as FLAC (other formats are possible, but not for this project here) to reduce the size of the stream. The client decodes it to a pure PCM/I2S signal, and the DAC makes it analog.
What you describe should be possible. Just try it out with your phone (client; Android has an app in the Play Store) and your computer (server), where you set the encoding to FLAC.
If that works, it should work with this client as well (if the fix provided works as explained ... I didn't test yet ... but I will :) )
A few bugs are still left, I guess :) Sometimes, especially if the stream on the server is started after the ESP32 client has booted, there will be errors. But I guess for now I can live with resetting the ESP32 if that happens. Surely these are also solvable :)
Alright, I believe I've solved those issues I was talking about. It only happened when silence was played from the server; I increased the queue lengths and it seems to be solved.
As a side note, I added WiFi provisioning to the code.
Good night :)
Not sure if somebody has tested yet, but I believe syncing works pretty stably now. I switched back to using the Opus decoder, as originally designed by Jorgen, because timing will be more predictable that way. The interface to the time syncing part is as simple as writing decoded wire chunks to a queue. Hopefully that way it will be pulled by you, Jorgen, and others :)
Hi @CarlosDerSeher I had no time for this lately, but I'll try your version ASAP (maybe this week). Thanks
@jorgenkraghjakobsen I tried to integrate my syncing implementation into the current master as well as I could. The thing is, I had to drop all the dsp_processor-related stuff for now. If you are as satisfied as I am with the syncing, then we have to rethink the dsp_processor stuff, because that won't be compatible with my implementation; it relies heavily on control of everything I2S-related. Your dsp_processor code should still be usable, I think, but it certainly has to be adapted so it can be called on decoded raw PCM chunk data, which can then be put into the player queue. I had a look at the code but was a little lost there. Would it even be possible to call all these filters on PCM data chunks of 20 ms?
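On the 20 ms question: chunk-wise filtering generally works as long as each filter keeps its state across calls. A 20 ms chunk at 48 kHz is 960 frames, and e.g. a one-pole low-pass only needs its previous output to continue seamlessly into the next chunk. A minimal sketch (illustrative stand-in, not dsp_processor code):

```c
#include <stdint.h>
#include <stddef.h>

/* Filter state that must persist between chunk calls. */
typedef struct { float prev; } one_pole_t;

/* One-pole low-pass over a chunk of mono 16-bit samples. Because `st`
 * carries the last output across calls, processing one 960-frame chunk
 * at a time yields the same samples as processing the whole stream. */
static void one_pole_lp(one_pole_t *st, const int16_t *in, int16_t *out,
                        size_t n, float alpha)
{
    for (size_t i = 0; i < n; i++) {
        st->prev += alpha * ((float)in[i] - st->prev);
        out[i] = (int16_t)st->prev;
    }
}
```

So splitting the stream into 20 ms player chunks shouldn't rule out the dsp_processor filters, as long as their state objects live outside the per-chunk call.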
I created a branch in my account which has these changes applied to your current master https://github.com/CarlosDerSeher/snapclient/tree/jorgenMaster
Edit 1: alright :) it seems it wasn't that hard to change dsp_processor.
Hi guys, it's been a while since I last tried to get syncing up and running stably and smoothly. I changed some more parts and added a custom I2S driver, and I think with this the ESP32 clients stay in sync very well. Compared to an Ubuntu notebook client they are also well synced. The custom I2S driver is almost the same as the IDF one, except for one additional function to fill the DMA buffers before starting transmission. I pushed these changes to my master branch if someone is interested and willing to test in another environment.
Hello, I tried the branch from https://github.com/CarlosDerSeher/snapclient/tree/master on several ESP32 devices with PSRAM and a PCM5102; it works perfectly for me. The devices are very synchronous. Thank you for this change.
Glad to read that, though I have to say I haven't done any work on master for quite a while now. There is a branch named NETCONN which has the most up-to-date changes. RAM usage is minimized there, and syncing should also work on smaller WROVER modules with a sample rate of 48 kHz, FLAC, and a buffer size of 500 ms max. Syncing should be the same, though, and enabling PSRAM to get bigger buffers would probably work too. I always wanted to minimize the RAM footprint to get a very thin client which could be used on even lower-end devices than the ESP32. But spare time is very limited currently :)
But if master is working for you, why not just stay there and use it :)