
Playback syncing with other clients

Open constant-flow opened this issue 4 years ago • 27 comments

First of all thx for this great work so far 👍

From what I tested, there are still some issues with the time syncing in your code (at the commit I used), so clients don't have synchronized playheads. Do you see the same behavior?

Do you know where to look in the code to fix/add it, or why it's not working (#not-implemented-yet 😄)? I'd be happy to have a look into it, but need a hint to start.

constant-flow avatar Sep 11 '20 13:09 constant-flow

Hi Wolf. Nope - depends on how you define "in sync". First, you need an ESP32 with PSRAM to get past the 140 ms buffer limit of using only internal ESP32 memory: under "ESP32 audio buffer and I2S pin config", enable PSRAM in menuconfig. The snapcast server defaults to a 1000 ms buffer; the minimum setting is 400 ms. There is no hard time sync in the code yet, and I have been trying out various strategies to get to where the code is today.
From here I think a good, deep code/architecture cleanup will be needed to take it to the next level. For that next level we need to embed/use the time stamping and do some playback speed regulation, e.g. by fine-tuning the audio PLL. What is done now is just a static estimation of the buffer size/delay needed to serve the snapcast server's buffer size. Regards Jørgen

jorgenkraghjakobsen avatar Sep 29 '20 06:09 jorgenkraghjakobsen

Hey guys, I am trying to get this going and use the LyraT V4.3 as a development platform. I deleted lots of stuff specific to Jorgen's platform which I didn't need from the original source code, and currently I am trying to get syncing to work. Lots of TODOs in my code, but I think I am on the right track. Not sure how to proceed to get a perfect sync without skipping of chunks at the moment though. Maybe someone following development can have a look and give me some pointers (https://github.com/CarlosDerSeher/snapclient). Regards Karl

CarlosDerSeher avatar Dec 13 '20 21:12 CarlosDerSeher

I haven't tried your code (I only have a WROOM module, not a LyraT), but I think there is no other option than to drop samples. @badaix has described this process in the readme.md:

Time deviations are corrected by playing faster/slower, which is done by removing/duplicating single samples (a sample at 48kHz has a duration of ~0.02ms). Typically the deviation is below 0.2ms.

Following this approach showed some good improvement in my investigations concerning synchronization, but it only reached about 70% of the quality of a Raspberry Pi snapcast client (buffer underflows, noise).

constant-flow avatar Dec 13 '20 22:12 constant-flow

Alright, I think I have to rethink my concept regarding synchronization. At the moment I am trying to keep undecoded chunks (20 ms FLAC data) in sync. Probably not the best way to do it, now that I think about it. I think a better approach would be to write something like my own audio element to link in between the decoder and the I2S driver. Not sure how to get the time information of the chunks there; esp-adf uses ring buffers for its audio pipeline... Are you willing to share your code so I can have a look at your concept? How are you collecting time sync messages from the server to calculate the median on these to get the server's "now"?

Regards Karl

CarlosDerSeher avatar Dec 15 '20 17:12 CarlosDerSeher

I took portions of the main.c and added a var time_diff_to_comp

case SNAPCAST_MESSAGE_TIME:
    result = time_message_deserialize(&time_message, start, size);
    if (result)
    {
        ESP_LOGI(TAG, "Failed to deserialize time message\r\n");
        return;
    }
    tv1.tv_sec = base_message.received.sec;
    tv1.tv_usec = base_message.received.usec;
    // divide by 1000.0 so sub-millisecond precision is not lost to integer division
    double lat = time_message.latency.sec * 1000.0 + time_message.latency.usec / 1000.0;
    double time2 = (base_message.received.sec - base_message.sent.sec) * 1000.0 +
                   (base_message.received.usec - base_message.sent.usec) / 1000.0;

    time_diff_to_comp = time2 + lat;

This var I provided to write_ringbuf as an additional parameter

write_ringbuf(audio, frame_size * 2 * sizeof(uint16_t), &time_diff_to_comp);

This is how it looks internally

#define maxDiffThreshold 0.3
// for now hardcoded: 16 bit, left and right channel; should be provided by the server
#define bytesToCompensate (2 * 2) // bytes per sample frame

#define maxCatchUpByteSpeed 32

size_t write_ringbuf(const uint8_t *data, size_t size, double *time_diff)
{
  BaseType_t done;
  if (-(*time_diff) > maxDiffThreshold)
  {
    // playhead is ahead: pass data through; the commented line below could
    // additionally duplicate the last frame to slow playback down
    done = xRingbufferSend(s_ringbuf_i2s, (void *)data, size, (portTickType)portMAX_DELAY);
    // done = xRingbufferSend(s_ringbuf_i2s, (void *)(data + (size - bytesToCompensate - 1)), bytesToCompensate, (portTickType)portMAX_DELAY);
  }
  else if (-(*time_diff) < -maxDiffThreshold)
  {
    // playhead is behind: drop some sample frames to catch up
    int factor = (int)(*time_diff / 0.1);
    if (factor > maxCatchUpByteSpeed) factor = maxCatchUpByteSpeed;

    factor = (factor / 2) * 2; // keep it even to stay frame aligned
    printf("f: %d (%f)\n", factor, *time_diff);

    // trim data and account for the dropped playback time
    // (count frames, not bytes: one frame is bytesToCompensate bytes)
    int correction_bytes = bytesToCompensate * factor;
    *time_diff -= (correction_bytes / (double)bytesToCompensate) / 48000.0 * 1000.0;
    done = xRingbufferSend(s_ringbuf_i2s, (void *)data, size - correction_bytes, (portTickType)portMAX_DELAY);
  }
  else
  {
    done = xRingbufferSend(s_ringbuf_i2s, (void *)data, size, (portTickType)portMAX_DELAY);
  }

  return (done) ? size : 0;
}

So far it only catches up when the playhead is behind. If it's ahead, the commented line could help to duplicate some samples. The factor in the catch-up block gets higher the more it lags behind (this leads to a non-linear distortion in the sound when many samples are dropped).

Usually the protocol sends enough info to synchronize; the documentation is a bit thin on that end, i.e. on how clients should handle unsynchronized moments. Messing with the ring buffer should be sufficient, as far as I understood so far.

constant-flow avatar Dec 15 '20 18:12 constant-flow

Hi, I reviewed your code but did not try to run it - it's been months since I have had my ESP32 / amps on my night shift :-( I got myself a pick-and-place machine and am hacking on hardware for that :-) I can get rather good results on my setup, but I also have a large buffer, 600-800 ms. I think your challenge is still the pace at which your packages come in and through the decoder, and that at that point you try to keep the length of the ring buffer at the required network latency.
I think the correct way is to throw the time-stamped chunks of samples on the ring buffer and then, at the other end of the ring buffer, evaluate whether the chunk of data needs to be played, stalled or stretched. This will allow for network jitter due to packet loss and re-transmission. Also, the stall/stretch can be done by tuning the ESP32 APLL with a simple locked loop. This will be my next move on the sync stuff /J
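A hypothetical sketch of that play/stall/stretch decision at the consumer end of the ring buffer (plain C; every name and the tolerance value are illustrative only, not from the actual code):

```c
// Hypothetical sketch of the decision Jørgen describes: each chunk carries
// a target playout time; at the ring buffer's output end we compare it with
// the local clock (already translated to server time) and decide.
typedef enum { CHUNK_PLAY, CHUNK_STALL, CHUNK_STRETCH } chunk_action_t;

chunk_action_t decide_chunk_action(long long chunk_play_at_us,
                                   long long now_us,
                                   long long tolerance_us)
{
    long long lateness_us = now_us - chunk_play_at_us;
    if (lateness_us < -tolerance_us)
        return CHUNK_STALL;   // chunk is early: hold it back before I2S
    if (lateness_us > tolerance_us)
        return CHUNK_STRETCH; // chunk is late: speed playback up (e.g. via APLL)
    return CHUNK_PLAY;        // within tolerance: play as-is
}
```

The nice property of deciding here, rather than at the producer end, is that network jitter upstream of the ring buffer never reaches the I2S side.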

jorgenkraghjakobsen avatar Dec 15 '20 22:12 jorgenkraghjakobsen

Usually the protocol sends enough info to synchronize; the documentation is a bit thin on that end, i.e. on how clients should handle unsynchronized moments. Messing with the ring buffer should be sufficient, as far as I understood so far.

I also do not find this easy to understand just from reading the documentation of the protocol. I've added an issue for this. Feel free to vote for it and discuss there :-)

douardda avatar Feb 11 '21 20:02 douardda

Hi all, I have implemented the sync concept as planned and the system behaves very stably on my bench. I can stop the stream or get large packet drops without the ESP32 rebooting as before. Now the backend (dsp_process) just flushes the ring buffer and waits to get in sync if the audio flow from the front end (time-stamped decoder output) detects big dropouts or stops. Sync to the server is based on the time message pkg latency and time stamps between two unsynced clocks (one on the master and one on the client) - it took me forever to find that there was no ref to epoch involved.
The code needs cleanup and must be put into its own task. I have no plans to support ESP-ADF for now, as my sync concept fits very badly with the audio pipeline setup. For now at least. /Jørgen
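For reference, the "two unsynced clocks, no epoch" observation is the classic NTP-style offset estimate. A minimal sketch (plain C, hypothetical names, assuming symmetric network delay):

```c
// Hedged sketch of an NTP-style clock offset estimate: the client
// timestamps its request (c_sent) and the reply's arrival (c_recv); the
// server timestamps receipt and response (s_recv, s_sent). Assuming the
// network delay is symmetric, the absolute epoch of either clock cancels.
double clock_offset_ms(double c_sent_ms, double s_recv_ms,
                       double s_sent_ms, double c_recv_ms)
{
    // offset of the server clock relative to the client clock
    return ((s_recv_ms - c_sent_ms) + (s_sent_ms - c_recv_ms)) / 2.0;
}
```

For example, a server clock running 500 ms ahead with a symmetric 20 ms round trip yields an offset of exactly 500 ms, no matter what either clock counts from.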

jorgenkraghjakobsen avatar Feb 11 '21 21:02 jorgenkraghjakobsen

I have no plans to support the ESP_ADF for now as my sync concept fits very bad with the audio pipe line setup. For now at least. /Jørgen

Yeah honestly, with the mess caused by codec source code not being published, I'd say it's a sane move for now...

Also, have you received your ESP32-S2? Does your snapclient work fine on it for you? (edit: just saw your answer on #15 where you mention this)

douardda avatar Feb 12 '21 08:02 douardda

Still haven't powered up my S2 - trying to keep focus ... too little time

jorgenkraghjakobsen avatar Feb 13 '21 22:02 jorgenkraghjakobsen

:-) Since I now have an esp32, I'm also focusing on this target, trying to make it easier to use with other configs (aka non-MA12xx-based DACs).

Note that I've been playing with your latest latency control code, and it works fine for a while, but then it lags behind other snapcast clients fed by the same snapcast server. Have you experienced the same issue? Do you want me to create a dedicated issue to discuss this?

douardda avatar Feb 14 '21 16:02 douardda

I've also tested your code and found sync is only good for a short time for me. On my table, WiFi reception is kind of ... well, let's say it is OK, so I find the ESP client gets out of sync from a client running on an Ubuntu machine pretty fast; both are connected via WiFi. The server is a Raspberry Pi 3B connected via Ethernet. The problem seems worse when the WiFi signal is low.

I have also been busy on this and uploaded my code just now. I am using a hardware timer to get the timing right. The code needs a lot of polishing, but it works well, except that for some reason it will refuse to resume playback at some point. This appears after a long time and I am not sure why it is happening; I am still working on it. When I forked and started development it felt like a good idea to support ADF, so I gave it a go and started using pipelines and their codec, mostly because I wanted to use FLAC and it seemed easy to just use their library. Probably the problem is around time stamping the decoded FLAC chunks; this part is a bit complicated on my end. Hopefully I can integrate a FLAC decoder at some point and compile it myself.

CarlosDerSeher avatar Feb 15 '21 20:02 CarlosDerSeher

Oh now that I look at the code: there is currently no code to deal with keeping the client synchronized; it will synchronize at startup, but once synchronized it will not do anything. The code for these situations (playing with APLL adjustments) is commented.

It remains highly unclear to me how snapcast's synchronization is supposed to work, let's dig a bit there...

douardda avatar Feb 16 '21 18:02 douardda

Shouldn't the RTOS be taking most of this work off our shoulders? Dedicating CPU1 completely to I2S and syncing, and using hardware timers to get the deadline right, I can do almost perfectly synced playback most of the time. It breaks from time to time and will get out of sync though, mostly related to packet loss and low WiFi signals I believe. I have to look into that some time. I have been doing synced playback for over an hour now :)

CarlosDerSeher avatar Feb 17 '21 18:02 CarlosDerSeher

Hi @CarlosDerSeher, I've been playing with your implementation and it seems to work nicely! I'm currently using FLAC as the codec, since I've never been able to make adf's opus codec work (because of missing ogg container encapsulation, I guess).

Since it's using adf pipelines, it's a very different approach than @jorgenkraghjakobsen's original one. I'll put my version of your branch, adapted to the current master, in a branch on my fork. At some point it would be nice to have everything merged back into a single clean repo; I'm not sure yet how to mix @jorgenkraghjakobsen's non-adf approach with yours and keep the code clean and readable.

But first I need to understand your version better!

douardda avatar Feb 23 '21 19:02 douardda

Glad to read it isn't just working for me. Though I am still having issues if I cover my LyraT with my hand to simulate packet loss. At the moment it seems the code won't find a good initial sync again. I am working on that and will be uploading an update tonight. Initial sync seems critical; afterwards, soft syncing using the APLL should be easy. I still need to understand how badaix does statistics on chunk/sample ages. From what I see, he is using 3 sets of buffers on which medians are generated. Based on that, the APLL change is calculated...
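As a hedged illustration of why medians work well here (plain C, not badaix's actual code): the median over a small window of time-diff measurements throws away the WiFi-jitter outliers that would badly skew a mean.

```c
#include <stdlib.h>
#include <string.h>

// Illustration only: median of a small window of time-diff measurements.
int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

double median(const double *values, size_t n)
{
    double sorted[64]; // the measurement window is assumed to be small
    memcpy(sorted, values, n * sizeof(double));
    qsort(sorted, n, sizeof(double), cmp_double);
    return (n % 2) ? sorted[n / 2]
                   : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
}
```

For the window {2.0, 2.1, 250.0, 1.9, 2.2} (one retransmitted packet), the median is 2.1 ms while the mean would jump to about 51.6 ms.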

Merging shouldn't be so hard, because I use two pipelines, one for decoding and one for I2S playback. The first one could easily be dropped and exchanged for any codec. I will probably do that at some point, but first I want to get syncing right.

CarlosDerSeher avatar Feb 24 '21 07:02 CarlosDerSeher

Hi @douardda @jorgenkraghjakobsen @constant-flow, I proudly present to you my working version of snapcast :) After a lot of effort and quite some short nights I finally managed to get synced playback to work. My biggest mistake ever was trying to use ADF pipelines. I dropped most of the ADF pipeline stuff because it introduces a lot of unpredictable delay and jitter at the I2S end; I wasted sooo much time with that. After writing to I2S directly and fine-tuning the I2S DMA buffer lengths and counts, it all started to fall into place. I compared the offset to a Raspberry Pi running the latest version of snapclient using a scope and a test signal, and it seems to be almost the same, around +-1 ms or less. I could provide a video of the test signal on the scope if you'd like to see it.

On the decoding side I am still using ADF pipelines with a write callback which provides the decoded data to the PCM data queue used to buffer audio samples. Hopefully with this it will be easy to exchange the codec?!

At the moment it will only support the FLAC decoder, 16 bit 48 kHz, because lots of things are hardcoded. I use a LyraT; you'll probably have to adapt the I2S pin config to your hardware.

have a nice weekend testing :)

CarlosDerSeher avatar Apr 23 '21 07:04 CarlosDerSeher

At the moment it will only support the FLAC decoder, 16 bit 48 kHz, because lots of things are hardcoded. I use a LyraT; you'll probably have to adapt the I2S pin config to your hardware.

Newbie question: FLAC is used only as a transport vehicle between the SnapCast Server and SnapCast clients? In other words: the input audio stream on the SnapCast Server can be e.g. MP3, which gets live-transcoded into FLAC on the server!? Or does it mean that the audio source has to be FLAC already (think of internet radio stations, which are usually MP3 streams)?

Wookbert avatar Apr 23 '21 12:04 Wookbert

FLAC is used only as a transport vehicle between the SnapCast Server and SnapCast clients

@Wookbert Absolutely. The server encodes whatever audio stream comes in as FLAC (other formats are possible, but not for this project here) to reduce the size of the stream. The client decodes it to a pure PCM/I2S signal, and the DAC makes it analog.

What you describe should be possible. Just try it out with your phone as client (Android has an app in the Play Store) and your computer as server, where you set the encoding to FLAC. If that works, it should work with this client as well (if the fix provided works as explained ... I didn't test yet ... but I will :) )

constant-flow avatar Apr 23 '21 12:04 constant-flow

A few bugs are still left, I guess :) Sometimes, especially if the stream on the server is started after the ESP32 client has booted, there will be errors. But I guess for now I can live with resetting the ESP32 if that happens. Surely these are also solvable :)

CarlosDerSeher avatar Apr 23 '21 15:04 CarlosDerSeher

Alright, I believe I've solved those issues I was talking about. They only happened if silence was played from the server; I increased queue lengths and that seems to have solved it.

As a side note, I added WiFi provisioning to the code.

Good night :)

CarlosDerSeher avatar Apr 23 '21 20:04 CarlosDerSeher

Not sure if somebody has tested yet, but I believe syncing works pretty stably now. I switched back to using the opus decoder, as originally designed by jorgen, because timing will be more predictable that way. The interface to the time syncing part is as simple as writing decoded wire chunks to a queue. Hopefully that way it will be pulled by you, jorgen, and others :)

CarlosDerSeher avatar May 16 '21 09:05 CarlosDerSeher

Hi @CarlosDerSeher I had no time for this lately, but I'll try your version ASAP (maybe this week). Thanks

douardda avatar May 17 '21 08:05 douardda

@jorgenkraghjakobsen I tried to integrate my syncing implementation into the current master as well as I could. The thing is, I had to drop all the dsp_processor related stuff for now. If you are as satisfied as I am with the syncing, then we have to rethink the dsp_processor stuff, because that won't be compatible with my implementation; it relies heavily on control of all I2S related stuff. Your dsp_processor code should still be possible to use, I think, but it certainly has to be adapted so it can be called on decoded raw PCM chunk data, which can then be put into the player queue. I had a look at the code but was a little lost there. Would it even be possible to call all these filters on PCM data chunks of 20 ms?

I created a branch in my account which has these changes applied to your current master https://github.com/CarlosDerSeher/snapclient/tree/jorgenMaster

Edit 1: alright :) seems it wasn't that hard to change dsp_processor

CarlosDerSeher avatar May 27 '21 11:05 CarlosDerSeher

Hi guys, it's been a while since the last time I tried to get syncing up and running stably and smoothly. I changed some more parts and added a custom I2S driver, and I think with this the ESP32 clients stay in sync very well. Also compared to an Ubuntu notebook client they are well synced. The custom I2S driver is almost the same as the IDF one, except for one additional function to fill the DMA buffers before starting transmission. I pushed these changes to my master branch if someone is interested and willing to test in another environment.
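A hardware-free sketch of the idea behind that extra driver function (plain C; the names, buffer count and length are illustrative, not taken from the actual driver): prime every DMA buffer with audio before enabling the I2S peripheral, so output starts at a precisely known instant instead of after the driver has gradually consumed the first writes.

```c
#include <stdint.h>
#include <string.h>

// Illustrative sizes; real drivers configure these at init time.
#define DMA_BUF_COUNT 4
#define DMA_BUF_LEN   128 /* 16-bit stereo frames per DMA buffer */

static int16_t dma_bufs[DMA_BUF_COUNT][DMA_BUF_LEN * 2];

// Copies up to DMA_BUF_COUNT * DMA_BUF_LEN frames into the DMA ring and
// returns how many frames were preloaded; the caller starts I2S only after
// the ring is primed.
size_t preload_dma(const int16_t *pcm, size_t frames)
{
    size_t copied = 0;
    for (int b = 0; b < DMA_BUF_COUNT && copied < frames; b++) {
        size_t n = frames - copied;
        if (n > DMA_BUF_LEN) n = DMA_BUF_LEN;
        memcpy(dma_bufs[b], pcm + copied * 2, n * 2 * sizeof(int16_t));
        copied += n;
    }
    return copied;
}
```

With 4 buffers of 128 frames each, at most 512 frames (about 10.7 ms at 48 kHz) are preloaded; any remaining samples are written through the normal driver path once transmission is running.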

CarlosDerSeher avatar Aug 19 '21 20:08 CarlosDerSeher

Hello, I tried the branch from https://github.com/CarlosDerSeher/snapclient/tree/master on several ESP32 devices with PSRAM and a PCM5102; it works perfectly for me. The devices are very well synchronized. Thank you for this change.

Medel avatar Mar 18 '22 20:03 Medel

Glad to read that, though I have to say I haven't done any work on master for quite a while now. There is a branch named NETCONN which has the most up-to-date changes. RAM usage is minimized there, and syncing should also work on smaller WROVER modules with a sample rate of 48 kHz, FLAC, and a buffer size of 500 ms max. Syncing should be the same though, and enabling PSRAM to get bigger buffers would probably work too. I always wanted to minimize the RAM footprint to get a very thin client which could be used on even lower-end devices than the ESP32. But spare time is very limited currently :)

But if master is working for you, why not just stay there and use it :)

CarlosDerSeher avatar Mar 22 '22 07:03 CarlosDerSeher