Add some documentation to the repository

Open Lakritzator opened this issue 8 years ago • 28 comments

This is certainly a cool project, with a lot of potential.

But I would also love to see clients on Windows 10 (phone, desktop etc.) and am willing to have a go at some of the basics of a .NET application.

The problem I am facing is a lack of documentation. I know one answer would be "the code is the documentation", but some basic information would help... like how (in what format) the packets are transferred over the wire, how the synchronization with the audio clock is done, etc.

I found a nice document (a master's thesis) on "Synchronization of streamed audio between multiple playback devices over an unmanaged IP network" here: http://www.eit.lth.se/sprapport.php?uid=894 (I don't know which license it is under; there is no explicit copyright notice). It should cover the basics for most people, but maybe it can help make some parts more understandable, or even lead to snapcast implementing some standards...

Anyway, like I said, documentation would be nice. But even without, keep up the good work!

Lakritzator avatar Jan 12 '16 15:01 Lakritzator

Hi Lakritzator,

I'm aware of it. I have at least tried to document the header files :wink:
A specification of the "snapcast protocol" is indeed completely missing. Also, with the development branch, there will be a "remote control API" using JSON RPC in v0.5.
I'm planning to start a specification document for both (the snapcast protocol and the control API). Does Windows 10 (phone) support native C++ applications? v0.5 (develop branch) introduces support for a native Android client (it's actually a wrapper app that starts a native binary on the underlying Linux system).
Maybe it's easier to get the C++ code compiled on Windows. Because of the Android port, I've already started to strip down the client (e.g. "#define'd out" Ogg Vorbis support and the capability to run as a daemon), get rid of some dependencies (e.g. the Boost libs), and add an interface for different player APIs (Linux uses ALSA, Android OpenSL ES; Windows would use Core Audio?).

badaix avatar Jan 12 '16 15:01 badaix

Hi Badaix,

no hurry, but it would be a shame to have your product ignored due to a lack of documentation.

Microsoft has added a new "platform" to Windows 10 called the Universal Windows Platform (UWP), which runs on the Windows 10 kernel and is available on the desktop (and tablets), phones, IoT devices (e.g. the Raspberry Pi), and should even work on the HoloLens.

According to this: https://msdn.microsoft.com/en-us/library/windows/apps/dn996906.aspx it should support C++, which has the advantage that we wouldn't need to port/change much of the underlying code. Still, the audio might be another matter...

I was thinking of building a C# application using NAudio, at least as a prototype. But this would be hard to maintain if you make changes to the protocol, etc.

Which audio engine can be used in UWP? I don't know yet; I need to check... I'm not sure about the timing either, which is why I needed a bit more information.

Having a remote control API for the multi room part would certainly be nice.

P.S. What about having a "plug-in" system on the server that allows commands like next/previous/play/pause/stop to be handled? That way one could control e.g. one (or multiple) MPD servers via the Snapcast clients. Anyway, not a priority...

Lakritzator avatar Jan 12 '16 20:01 Lakritzator

I would be really interested in more detailed documentation on the JSON interface. I started to create some scripts for my home automation system based on the information in [1] & [2], but so far they only cover volume control for each snapclient instance. If I knew the complete story, I could combine it all into one nifty OpenHAB addon.

[1] https://github.com/badaix/snapcast/blob/master/android/Snapcast/src/main/java/de/badaix/snapcast/control/RemoteControl.java [2] https://github.com/badaix/snapcast/tree/master/control
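
For reference, the volume-control part boils down to something like this sketch (the default control port 1705, the Client.SetVolume method name, and the params layout are only my reading of [1] & [2], so they may well differ between versions):

```python
import json
import socket

# Minimal sketch of a JSON-RPC volume request. Port 1705 (assumed default control
# port), the method name and the params layout are taken from my reading of [1] & [2]
# and may not match all snapserver versions.
def set_volume(host, client_id, percent, port=1705):
    request = {
        "id": 1,
        "jsonrpc": "2.0",
        "method": "Client.SetVolume",
        "params": {"id": client_id, "volume": {"muted": False, "percent": percent}},
    }
    with socket.create_connection((host, port)) as sock:
        # Requests and responses appear to be newline-delimited JSON.
        sock.sendall((json.dumps(request) + "\r\n").encode())
        return json.loads(sock.makefile().readline())

print(set_volume("localhost", "00:11:22:33:44:55", 50))
```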

m-kloeckner avatar May 06 '16 08:05 m-kloeckner

I would also like to have documentation for the protocol that is used. I'm looking into creating hardware-based clients for a multi-room playback system, and for those I'd have to deduce the protocol from the source, as I need to implement a client myself.

mightyCelu avatar Apr 06 '18 13:04 mightyCelu

I would love to build an ESP8266 client. The ESP8266 costs $2 and has I2S. A very simple PCM-stream-to-I2S example is available here: https://gist.github.com/me-no-dev/e8b5131f840822cdd14b

Sadly snapcast doesn't seem to have any documentation on the protocol.

MarcSN311 avatar Jan 10 '19 17:01 MarcSN311

I'm also very interested in protocol documentation. I've also dabbled with the ESP and I2S DACs to see what could be done, but the lack of a reference is a bigger hindrance than the hardware limitations! @badaix, I understand this is really boring and tedious work for you, but it has been three years now, and it would be mighty helpful to make snapcast easier to integrate, which in turn would increase the success of the project!

xSmurf avatar Jun 10 '19 16:06 xSmurf

Would there be any interest in collaboratively documenting the protocol? I'm interested in porting the client side to the ESP32; it would be great to get some pointers to get me started.

jerryr avatar Nov 25 '19 09:11 jerryr

Will an ESP32 be powerful enough to do FLAC/Ogg/Opus decoding? The smallest CPU I ran snapclient on was some 400 MHz MIPS-based router.

badaix avatar Nov 25 '19 13:11 badaix

I haven't tested it myself, but there is an official esp-adf API that supports all of these (plus more): link. The ESP32 has 2 cores running at 240 MHz, which should be fine for a single stereo stream, I think.

jerryr avatar Nov 26 '19 03:11 jerryr

I would like documentation of the protocol, as this would open snapcast up to wider adoption and, hopefully, more and more nice things. The ESP32 would be a good extension as a client, if it works :) I am not a programmer, but I'd like to help if needed.

rcmcronny avatar Nov 26 '19 08:11 rcmcronny

Question: the Hello message has a MAC address attribute; what does the server use it for? Same question for the OS and Arch attributes.

jerryr avatar Nov 26 '19 14:11 jerryr

These are just informational fields. The MAC was used as the unique ID in earlier versions.
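
For illustration, the hello payload is a small JSON object roughly along these lines (a sketch only, not a spec - the field names are approximate, so check the hello message header in the sources for the exact set):

```python
# Rough sketch of a client hello payload - field names are approximate, not a spec;
# check the hello message header in the sources for the exact set.
hello = {
    "MAC": "00:11:22:33:44:55",  # informational; was the unique ID in earlier versions
    "HostName": "livingroom",
    "OS": "Android",             # informational
    "Arch": "armv7",             # informational
    "ClientName": "Snapclient",
    "Version": "0.17.1",
    "ID": "00:11:22:33:44:55",   # the ID the server keys on in newer versions
    "Instance": 1,
}
```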

badaix avatar Nov 26 '19 14:11 badaix

I'm trying to reverse-engineer the protocol by reading the code, and I got a little stuck here, in ClientConnection::getNextMessage():

```cpp
socketRead(&buffer[0], baseMsgSize);
baseMessage.deserialize(&buffer[0]);
if (baseMessage.size > buffer.size())
    buffer.resize(baseMessage.size);
socketRead(&buffer[0], baseMessage.size);
```

What is the second socketRead() call for?

jerryr avatar Nov 28 '19 11:11 jerryr

The first read gets just the base message portion; it's a kind of common header, with the size baseMsgSize. The header contains the type and the complete message size, which is then available in the field baseMessage.size. Now that we know the complete size, we can do a second read to get the complete message.
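
In pseudo-Python, the whole read loop boils down to something like this (the header field layout and the little-endian byte order below are only sketched from the message headers, so double-check them against the sources before relying on them):

```python
import struct

# Sketch of the two-read pattern described above. Assumed base message ("common
# header") layout, little-endian: type, id, refersTo (uint16 each), sent and
# received timestamps (int32 sec + int32 usec each), then the payload size (uint32).
BASE_FORMAT = "<HHHiiiiI"
BASE_SIZE = struct.calcsize(BASE_FORMAT)  # 26 bytes

def read_exact(sock, n):
    """Read exactly n bytes from the socket (like socketRead() above)."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("socket closed")
        data += chunk
    return data

def get_next_message(sock):
    # First read: only the fixed-size base message / common header...
    header = read_exact(sock, BASE_SIZE)
    (msg_type, msg_id, refers_to,
     sent_sec, sent_usec, recv_sec, recv_usec, size) = struct.unpack(BASE_FORMAT, header)
    # ...second read: the actual payload, whose length we only know now.
    payload = read_exact(sock, size)
    return msg_type, msg_id, refers_to, payload
```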

badaix avatar Nov 28 '19 12:11 badaix

Thanks, that makes sense. Can you give me some idea of how the codecs are negotiated? snapserver (v0.17.1) seems to always start a FLAC stream as soon as the client connects, regardless of the "codec" setting in the configuration. Does the client need to implement all the codecs? Is there a bare minimum it can get away with?

jerryr avatar Nov 30 '19 03:11 jerryr

I'm definitely interested in collaborating on porting this to esp32 as well. I've tested the esp-adf mp3 examples and they worked fine for me. I'll see about FLAC next probably.

I briefly looked over the snapcast client code but had trouble nailing down where the protocol decoding was happening. I may try poking around the function ClientConnection::getNextMessage() next.

bridadan avatar Dec 08 '19 20:12 bridadan

The python-snapcast project is not only a Python control client but also has some basic player client code that might be easier to read.

Note: This is experimental. Synchronization is not yet supported. Requires GStreamer 1.0.

```python
import snapcast.client

client = snapcast.client.Client('localhost', snapcast.client.SERVER_PORT)
client.register()
client.request_start()  # this blocks
```

A client must first send a hello message (in void Controller::worker()) and start listening for incoming messages (void Controller::onMessageReceived). The server responds to the hello with a ServerSettings message. Before the start of an audio stream, a CodecHeader message is sent, telling the client which decoder to use. The actual stream is then sent as WireChunk messages.
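
Putting it together, the client side looks roughly like this outline (the callables and the string message names are placeholders, not the real identifiers):

```python
def run_client(sock, send_message, recv_message, make_decoder, play):
    """Pseudo-code outline of the client loop; all callables are injected placeholders."""
    send_message(sock, "Hello")                    # 1. introduce ourselves to the server
    decoder = None
    while True:
        msg_type, payload = recv_message(sock)     # base-message header + payload
        if msg_type == "ServerSettings":           # 2. volume, latency, buffer size, ...
            pass                                   #    apply the settings
        elif msg_type == "CodecHeader":            # 3. tells us which decoder to use
            decoder = make_decoder(payload)
        elif msg_type == "WireChunk" and decoder:  # 4. timestamped, encoded audio chunks
            play(decoder.decode(payload))
```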

badaix avatar Dec 08 '19 21:12 badaix

Fantastic! That's a great starting point, thanks for the pointer!

bridadan avatar Dec 08 '19 21:12 bridadan

I haven't had a lot of time to spend on this recently, however I did confirm today that the ESP-ADF's FLAC decoder works without issues (at least on the 2 second sample that I tested with).

bridadan avatar Dec 14 '19 22:12 bridadan

The client implementation in python-snapcast is for a much older Snapcast version at this point. I'm not sure how much the protocol has diverged. I'd happily accept a PR from anyone to update or improve it though!

happyleavesaoc avatar Dec 20 '19 16:12 happyleavesaoc

I was able to make a lot of headway during the holidays! I've re-implemented most of the message parsing in C here: https://github.com/bridadan/libsnapcast

Still not quite finished, but I'm already getting messages exchanged over a TCP socket connected to snapserver. My hope is to get a non-synchronized FLAC stream running on Linux using the C library, then get that running on the ESP32, then go back and take care of synchronization. Once I have that MVP going I'll look at documenting the protocol in more depth.

bridadan avatar Dec 24 '19 19:12 bridadan

@badaix Can you explain the usage of Stream::getPlayerChunk()? Especially the second parameter (timeout) is a bit confusing.

jerryr avatar Jan 16 '20 16:01 jerryr

I wouldn't mind a high-level explanation of that one as well. That function is basically the next thing I have to tackle for synchronized playback, and I've been procrastinating 😄. I haven't spent a whole lot of time trying to understand it, so having a high-level outline would certainly help me parse through it, thanks!

bridadan avatar Jan 17 '20 14:01 bridadan

@jerryr this function is where the magic happens :wink: The main playback loop, no matter what the implementation (ALSA, Core Audio, ...), is as simple as:

  1. get PCM data from the Snapserver: as many frames as the DAC wants to have in its input buffer (let's say 50 ms)
  2. pass this data to the DAC

To synchronize the audio, a lower-level audio interface - such as ALSA - can report its current internal buffer size and estimate when data will be played out, e.g. with snd_pcm_delay(handle_, &framesDelay):

For playback the delay is defined as the time that a frame that is written to the PCM stream shortly after this call will take to be actually audible. It is as such the overall latency from the write call to the final DAC.

Knowing the current delay, e.g. 100ms, we must feed PCM data that should be audible in 100ms to the DAC. This is exactly the outputBufferDacTime of

bool Stream::getPlayerChunk(void* outputBuffer, const cs::usec& outputBufferDacTime, unsigned long framesPerBuffer)

The stream class is aware of

  1. the total configured buffer size in milliseconds (server-side parameter [server] buffer = 1000)
  2. the server's current time (client and server permanently sync their system times, so the client knows the delta between its local time and the server's local time)
  3. the PCM timestamps: the server sends PCM chunks of size ~ ([server] chunk_ms = 20) (~ because of the codec used). Every chunk is timestamped with the server's local time. Knowing the sample rate, you can calculate the timestamp for every single sample in the whole stream.

So when the player calls getPlayerChunk(buffer, 100ms, 50ms = 48000/1000*50 frames), the stream will return 50 ms of PCM data that is 1000 ms (total buffer size = delay) - 100 ms = 900 ms "old" (it was recorded by the server 900 ms ago), so that it will be audible 100 ms later, i.e. when it is 1000 ms old.
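
In code, that selection boils down to the following bit of arithmetic (a sketch of the math only, not of the actual Stream implementation):

```python
# Sketch of the arithmetic only (not the actual Stream implementation).
# All times are in milliseconds; to_server_time() converts a local timestamp
# into server time using the continuously measured clock offset.
def wanted_chunk_timestamp(local_now_ms, to_server_time, buffer_ms, output_buffer_dac_time_ms):
    # Data handed to the DAC now will become audible in output_buffer_dac_time_ms.
    audible_at_server_time = to_server_time(local_now_ms) + output_buffer_dac_time_ms
    # A sample recorded at server time t must become audible at t + buffer_ms,
    # so we need the sample that was recorded buffer_ms before the audible time.
    return audible_at_server_time - buffer_ms

# Example from above: buffer = 1000 ms, DAC delay = 100 ms
# -> return data the server recorded 900 ms ago.
print(wanted_chunk_timestamp(0, lambda t: t, 1000, 100))  # -900, i.e. "now minus 900 ms"
```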

The problem is that there is some noise in this loop. The estimated DAC output time will be noisy, you will not pass through the loop exactly every 50 ms when feeding 50 ms into the DAC, the DAC might play the audio at 48003 Hz instead of 48 kHz, and the stream might return PCM data that is 902 ms old instead of 900 ms, simply because there is no more recent data left. The stream gathers statistics about the estimated deviation, and when the statistics tell us that the deviation is too high, e.g. we're 2 ms early or late for a while, the stream is stretched or shortened by 2 ms to stay in sync.

You can rely on the PCM chunk timestamps. The very first chunk is tagged with the current server time, and the following timestamps are calculated from the sample rate and the number of samples, i.e. +1 s every 48000 samples, so the PcmChunk timestamps can be assumed to be reliable. Overall, Snapcast is all about recording streams and routing them to different clients. On the client side you can simply pass the received chunks to the DAC. Actually, the synchronized playback is the only critical, implementation-dependent part:

  • get the initial sync quickly
  • keep in sync without quality degradation

badaix avatar Jan 18 '20 19:01 badaix

btw: there is a C implementation of Snapcast: SnapCastC. I think this implementation uses the same protocol, and the player should be compatible with Snapcast.

badaix avatar Jan 18 '20 19:01 badaix

Nice description, badaix - a good starting point for my work tonight. I went over your snapcast code last night and got it running on my laptop and Android phone. Tonight I will play with Brian's code on my ESP32 and amps.

jorgenkraghjakobsen avatar Jan 18 '20 21:01 jorgenkraghjakobsen

> I'm definitely interested in collaborating on porting this to esp32 as well. I've tested the esp-adf mp3 examples and they worked fine for me. I'll see about FLAC next probably.
>
> I briefly looked over the snapcast client code but had trouble nailing down where the protocol decoding was happening. I may try poking around the function ClientConnection::getNextMessage() next.

I guess quite a few of us are interested in such a project. Maybe @jorgenkraghjakobsen should mention that he started such a project at https://github.com/jorgenkraghjakobsen/snapclient :-)

douardda avatar Jan 18 '21 16:01 douardda

@Lakritzator I know it's literally been half a decade since your original post, but I ended up creating a library that does almost exactly what you described there. Ironically, my motivation behind it was to have an iOS player (via Xamarin). I haven't looked at UWP at all but it should be pretty straightforward to support from here, if there's still interest for it :) https://github.com/stijnvdb88/Snap.Net/tree/master/Snap.Net.SnapClient

stijnvdb88 avatar Apr 16 '21 19:04 stijnvdb88