is this API appropriate, especially for real time use

Open mattetti opened this issue 8 years ago • 92 comments

This discussion is a follow up from this initial proposal. The 2 main arguments that were raised are:

  • Is this API appropriate for real time usage (especially with regard to allocation and memory size)?
  • Is the interface too big or otherwise inadequate?

@egonelbre @kisielk @nigeltao @taruti all brought up good points and Egon is working on a counter proposal focusing on smaller interfaces with compatibility with types commonly found in the wild (int16, float32).

As mentioned in the original proposal, I'd like this organization to be a special interest group of people interested in doing more/better audio in Go. I have to admit my focus hasn't been real time audio and I very much appreciate the provided feedback. We all know this is a challenging issue which usually results in a lot of libraries doing things in very different ways. However, I do want to believe that we, as a community and with the support of the core team, can come up with a solid API for all Go audio projects.

mattetti avatar Jan 04 '17 17:01 mattetti

The link to the alternate design: https://github.com/egonelbre/exp/tree/master/audio

101 of real-time audio http://www.rossbencina.com/code/real-time-audio-programming-101-time-waits-for-nothing

egonelbre avatar Jan 04 '17 17:01 egonelbre

@egonelbre would you mind squashing your commits for the proposal, or maybe sending a PR? GitHub really makes it hard to comment on different parts of the code coming from different commits :(

mattetti avatar Jan 04 '17 18:01 mattetti

Typically when using audio my needs have been:

  1. Read from input source (typically system IO + slice of []int16 or []float32)
  2. Filter&downsample&convert to preferred internal format (typically []float32)
  3. Do all internal processing with that type (typically []float32)
  4. Maintain as little latency as possible by keeping cpu and memory allocation (and with that GC) in check
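
A minimal sketch of steps 1 to 4, assuming mono int16 input and a crude 2:1 downsample by averaging (a stand-in for a real anti-aliasing filter). The `Pipeline` type and its methods are hypothetical names, not part of any proposal in this thread; the point is only that the scratch buffer is allocated once and reused for every block.

```go
package main

import "fmt"

// Pipeline owns its scratch buffer so that the per-block work in Process
// does not allocate. The type and its methods are hypothetical.
type Pipeline struct {
	scratch []float32 // reused for every block
}

// NewPipeline allocates the scratch buffer once, sized for the largest
// input block the caller will pass to Process.
func NewPipeline(maxBlock int) *Pipeline {
	return &Pipeline{scratch: make([]float32, maxBlock)}
}

// Process converts an int16 block to float32 in [-1, 1) and downsamples it
// by two by averaging adjacent samples. The returned slice aliases the
// scratch buffer and is only valid until the next call.
func (p *Pipeline) Process(in []int16) []float32 {
	n := len(in) / 2
	if n > len(p.scratch) {
		n = len(p.scratch)
	}
	out := p.scratch[:n]
	for i := range out {
		a := float32(in[2*i]) / 32768
		b := float32(in[2*i+1]) / 32768
		out[i] = (a + b) / 2
	}
	return out
}

func main() {
	p := NewPipeline(4)
	out := p.Process([]int16{16384, 16384, -32768, 0})
	fmt.Println(out) // prints [0.5 -0.5]
}
```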

taruti avatar Jan 04 '17 18:01 taruti

@mattetti sure no problem.

Say you are designing a sample-based synthesizer (eg: Akai MPC) and your project has an audio pool it is working with. You'll want to be storing those samples in memory in the native format of your DSP path so you don't have to waste time doing conversions every time you are reading from your audio pool.

@kisielk sure, if you have such a sample-based synth you probably need to track what notes are playing, etc. anyways, so you would have a Synth node that produces float32/float64, i.e. you pay the conversion per synth, not per sample. It's not as good as no conversion, but it just means you can have one less effect overall for the same performance.

egonelbre avatar Jan 04 '17 18:01 egonelbre

@mattetti Here you go: https://github.com/egonelbre/exp/commit/81ba19e90fbcb31986c801838a17606c76dfd4d9

egonelbre avatar Jan 04 '17 18:01 egonelbre

Yes but the "synth" is not going to be limited to one sample, usually you have some number of channels, say 8-16, and each one can choose any part of any sample to play at any time. In my opinion processing audio in float64 is pretty niche, relegated to some high precision or quality filters which aren't commonly used. Even in that case, the data can be converted to float64 for processing just within that filter block, there's little reason to store it in anything but float32 otherwise. Even still most DSP is performed using float32 even on powerful architectures like x86, reason being that you can do twice as much with SIMD instructions in that case.
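
To illustrate converting to float64 only inside one filter block, here is a small sketch of a one-pole low-pass filter that keeps its state in float64 while the buffer it processes stays float32. `OnePole` and its fields are hypothetical names chosen for this example.

```go
package main

import "fmt"

// OnePole is a hypothetical one-pole low-pass filter. Its state and
// arithmetic are float64, but it reads and writes float32 buffers in place.
type OnePole struct {
	A float64 // smoothing coefficient in (0, 1]; 1 passes the input through
	z float64 // previous output
}

// Process filters buf in place without allocating.
func (f *OnePole) Process(buf []float32) {
	for i, x := range buf {
		f.z += f.A * (float64(x) - f.z)
		buf[i] = float32(f.z)
	}
}

func main() {
	f := &OnePole{A: 0.5}
	buf := []float32{1, 1, 1, 1}
	f.Process(buf)
	fmt.Println(buf) // prints [0.5 0.75 0.875 0.9375]
}
```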

Of course I'm totally fine with having float64 as an option for a buffer type when appropriate, but I believe that float32 should be on par. I feel like it would certainly be the primary format for any real-time applications. Even for batch processing you are likely to see performance gains from using it.

kisielk avatar Jan 04 '17 18:01 kisielk

@kisielk Yes, also, for my own needs float32 would be completely sufficient.

Forums seemed to agree that in most cases float64 isn't a significant improvement. However, if one of the intended targets is writing audio plugins, then many plugin APIs include a float64 version (e.g. VST3), and DAWs have an option to switch between float32 and float64.

I agree that if only one has to be chosen, float32 seems more suitable. (Although I don't think I have the full knowledge of audio processing to say so definitively.) The only argument for float64 is that the math package works on float64, so using only float32 means there is a need for a math32 package.

egonelbre avatar Jan 04 '17 19:01 egonelbre

I agree that float32 is usually plenty enough but as mentioned my problem is that the Go math package is float64 only. Are we willing to reimplement the math functions we need? It might make sense if we start doing asm optimizations but that's quite a lot of work.

mattetti avatar Jan 04 '17 19:01 mattetti

Again, I don't think it's a binary choice, I just think that both should have equal support within the API. And yes, if I was using Go for realtime processing of audio I would definitely want a 32-bit version of the math package. I don't think the math package needs to dictate any limitations on any potential audio API.

kisielk avatar Jan 04 '17 19:01 kisielk

@kisielk sounds fair, just to be clear, would you be interested in using Go for realtime processing or at least giving it a try? You obviously do that for a living using C++ so your expertise would be invaluable.

mattetti avatar Jan 04 '17 19:01 mattetti

Are we willing to reimplement the math functions we need?

How many math functions are needed in practice? Initially the package could be a wrapper around math to make it more convenient, and then we could start optimizing the bottlenecks. I never needed more than sin/cos/exp/abs/rand; but I've never done anything complicated either.
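
A sketch of what such a wrapper could look like at first. The function names are hypothetical, and each one simply round-trips through the standard float64 `math` package; nothing here is optimized.

```go
package main

import (
	"fmt"
	"math"
)

// Sin32, Cos32, Exp32 and Abs32 are hypothetical float32 wrappers. Each one
// converts to float64, calls package math, and converts the result back.
func Sin32(x float32) float32 { return float32(math.Sin(float64(x))) }
func Cos32(x float32) float32 { return float32(math.Cos(float64(x))) }
func Exp32(x float32) float32 { return float32(math.Exp(float64(x))) }
func Abs32(x float32) float32 { return float32(math.Abs(float64(x))) }

func main() {
	fmt.Println(Sin32(0), Cos32(0), Exp32(0), Abs32(-2.5)) // prints 0 1 1 2.5
}
```

Call sites would stay unchanged if individual functions were later replaced with native float32 or asm versions.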

I suspect some of the first bottlenecks and candidates for "asm optimized" code will be []int16->[]float32 conversion, buffer multiplication and/or adding two buffers together.
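
For reference, plain Go versions of two of those loops might look like the sketch below; `Gain` and `Mix` are hypothetical names. Loops of this shape are what an asm or SIMD version would later replace.

```go
package main

import "fmt"

// Gain multiplies every sample in buf by g, in place.
func Gain(buf []float32, g float32) {
	for i := range buf {
		buf[i] *= g
	}
}

// Mix adds src into dst sample by sample and returns the number of samples
// mixed, which is the shorter of the two lengths.
func Mix(dst, src []float32) int {
	n := len(dst)
	if len(src) < n {
		n = len(src)
	}
	for i := 0; i < n; i++ {
		dst[i] += src[i]
	}
	return n
}

func main() {
	a := []float32{1, 2, 3}
	Gain(a, 0.5)
	Mix(a, []float32{1, 1})
	fmt.Println(a) // prints [1.5 2 1.5]
}
```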

egonelbre avatar Jan 04 '17 20:01 egonelbre

@mattetti that is something I'm definitely interested in. I'm not exactly a DSP expert, but I work enough with it day to day to be fairly familiar with the domain.

@egonelbre Gain is also a big one that benefits from optimization. (edit: maybe that's what you meant by buffer multiplication, or did you mean convolution?)

kisielk avatar Jan 04 '17 20:01 kisielk

@kisielk yeah, I meant gain :), my brain's language unit seems to be severely malfunctioning today.

egonelbre avatar Jan 04 '17 20:01 egonelbre

A math package (trigonometric, logarithmic, etc.) with float32 and SIMD optimization for any data type are two different things. In many cases just mult/add/sub/div are needed, and for those the math package is not needed.

I think that math32 and SIMD are best kept separate from this proposal.

If we are thinking of performance, then converting buffers without needing to allocate can be important. For example, have one input buffer and one output buffer for the conversion, instead of allocating a new output buffer each time.

taruti avatar Jan 04 '17 20:01 taruti

@taruti +:100:

kisielk avatar Jan 04 '17 20:01 kisielk

Speaking of conversion between buffers, I think it's important the API has a way to facilitate conversion between buffers of different data types and sizes without allocation (eg: 2 channels to 1, etc). The actual conversion method would be determined by the application but at least the API should be able to help facilitate this without too much additional complexity.
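
One possible shape for such a helper, as a sketch: the caller supplies both buffers, so the conversion changes the data type (int16 to float32) and the channel count (2 to 1) without allocating. The function name and the choice of averaging for the downmix are assumptions made for this example; as noted above, the real conversion method would be up to the application.

```go
package main

import "fmt"

// StereoI16ToMonoF32 reads interleaved stereo int16 samples from src,
// averages each left/right pair, and writes the result to dst as float32 in
// [-1, 1). It returns the number of frames written, limited by both
// buffers, and never allocates.
func StereoI16ToMonoF32(dst []float32, src []int16) int {
	n := len(src) / 2
	if len(dst) < n {
		n = len(dst)
	}
	for i := 0; i < n; i++ {
		l := float32(src[2*i]) / 32768
		r := float32(src[2*i+1]) / 32768
		dst[i] = (l + r) / 2
	}
	return n
}

func main() {
	dst := make([]float32, 2) // allocated once by the caller, then reused
	n := StereoI16ToMonoF32(dst, []int16{8192, 24576, -16384, -16384})
	fmt.Println(n, dst) // prints 2 [0.5 -0.5]
}
```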

kisielk avatar Jan 04 '17 20:01 kisielk

Alright, here is my suggestion. I'll add you guys to the organization and we can figure out an API for real time processing and from there see how it works for offline. Ideally I would love to end up with:

  • a generic audio API (what we are discussing here)
  • a list of codecs (I started with wav and aiff, they still need work and refinement but they work)
  • a set of transforms (gain, dynamics, eq, lfos)
  • analyzers (FFT and things like chromagrams, key, onset detectors...)
  • generators

@rakyll and I also discussed adding wrappers to things like CoreAudio on Mac so we could have an end to end experience without having to rely on things like portaudio. This is outside of the scope of what I have in mind but I figured I should mention it.

I like designing APIs against real usage, so maybe a first good step is to define an example we would like to build and from there define the components we need. Thoughts?

mattetti avatar Jan 04 '17 21:01 mattetti

That sounds like a good idea to me. However I would propose we limit the scope of the core audio package to the first two points (and perhaps a couple of very general utilities from point 3). I feel like the rest would be better suited for other packages. My main reasoning behind this is that I feel like the first two items can be achieved (relatively) objectively and there can be one canonical implementation. As you go down the list it becomes increasingly application-dependent.

kisielk avatar Jan 04 '17 22:01 kisielk

I think the audio API should be in its own package and each of those things in separate packages. For instance I have the wav and aiff packages isolated. That's another reason why having a GitHub organization is nice.

mattetti avatar Jan 04 '17 22:01 mattetti

Just noticed that when looking at the org page. Looks good to me 👍

kisielk avatar Jan 04 '17 22:01 kisielk

There's the original proposal. @egonelbre has an alternative proposal. Here are a couple more (conflicting) API ideas for a Buffer type. I'm not saying that either of them are any good, but there might be a useful core in there somewhere. See also another API design in the github.com/azul3d/engine/audio package.

Reader/Writer-ish:

type Buffer interface {
	Format() Format

	// The ReadFrames and WriteFrames methods are roughly analogous to bulk
	// versions of the Image.At and Image.Set methods from the standard
	// library's image and image/draw packages.

	// ReadFrames converts that part of the buffer's data in the range [offset
	// : offset + n] to float32 samples in dst, and returns n, the minimum of
	// length and the number of frames that dst can hold.
	//
	// offset, length and n count frames, not samples (slice elements). For
	// example, stereo audio might have two samples per frame. To convert
	// between a frame count and a sample count, multiply or divide by
	// Format().SamplesPerFrame().
	//
	// The offset is relative to the start of the buffer, which is not
	// necessarily the start of any underlying audio clip.
	//
	// The n returned is analogous to the built-in copy function, where
	// copy(dst, src) returns the minimum of len(dst) and len(src), except that
	// the methods here count frames, not samples (slice elements).
	//
	// Unlike the io.Reader interface, ReadFrames should read (i.e. convert) as
	// many frames as possible, rather than returning short. The conversion
	// presumably does not require any further I/O.
	//
	// TODO: make this return (int, error) instead of int, and split this into
	// audio.Reader and audio.Writer interfaces, analogous to io.Reader and
	// io.Writer, so that you could write "mp3.Decoder(anIOReader)" to get an
	// audio.Reader?
	ReadFrames(dst []float32, offset, length int) (n int)

	// WriteFrames is like ReadFrames except that it converts from src to this
	// Buffer, instead of converting from this Buffer to dst.
	WriteFrames(src []float32, offset, length int) (n int)
}

type BufferI16 struct {
	Fmt  Format
	Data []int16
}

type BufferF32 struct {
	Fmt  Format
	Data []float32
}
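
As a sketch of how BufferI16 might satisfy the ReadFrames half of that interface: the thread does not define Format, so the `Format` struct below, with a `Channels` field, is a hypothetical stand-in. The sketch also clamps n to the frames actually available in the buffer, which is an assumption beyond what the doc comment specifies.

```go
package main

import "fmt"

// Format is a hypothetical stand-in; the thread does not define it.
type Format struct {
	Channels int
}

// SamplesPerFrame returns the number of samples (slice elements) per frame.
func (f Format) SamplesPerFrame() int { return f.Channels }

type BufferI16 struct {
	Fmt  Format
	Data []int16
}

// ReadFrames converts up to length frames, starting at frame offset, into
// float32 samples in dst and returns the number of frames converted.
func (b *BufferI16) ReadFrames(dst []float32, offset, length int) int {
	spf := b.Fmt.SamplesPerFrame()
	if spf <= 0 || offset < 0 || length <= 0 {
		return 0
	}
	n := length
	if avail := len(b.Data)/spf - offset; avail < n {
		n = avail // assumption: never read past the end of the buffer
	}
	if room := len(dst) / spf; room < n {
		n = room
	}
	if n <= 0 {
		return 0
	}
	for i, s := range b.Data[offset*spf : (offset+n)*spf] {
		dst[i] = float32(s) / 32768
	}
	return n
}

func main() {
	b := &BufferI16{
		Fmt:  Format{Channels: 2},
		Data: []int16{0, 16384, -16384, -32768, 8192, 8192},
	}
	dst := make([]float32, 4)
	n := b.ReadFrames(dst, 1, 5)
	fmt.Println(n, dst) // prints 2 [-0.5 -1 0.25 0.25]
}
```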

Have Buffer be a concrete type, not an interface type:

type Buffer struct {
	Format Format

	DataType DataType

	// The DataType field selects which slice field to use.
	U8  []uint8
	I16 []int16
	F32 []float32
	F64 []float64
}

type DataType uint8

const (
	DataTypeUnknown DataType = iota
	DataTypeU8_U8
	DataTypeU8_I16BE
	DataTypeU8_I16LE
	DataTypeU8_F32BE
	DataTypeU8_F32LE
	DataTypeI16
	DataTypeF32
	DataTypeF64
)
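
To show how code consuming the concrete Buffer might branch on DataType, here is a sketch with a trimmed subset of the constants and fields above. The `ReadF32` method is hypothetical and the scaling by 32768 is an assumption; the little-endian decode uses `encoding/binary` from the standard library.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

type DataType uint8

// A trimmed subset of the DataType constants, enough for this sketch.
const (
	DataTypeUnknown DataType = iota
	DataTypeU8_I16LE
	DataTypeI16
	DataTypeF32
)

type Buffer struct {
	DataType DataType
	U8       []uint8
	I16      []int16
	F32      []float32
}

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// ReadF32 is a hypothetical helper that converts the active slice to
// float32 samples in dst and returns the number of samples written.
func (b *Buffer) ReadF32(dst []float32) int {
	switch b.DataType {
	case DataTypeU8_I16LE:
		n := minInt(len(b.U8)/2, len(dst))
		for i := 0; i < n; i++ {
			s := int16(binary.LittleEndian.Uint16(b.U8[2*i:]))
			dst[i] = float32(s) / 32768
		}
		return n
	case DataTypeI16:
		n := minInt(len(b.I16), len(dst))
		for i := 0; i < n; i++ {
			dst[i] = float32(b.I16[i]) / 32768
		}
		return n
	case DataTypeF32:
		return copy(dst, b.F32)
	}
	return 0 // unknown data type: nothing to convert
}

func main() {
	b := &Buffer{DataType: DataTypeU8_I16LE, U8: []uint8{0x00, 0x40, 0x00, 0xC0}}
	dst := make([]float32, 2)
	n := b.ReadF32(dst)
	fmt.Println(n, dst) // prints 2 [0.5 -0.5]
}
```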

nigeltao avatar Jan 05 '17 03:01 nigeltao

In addition, here is another comment from @nigeltao about the math library:

As for a math32 library, I'm not sure if it's necessary. It's slow to call (64-bit) math.Sin inside your inner loop. Instead, I'd expect to pre-compute a global sine table, such as "var sineTable = [4096]float32{ etc }". Compute that table at "go generate" time, and you don't need the math package (or a math32 package) at run time.

I really like this idea which can also apply to log. It might come at an extra memory cost but I am personally OK with that.
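
A sketch of the lookup-table idea. To keep the example self-contained, the table is filled in `init` rather than emitted by "go generate" as suggested above; the lookup itself would be the same either way. `TableSin` and its phase convention (one full cycle per unit of phase, nearest lower entry, no interpolation) are assumptions made for this example.

```go
package main

import (
	"fmt"
	"math"
)

const tableSize = 4096 // must be a power of two for the mask below

var sineTable [tableSize]float32

// init fills the table once at startup. A generated source file would make
// even this unnecessary at run time.
func init() {
	for i := range sineTable {
		sineTable[i] = float32(math.Sin(2 * math.Pi * float64(i) / tableSize))
	}
}

// TableSin approximates sin(2*pi*phase) by picking the nearest lower table
// entry. The mask wraps the index so phase may run past 1.
func TableSin(phase float32) float32 {
	i := int(phase*tableSize) & (tableSize - 1)
	return sineTable[i]
}

func main() {
	fmt.Println(TableSin(0), TableSin(0.25)) // prints 0 1
}
```

The table costs 16 KiB (4096 float32 values), which is the extra memory mentioned above.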

Let's try to summarize the pros and cons of those different approaches and discuss what we value and the direction we want to take. I am now convinced that my initial proposal, while fitting my needs, doesn't work well in other scenarios and shouldn't be left as is.

mattetti avatar Jan 05 '17 03:01 mattetti

A broader point, re the proposal to add packages to the Go standard library or under golang.org/x, is that I think it is too early to say what the 'right' API should be just by looking at an interface definition. As rsc said on https://github.com/golang/go/issues/18497#issuecomment-270387898: "The right way to start is to create a package somewhere else (github.com/go-audio is great) and get people to use it. Once you have experience with the API being good, then it might make sense to promote to a subrepo or eventually the standard library (the same basic path context followed)." Emphasis added.

The right way might actually involve letting a hundred API flowers bloom, and trying a few different APIs before making a push for any particular flower.

I'd certainly like to see more experience with how audio codecs fit into any API proposal: how does the Buffer type (whatever it is) interact with sources (which can block on I/O, e.g. playing an mp3 stream over the network) and sinks (which you don't want to glitch)?

WAV and AIFF are a good start, but handling some sort of compressed audio would be even better. A full-blown mp3 decoder is a lot of work, but as far as kicking API tyres, it might suffice to write a decoder for a toy audio codec where "c3d1e3c1e2c2e4" decoded to "play a C sine wave for 3 seconds, D for 1 second, E for 3 seconds, etc", i.e. to play a really ugly version of "doe a deer".
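
A sketch of a decoder for that toy format, assuming each token is one note letter (a to g) followed by one digit of seconds. The note frequencies are the standard equal-tempered values for the octave starting at middle C; `Note`, `Decode` and `Render` are hypothetical names. `Render` allocates freely, so it exercises the decoding side only, not a real-time sink.

```go
package main

import (
	"fmt"
	"math"
)

// Note is one decoded token: a frequency in Hz and a duration in seconds.
type Note struct {
	Freq    float64
	Seconds int
}

// freqs maps note letters to equal-tempered frequencies (A4 = 440 Hz).
var freqs = map[byte]float64{
	'c': 261.63, 'd': 293.66, 'e': 329.63, 'f': 349.23,
	'g': 392.00, 'a': 440.00, 'b': 493.88,
}

// Decode parses a string such as "c3d1e3" into notes.
func Decode(s string) ([]Note, error) {
	if len(s)%2 != 0 {
		return nil, fmt.Errorf("toy: odd input length %d", len(s))
	}
	notes := make([]Note, 0, len(s)/2)
	for i := 0; i < len(s); i += 2 {
		f, ok := freqs[s[i]]
		d := s[i+1]
		if !ok || d < '0' || d > '9' {
			return nil, fmt.Errorf("toy: bad token %q at offset %d", s[i:i+2], i)
		}
		notes = append(notes, Note{Freq: f, Seconds: int(d - '0')})
	}
	return notes, nil
}

// Render synthesizes the notes as mono float32 sine waves at the given
// sample rate.
func Render(notes []Note, rate int) []float32 {
	var out []float32
	for _, n := range notes {
		for i := 0; i < n.Seconds*rate; i++ {
			t := float64(i) / float64(rate)
			out = append(out, float32(math.Sin(2*math.Pi*n.Freq*t)))
		}
	}
	return out
}

func main() {
	notes, err := Decode("c3d1e3c1e2c2e4")
	if err != nil {
		panic(err)
	}
	samples := Render(notes, 8000)
	fmt.Println(len(notes), len(samples)) // prints 7 128000
}
```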

nigeltao avatar Jan 05 '17 03:01 nigeltao

Back on API design brainstorming and codecs, there might be some more inspiration in the golang.org/x/text/encoding/... and golang.org/x/text/transform packages, which let you e.g. convert between character encodings like Shift JIS, Windows 1252 and UTF-8.

Text encodings are far simpler than audio codecs, though, so it might not end up being relevant.

nigeltao avatar Jan 05 '17 03:01 nigeltao

Some more API inspiration, from C++:

https://www.juce.com/doc/classAudioBuffer https://www.juce.com/doc/classAudioProcessor

JUCE is one of the most-used audio processing libraries out there.

kisielk avatar Jan 05 '17 03:01 kisielk

Obviously the API isn't very go-like since it's C++ (and has a fair amount of pre-C++11 legacy, though is gradually being modernized) but it's worth taking a look at how they put things together.

kisielk avatar Jan 05 '17 03:01 kisielk

JUCE uses overloading quite heavily and as mentioned isn't very go-like (it's also a framework more than a suite of libraries, but it is well written and very popular). My hope is that we can come up with a more modern and accessible API instead of a port; I would really want audio in Go to be much easier for new developers. On a side note, I did port over some parts of JUCE such as https://www.juce.com/doc/classValueTree for better interop with audio plugins.

mattetti avatar Jan 05 '17 04:01 mattetti

I'm not suggesting porting it, but I think the concepts in the library are pretty well thought out and cover most of what you would want to do with audio processing. It's worth getting familiar with. I don't think the use of overloading really matters, it's pretty easy to do that in other ways with Go.

kisielk avatar Jan 05 '17 04:01 kisielk

@nigeltao I agree with rsc and to be honest my goal was more to get momentum than to get the proposal accepted. I'm very happy to have found a group of motivated people who are interested in tackling the same issue.

I'll open a couple issues to discuss code styling and "core values" of this project.

mattetti avatar Jan 05 '17 05:01 mattetti

@nigeltao I think my design would also benefit from a Stream/Seeker (or similar) interface, but I'm not sure what the right approach is. I will try to implement some basic "remote-streaming", to find out what is essential. I have a feeling that it could fit together with Buffer32 nicely.

egonelbre avatar Jan 05 '17 06:01 egonelbre