one-voip-godot-4

Allow jitter buffer to be implemented in GDScript

Open goatchurchprime opened this issue 10 months ago • 2 comments

I think the current use of a jitter buffer obfuscates what should be a fairly transparent process that would allow for synchronicity with animations, events and visemes.

Capturing

If you expose the VOIPInputCapture::_sample_buf_to_packet() function to GDScript, we can use a base AudioEffectCapture object and feed the VOIPInput object that wraps the Opus library, like so:

while voipinputcapture.get_frames_available() >= 441:
    var samples = voipinputcapture.get_buffer(441)
    var opuspacket = voipinputcapture._sample_buf_to_packet(samples)
    transmit(opuspacket) # or packets.append(opuspacket)

As a bonus, _sample_buf_to_packet() could repacketize multiple Opus frames into a single Opus packet when the number of samples is a multiple of 441. (By the way, 441 audio frames is 10ms of sound at 44100Hz, which is resampled up to 480 audio frames at 48000Hz, as required by the Opus library.)
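To illustrate the multiple-of-441 case, here is a minimal sketch of how a captured buffer could be cut into 10ms chunks before encoding. The function name and constant are hypothetical and not part of one-voip-godot-4; only PackedVector2Array.slice() and size() are assumed from Godot 4.

```gdscript
const FRAMES_PER_OPUS_FRAME := 441  # 10 ms of audio at 44100 Hz

# Split a captured buffer into consecutive 10 ms chunks.
# Trailing frames that do not fill a whole chunk are dropped in this sketch;
# a real caller would keep them and prepend them to the next buffer.
func split_into_opus_frames(samples: PackedVector2Array) -> Array:
    var chunks := []
    var start := 0
    while start + FRAMES_PER_OPUS_FRAME <= samples.size():
        chunks.append(samples.slice(start, start + FRAMES_PER_OPUS_FRAME))
        start += FRAMES_PER_OPUS_FRAME
    return chunks
```

Each chunk could then be encoded as one Opus frame, and the encoder would be free to combine the resulting frames into a single packet.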

The reason the current design of send_test_packets -> emit signal packet_ready is unsatisfying is that we lose track of which packet corresponds to which time window since it's going out to a different callback function instead of returning to the caller.

The output Opus stream stutters for the very simple reason that there is a mismatch between _process() frame rate (60fps) and audio encoding rate (100fps).
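The mismatch can be made concrete with a little arithmetic. At 44100Hz and 60fps, each _process() call sees 735 new audio frames, which is 1.67 packets of 441 frames. The sketch below (hypothetical helper, assuming perfectly regular ticks) counts how many whole packets become available on each tick.

```gdscript
const FRAMES_PER_TICK := 735    # 44100 Hz / 60 fps
const FRAMES_PER_PACKET := 441  # 10 ms at 44100 Hz

# Count how many whole 10 ms packets can be encoded on each _process() tick,
# carrying leftover audio frames over to the next tick.
func packets_per_tick(ticks: int) -> Array:
    var counts := []
    var pending := 0
    for _i in ticks:
        pending += FRAMES_PER_TICK
        var n := 0
        while pending >= FRAMES_PER_PACKET:
            pending -= FRAMES_PER_PACKET
            n += 1
        counts.append(n)
    return counts
```

Under these assumptions packets_per_tick(6) gives [1, 2, 2, 1, 2, 2], so any design that handles only one packet per tick falls behind, and the packet output is uneven even when every available packet is drained.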

Playback

I exposed a copy of the push_packet() function that returned the uncompressed samples: PackedVector2Array AudioStreamVOIP::spush_packet(const PackedByteArray& packet)

The GDScript code for managing and playing the incoming stream is:

setup:

var staticvoipaudiostream = ClassDB.instantiate("AudioStreamVOIP")
$AudioStreamPlayer.stream = ClassDB.instantiate("AudioStreamGenerator")
$AudioStreamPlayer.play()
var playbackthing = $AudioStreamPlayer.get_stream_playback()


processing:

opuspacketsbuffer = [ ... ]  # filled from the network
while playbackthing.get_frames_available() > 441 and len(opuspacketsbuffer) > 0:
    # Array has no pop(); pop_front() removes and returns the oldest packet
    var frames = staticvoipaudiostream.spush_packet(opuspacketsbuffer.pop_front())
    playbackthing.push_buffer(frames)

The docs warn that AudioStreamGeneratorPlayback.push_buffer() is slow in GDScript, so the existing implementation of AudioStreamVOIP is probably good, provided we also have a get_opus_frames_available() to tell us how many empty slots are left in the jitter buffer.
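For comparison, the simplest jitter buffer one might write in GDScript is a queue that waits until a few packets have arrived before releasing any, and goes back to waiting after an underrun. This is only a sketch; the class name, its methods, and the prebuffer size are hypothetical and nothing here is part of the library.

```gdscript
# Hypothetical helper: a prebuffering packet queue.
class PacketQueue:
    var packets: Array = []
    var prebuffer: int       # packets to accumulate before playback starts
    var playing := false

    func _init(prebuffer_packets: int) -> void:
        prebuffer = prebuffer_packets

    func push(packet: PackedByteArray) -> void:
        packets.append(packet)
        if packets.size() >= prebuffer:
            playing = true

    # Returns the next packet, or an empty array while (re)buffering.
    func pop() -> PackedByteArray:
        if not playing:
            return PackedByteArray()
        if packets.is_empty():
            playing = false  # underrun: wait for the prebuffer to refill
            return PackedByteArray()
        return packets.pop_front()
```

Because the queue lives in script, the caller always knows which packet is about to be played, which is what makes synchronizing with animations or visemes possible.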

It would also be a good idea to know how many frames are left in the buffer so we don't get the "NOT ENOUGH SAMPLES - frames:" error. However I can't see a good function in the Godot libraries to base it on, other than float _get_playback_position().

Additionally, features like Forward Error Correction (where the decoder can fill in for missing Opus packets) have to be managed outside of this library, since Opus packets don't contain sequence numbers, so you have to tell the library when a packet is missing.
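One way the calling code could detect missing packets is to prefix each Opus packet with its own sequence number before sending. The sketch below assumes a hypothetical wire format with a 16-bit counter in the first two bytes; the function names are illustrative, and how the loss would then be reported to the decoder is left open.

```gdscript
# Prefix an Opus packet with a 16-bit sequence number (hypothetical format).
func wrap_packet(seq: int, opus_packet: PackedByteArray) -> PackedByteArray:
    var out := PackedByteArray()
    out.resize(2)
    out.encode_u16(0, seq & 0xFFFF)
    out.append_array(opus_packet)
    return out

func unwrap_seq(data: PackedByteArray) -> int:
    return data.decode_u16(0)

func unwrap_payload(data: PackedByteArray) -> PackedByteArray:
    return data.slice(2)

# Number of packets skipped between the last sequence number seen and this
# one, allowing for wrap-around at 65536. Duplicates and out-of-order
# arrivals are not handled here and show up as very large values.
func packets_missing(last_seq: int, seq: int) -> int:
    return (seq - last_seq - 1) & 0xFFFF
```

A receiver would call packets_missing() on each arrival and, when the result is non-zero, ask the decoder to conceal or recover that many packets before decoding the new one.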

There's also a DTX (Discontinuous Transmission) feature that cuts the output down to one frame every 400ms when there is silence. Otherwise the Opus library assumes that everyone is talking all the time everywhere, as in a video conference where the only purpose of being online is to talk. This is not how we play networked games, where there is something other than just talking as an activity. A VOX (Voice operated push to talk) system would be more appropriate, as well as being kinder on the bandwidth and considerably more scalable.
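A VOX gate of this kind can be sketched in a few lines of GDScript: measure the level of each 10ms chunk and only transmit while it is above a threshold, plus a short hang time so word endings are not clipped. The threshold, hang time, and function names below are hypothetical and would need tuning.

```gdscript
# Hypothetical values; tune for the microphone and environment.
const VOX_THRESHOLD := 0.02   # RMS level that counts as speech
const VOX_HANG_CHUNKS := 30   # keep sending for about 300 ms after speech

var _hang_left := 0

# Root-mean-square level of a chunk, averaging left and right channels.
func chunk_rms(samples: PackedVector2Array) -> float:
    if samples.is_empty():
        return 0.0
    var sum_sq := 0.0
    for s in samples:
        var mono: float = 0.5 * (s.x + s.y)
        sum_sq += mono * mono
    return sqrt(sum_sq / samples.size())

# True while the chunk is loud enough, or the hang time has not run out.
func should_transmit(samples: PackedVector2Array) -> bool:
    if chunk_rms(samples) >= VOX_THRESHOLD:
        _hang_left = VOX_HANG_CHUNKS
        return true
    if _hang_left > 0:
        _hang_left -= 1
        return true
    return false
```

The capture loop would call should_transmit() on each chunk and skip encoding and sending when it returns false.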

goatchurchprime, Apr 13 '24 14:04