vmsg

Streaming encoder

Open • onel opened this issue 5 years ago • 6 comments

First of all, thanks for this great library.

I have a question: is there a way to encode a specific audio buffer and get back only that chunk, rather than the whole recording? For example, I would send a Float32Array, vmsg would encode it and send the encoded data back. As far as I can tell, during a recording everything is held in memory and only returned when vmsg_flush() is called. Chunked output would be useful for longer recordings, where you want to encode a piece, perhaps upload it, and then not keep it in memory.

I've tried something similar by calling vmsg_init, vmsg_encode and then vmsg_flush inside the worker's "data" message handler, but I don't think this is the right way to do it:

```js
  case "data":
    if (!vmsg_init(msg.rate)) return postMessage({type: "error", data: "vmsg_init"});
    if (!vmsg_encode(msg.data)) return postMessage({type: "error", data: "vmsg_encode"});

    const blob = vmsg_flush();
    if (!blob) {
      return postMessage({type: "error", data: "vmsg_flush"});
    }

    postMessage({
      type: "blob",
      data: blob
    });
    break;
```

Is there a way to do that? A change would also need to be made inside vmsg.c, right? Thanks

onel • Jan 08 '19 15:01

Yes, it's possible. You just need to make the vmsg_encode C function return the number of bytes written (n), so that you can send the bytes from v->mp3+v->size-n to v->mp3+v->size to the main thread via postMessage. At the end you should also fix the LAME tag (lame_get_lametag_frame), which needs an additional message.
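A minimal sketch of both pieces, written without LAME so it runs standalone. It assumes a simplified state struct (`stream_state`, a hypothetical stand-in for vmsg's own struct) where `mp3` is the output buffer and `size` is the number of bytes written so far. If the encode call returns `n`, the new chunk is the `n` bytes ending at `mp3 + size`. On the receiving side, chunks are appended in order; at the end, the frame obtained from `lame_get_lametag_frame` overwrites the empty placeholder frame LAME put at the start of the stream (assuming no ID3v2 tag precedes it). The receiver helpers `assembled_file`, `append_chunk` and `patch_lametag` are hypothetical names, not part of vmsg.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified worker-side state; only the output buffer matters. */
typedef struct {
  uint8_t *mp3; /* encoded bytes written so far */
  size_t size;  /* number of valid bytes in mp3 */
} stream_state;

/* If the encode call returned n (bytes just appended to the buffer),
   the new chunk is the n bytes ending at mp3 + size. */
static const uint8_t *new_chunk(const stream_state *v, size_t n) {
  assert(n <= v->size);
  return v->mp3 + v->size - n;
}

/* Hypothetical receiver side: chunks are appended in arrival order. */
typedef struct {
  uint8_t *data;
  size_t len;
  size_t cap;
} assembled_file;

static int append_chunk(assembled_file *f, const uint8_t *chunk, size_t n) {
  if (n == 0)
    return 0;
  if (f->len + n > f->cap) {
    size_t cap = f->cap ? f->cap : 4096;
    while (cap < f->len + n)
      cap *= 2;
    uint8_t *p = realloc(f->data, cap);
    if (!p)
      return -1;
    f->data = p;
    f->cap = cap;
  }
  memcpy(f->data + f->len, chunk, n);
  f->len += n;
  return 0;
}

/* Final message: the frame produced by lame_get_lametag_frame replaces the
   empty placeholder frame at the very start of the stream. A length of 0
   (no tag written) leaves the file unchanged. */
static int patch_lametag(assembled_file *f, const uint8_t *tag, size_t tag_len) {
  if (tag_len == 0)
    return 0;
  if (tag_len > f->len)
    return -1;
  memcpy(f->data, tag, tag_len);
  return 0;
}
```

In the worker, the range returned by `new_chunk` would be copied into a typed array and posted; only the tag frame needs the extra end-of-recording message.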

I'm not sure we want to use that method for normal recordings, because it would require sending every encoded chunk back to the main thread and copying it into a buffer there, which might introduce additional delay. But it should be fine to make it optional.
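One possible way to keep the chunked path optional, sketched with hypothetical names (`encoder_out`, `chunk_cb`, `commit_encoded`; none of these exist in vmsg). In streaming mode each freshly encoded range is handed to a callback standing in for postMessage and the write position is not advanced, so worker memory stays bounded; otherwise bytes accumulate for a single blob at flush time, as described earlier in the thread. Capacity checks on `mp3` are omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback standing in for postMessage. It must copy the bytes
   before returning, because the region is overwritten by the next encode. */
typedef void (*chunk_cb)(const uint8_t *bytes, size_t n, void *user);

typedef struct {
  uint8_t *mp3;      /* buffer the encoder writes into at mp3 + size */
  size_t size;       /* bytes kept for the final blob */
  int streaming;     /* 0: accumulate until flush; 1: emit each chunk */
  chunk_cb on_chunk; /* only used when streaming */
  void *user;        /* passed through to on_chunk */
} encoder_out;

/* Call after the encoder has written n new bytes at mp3 + size. */
static void commit_encoded(encoder_out *v, size_t n) {
  if (v->streaming) {
    assert(v->on_chunk != NULL);
    v->on_chunk(v->mp3 + v->size, n, v->user);
    /* size is not advanced: the next encode reuses the same region.
       The LAME tag then has to be patched on the receiving side. */
  } else {
    v->size += n; /* keep everything until flush */
  }
}

/* Example callback: just counts the bytes it is handed. */
static void count_bytes(const uint8_t *bytes, size_t n, void *user) {
  (void)bytes;
  *(size_t *)user += n;
}
```

The trade-off matches the concern above: the default branch does no extra copying, while the streaming branch pays one copy and one message per chunk in exchange for bounded memory.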

Kagami • Jan 08 '19 15:01

OK, I understand. I don't have experience with C, but maybe I'll try that in a fork. Thank you so much for the details.

onel • Jan 09 '19 18:01

Hi there, I took a stab at making this work and wanted to check with you whether this is the right way to do it. I haven't created a PR because I don't know if you would want to integrate it, but let me know if you would. The idea is that for each buffer we call vmsg_encode, vmsg_flush and then a new function, vmsg_reset. Inside the worker this looks like this:

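```js
  case "data":
    if (!vmsg_encode(msg.data)) return postMessage({type: "error", data: "vmsg_encode"});

    const blob = vmsg_flush();
    if (!blob) {
      return postMessage({type: "error", data: "vmsg_flush"});
    }

    postMessage({
      type: "blob",
      data: blob
    });

    // Note: the C vmsg_reset below is declared as vmsg_reset(vmsg *v, int rate),
    // so this raw FFI call still needs the encoder pointer and the sample rate.
    FFI.vmsg_reset();
    break;
```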

This will return the blob for that specific buffer each time.

The changes I've made are as follows. vmsg_encode now returns v->size each time:

```c
WASM_EXPORT
int vmsg_encode(vmsg *v, int nsamples) {
  if (nsamples > MAX_SAMPLES)
    return -1;

  if (fix_mp3_size(v) < 0)
    return -1;

  uint8_t *buf = v->mp3 + v->size;
  int n = lame_encode_buffer_ieee_float(v->gfp, v->pcm_l, NULL, nsamples, buf, BUF_SIZE);

  if (n < 0)
    return n;

  v->size += n;
  return v->size;
}
```

And the new method:

```c
WASM_EXPORT
int vmsg_reset(vmsg *v, int rate) {
  if (v) {
    lame_close(v->gfp);
    v->size = 0;

    v->gfp = lame_init();
    if (!v->gfp) {
      vmsg_free(v);
      return -1;
    }

    lame_set_mode(v->gfp, MONO);
    lame_set_num_channels(v->gfp, 1);
    lame_set_in_samplerate(v->gfp, rate);
    lame_set_VBR(v->gfp, vbr_default);
    lame_set_VBR_quality(v->gfp, 5);

    if (lame_init_params(v->gfp) < 0) {
      vmsg_free(v);
      return -1;
    }
  }

  return 0;
}
```

This is basically the same as vmsg_init, but without the memory allocation. The problem I'm having is that the resulting MP3 blob is not actually usable; I think the encoder is not set up correctly in vmsg_reset. My questions are: do you think this is a good way to do buffer encoding? And what would you recommend we do in vmsg_reset? Thanks

onel • Mar 15 '19 17:03

@onel did you get it working? I'm also interested in this for live speech-to-text (on the server).

flieks • Jan 16 '20 10:01

Damn. I want this too. What if we fake it and just swap the encoder with a new one every few seconds? I'm fine with lots of relatively short mp3s.
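The "swap the encoder every few seconds" idea mostly comes down to deciding where to cut the input. A minimal sketch, assuming a hypothetical `segmenter` helper (not part of vmsg): each segment covers at most `rate * seconds` input samples, and a buffer that crosses the boundary is split so the current encoder receives exactly the samples that fill it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical segmenter: each MP3 segment covers at most
   segment_samples input samples. */
typedef struct {
  uint32_t segment_samples; /* rate * seconds */
  uint32_t used;            /* samples already fed to the current encoder */
} segmenter;

static void segmenter_init(segmenter *s, uint32_t rate, uint32_t seconds) {
  assert(rate > 0 && seconds > 0);
  s->segment_samples = rate * seconds;
  s->used = 0;
}

/* Returns how many of the `available` samples go to the current encoder.
   *rotate is set to 1 when those samples fill the segment, meaning the
   caller should flush this encoder (a complete, standalone MP3) and start
   a fresh one before feeding the remaining samples. */
static uint32_t segmenter_take(segmenter *s, uint32_t available, int *rotate) {
  uint32_t room = s->segment_samples - s->used;
  uint32_t take = available < room ? available : room;
  s->used += take;
  *rotate = (s->used == s->segment_samples);
  if (*rotate)
    s->used = 0;
  return take;
}
```

A caller would loop over each incoming buffer: call `segmenter_take` on the remaining samples, encode `take` of them, advance the pointer, and on `rotate` flush and re-initialise the encoder. Each segment carries its own encoder delay and padding, so playing the segments back to back may leave small gaps at the boundaries.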

stefan-reich • Jul 30 '21 23:07

Ah, I think I'll simply use MediaRecorder. It should record as .webm, right?

stefan-reich • Jul 30 '21 23:07