
While recording with SampleEvent and inputChannel, the recorded output gets mismatched

Open YogarajRamesh opened this issue 2 years ago • 14 comments

Hi @igorski,

I am trying to record voice over music. I use a SampleEvent to load the music and then start recording. While recording, the input is delayed by a few milliseconds, which causes the voice to be mismatched with the music (the voice lags behind the music). This happens on a few devices. So what I tried was using the new function from the latest repo to mute the input channel: MWEngineInstance.getInputChannel().setMuted( true ); Still, the audio is delayed relative to the music in the output recording. Is there any solution to sync the mismatched voice and music?

Attachment Details.

SampleEvent = audio of counting numbers (as the music). inputChannel = counting numbers spoken in sync with the SampleEvent (the music).

Please check the attachment:

https://mega.nz/file/kN8DxQ4B#4zyCzDCpTvRzK9mB5AI9x7QouMUDv2nnJulzAdOGq7k

Thank you in advance @igorski

YogarajRamesh avatar Jun 14 '22 11:06 YogarajRamesh

That's an interesting conundrum.

So basically, the audio that comes from the input/microphone is perceived as audio from the past (due to the latency on certain devices). The audio that is synthesized internally is what we can consider realtime, and together we have a mismatch in timing.

When the engine mixes the input signal with the internally generated audio, the input is obviously lagging behind in time (your recording sounded quite extreme, but such is the nature of the fragmented Android ecosystem, where performance can differ considerably, so it's something to consider).

MWEngine can calculate the latency (as the AAudio driver provides such a facility), but now I'm thinking how it should correct the "position" of the recorded input signal when mixing it with the internally generated audio. The calculation to do so is quite simple (basically we "align" the input and internal audio by pushing the internal audio recording forwards by the latency), but this implies that we will need to be using twice the memory during recording as we need to keep the recorded input and internal audio separate until the final mixing stage... which is quite a penalty.

ALTERNATIVELY, the recording of the internal audio can be "pushed forward" (thus delayed to sync with the input signal) for the duration of the input latency. This however presents a problem when recording starts during audio playback (as in: the user can decide on the spot, while audio is playing, whether they want to record or not). Is that the case for your application? Or will your application have a single button that activates microphone recording and starts the internal audio playback at the same time?

igorski avatar Jun 22 '22 17:06 igorski

Hi @igorski, thanks for your response. My application is single-button based (it starts both recording and internal audio playback). In that case, how can we push the internal audio forward based on the latency calculation?

Thanks in advance

YogarajRamesh avatar Jun 23 '22 06:06 YogarajRamesh

Hi @igorski, Any suggestions on this?

Thanks in advance

YogarajRamesh avatar Jun 30 '22 05:06 YogarajRamesh

Hi @igorski

Hope you are doing great. First of all, thank you for the time you have put into this. By any chance, are there any updates?

Thanks in advance

YogarajRamesh avatar Jul 20 '22 13:07 YogarajRamesh

Hi there @YogarajRamesh

I'm afraid this is a notoriously difficult thing to address. The most accurate way to calculate device latency is to use a loopback device, which isn't really something you can expect your users to have and use (there are also requirements on how they should conduct this test, which needs to be done very carefully)... Sadly, the number of devices running Android encompasses such a wide range of configurations that it's not really feasible to "guesstimate" an appropriate latency (which is quite the luxury on iPhone, as each iPhone of a specific version is the exact same device).

In the latest commit, I have added a "warmup" phase that aims to synchronize the input and output streams as closely as possible to minimize latency, and to force the input stream to operate in low latency mode, but the results will sadly be device dependent. I have seen improvements on a low-range device, though I'm afraid this is all that can be done. There must be a reason why Smule has several patents to their name :/

igorski avatar Jul 24 '22 09:07 igorski

Hi @igorski, thanks for your reply. I tried the latest build and am still facing the same issue. Is there any other way to work around this, such as adding a manual slider in the UI so the user can adjust the latency?

Thanks in advance

YogarajRamesh avatar Jul 26 '22 07:07 YogarajRamesh

Hi @igorski,

Any suggestions on this?

Thanks in advance

YogarajRamesh avatar Aug 09 '22 06:08 YogarajRamesh

Hi @YogarajRamesh

This is a tough cookie to crack (is that even a saying?). Anyway, I have created a branch duplex (also see this pull request) which you can give a spin. You can build the library for use in your app, or give the updated example Activity within that branch a go. Basically:

There are new recording methods:

MWEngineInstance.startFullDuplexRecording( float roundtripLatencyInMs, String outputFileName );
MWEngineInstance.stopFullDuplexRecording();

Where roundtripLatencyInMs is a floating point value describing the latency between speaking into the microphone and hearing the audio back over the speaker, in milliseconds (so you can enter a value like 400 in case you measure the latency to be around 0.4 seconds). outputFileName speaks for itself as it is similar to all other recording methods.
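
A minimal usage sketch (assuming a fully set up engine as in the example Activity; the file path and the 400 ms value are purely illustrative):

float roundtripLatencyInMs = 400f; // e.g. a value measured via a calibration slider
String outputFileName = "/path/to/recording.wav"; // illustrative path

// records both the engine output and the device input, to be aligned on stop
MWEngineInstance.startFullDuplexRecording( roundtripLatencyInMs, outputFileName );

// ...performance happens...

MWEngineInstance.stopFullDuplexRecording(); // writes the aligned result to outputFileName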

When the "full duplex" recording is started, the engine:

  • will record the engine output (e.g. all synthesized sounds, playback of sequenced events)
  • will record the device input
  • will mute the input recording (to prevent feedback)
  • will record the output and input streams separately

Once recording is stopped, the engine will take the output and input streams and align them with the latency you provided when recording started. So if the latency was specified as 400 ms, the input stream is mixed into the output buffer 400 ms earlier, hopefully aligning a vocal performance with the sequenced output.
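
The alignment itself boils down to converting the supplied latency to an offset in samples and mixing the recorded input into the output buffer that many samples earlier. A minimal sketch of the idea, not MWEngine's actual internals (mono 32-bit float buffers; names are illustrative):

// sketch only: mix the recorded input into the recorded output,
// compensating for the measured roundtrip latency
static void mixAlignedInput( float[] output, float[] input, float latencyInMs, int sampleRate ) {
    // convert the roundtrip latency from milliseconds to samples
    int latencyInSamples = ( int ) ( latencyInMs / 1000f * sampleRate );

    // write each input sample latencyInSamples earlier: a 400 ms latency
    // pulls the vocal 400 ms forward in time
    for ( int i = latencyInSamples; i < input.length; ++i ) {
        int writeIndex = i - latencyInSamples;
        if ( writeIndex >= output.length ) break;
        output[ writeIndex ] += input[ i ];
    }
}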

For the test app, it's good to know that you must drag the latency slider before starting the recording (as the latency is provided to the record method when it starts). For your actual app you can follow an approach like the one in Rock Band's calibration menu: present the user with a simple activity where they can tune the latency themselves and hear the result (like "please sing in time with the following drum sequence"), or something to that effect.

I still need to do some backward compatibility checks with all other recording methods before merging the branch, but you can test if this can solve your problem.

igorski avatar Aug 31 '22 20:08 igorski

Hi @igorski Thank you so much

It's really working great, as expected. I have tested on multiple low-end and high-end phones, and by using the slider all the outputs are almost the same. I have one question about this part: "Once recording is stopped, the engine will take the output and input streams and align them with the latency you provided when recording started. So if the latency was specified as 400 ms, the input stream is mixed into the output buffer 400 ms earlier, hopefully aligning a vocal performance with the sequenced output." Since the align phase takes place after the recording is stopped, can we set the latency delay value by checking the output, i.e. previewing the recorded audio over the music, and then align it?

Thanks in advance

YogarajRamesh avatar Sep 03 '22 07:09 YogarajRamesh

Hi @YogarajRamesh

Since the align phase takes place after the recording is stopped, can we set the latency delay value by checking the output, i.e. previewing the recorded audio over the music

If I understand correctly: when recording is stopped, you want to preview both the recorded music and the recorded input side by side, so the user can drag the "position of the input recording" to align it with the output?

Well, that's going to present a few challenges, as it means we need to allocate memory for two potentially large recordings (depending on how long the performance lasts), since MWEngine doesn't stream audio from storage (in the context of a live audio processing runtime that would be too much of a performance bottleneck).

What I'm thinking is that your application can do a one-time setup upon first install: the user sings along to a very short clip (maybe a four-bar loop of a constant drum pulse, where they say a short word in time with the pulse) and matches their recorded input with the sequenced drum. The setup for that recording would be:

  • present a start button; upon click the sequencer starts playback while at the same time you invoke startInputRecording (so we are only recording the device input, not full duplex with the output)
  • track the progress for four bars using the Notifier mechanism also seen in the example Activity (once four bars have elapsed, stop the sequencer and invoke stopInputRecording to stop recording and save it to storage)
  • now load the written input recording from storage into the SampleManager
  • present a new UI with a "play" button which starts the sequencer from the beginning and at the same time plays a SampleEvent (where the sample is the input recording) at an offset of 0 samples
  • the UI also has a slider to adjust the offset of the input recording (you want the slider's value to produce a number in milliseconds)
  • convert this value from milliseconds to an amount in samples (using BufferUtility for the conversion) and set it as the startOffset for the SampleEvent playing back the recording; note that it must be passed as a negative value, as we are pulling its playback "forward" (i.e. it starts at an earlier point in time) (see the sketch after this list)
  • when the user is satisfied with the alignment, they click a save button and the value in milliseconds is stored on the device (you can then provide it to the new startFullDuplexRecording() method whenever recording begins for a performance)
  • this setup screen is not shown again (unless the user decides to reconfigure it through a settings menu or something)
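
A minimal sketch of that slider-to-offset step, assuming BufferUtility offers a milliseconds-to-samples conversion as described above and using a hypothetical setStartOffset() setter on the SampleEvent (check the actual API for the exact names):

// sketch only: convert the slider value to a (negative) offset in samples
int latencyInMs = latencySlider.getProgress(); // e.g. an android.widget.SeekBar
int sampleRate  = 44100;                       // use the engine's actual sample rate

// milliseconds -> samples (method name assumed from MWEngine's BufferUtility)
int offsetInSamples = BufferUtility.millisecondsToBuffer( latencyInMs, sampleRate );

// pass the offset as a NEGATIVE value: we pull the input recording's playback
// forward so it starts at an earlier point in time relative to the sequencer
inputRecordingEvent.setStartOffset( -offsetInSamples ); // hypothetical setter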

So this becomes a one-time setup, and all subsequent sessions require no further configuration (as the latency will not change for the device). This avoids making it a repetitive task before each performance (it is also more reliable, as people tend to overestimate their timing skills and their accuracy will differ between performances).

igorski avatar Sep 06 '22 18:09 igorski

Hi @igorski

Thank you so much. I am also trying to use the same approach to handle it. Once again, thank you for your suggestion. And is this startFullDuplexRecording() available in the main branch?

Thanks in advance

YogarajRamesh avatar Sep 08 '22 04:09 YogarajRamesh

Hi @YogarajRamesh

I finished all tests (needed to ensure all existing recording methods remained working as before) and have merged the code into the main branch of the repository.

For reference, everything we discussed (with additional notes) has been added to the documentation on recording.

igorski avatar Sep 11 '22 09:09 igorski

Hi @igorski

Thank you so much

YogarajRamesh avatar Sep 12 '22 11:09 YogarajRamesh

Hi @igorski

Is there a way to manually turn the input muting on/off while using startFullDuplexRecording()?

  • will mute the input recording (to prevent feedback)

Thanks in advance

YogarajRamesh avatar Sep 23 '22 14:09 YogarajRamesh