basic-pitch icon indicating copy to clipboard operation
basic-pitch copied to clipboard

Add audio slice prediction function

Open G-haoyu opened this issue 2 years ago • 2 comments

When using basic-pitch for audio to midi, for longer audio, the prediction consumes more memory resources, which may cause Tensorflow to kill the predict process. By slicing the audio and splitting the prediction into slices, and then combining them into a complete prediction result, memory resource consumption is greatly reduced. It is more suitable for basic-pitch to be promoted and used.

G-haoyu avatar Jun 26 '22 05:06 G-haoyu

Hi @G-haoyu

Good idea, and similar to what we do in the typescript version, but I think there are some changes to this implementation that would better fix the issue you are trying to solve

  1. You could switch the model call to use predict which may solve your problem entirely. predict runs input in batches through a keras model.
  2. Instead of loading the entire audio at once with librosa.load, you could switch to librosa.stream which will not load the entire file all at once (a possible concern for files on the scale of hours)
  3. AUDIO_SLICE_TIME might be better off as AUDIO_SLICE_FRAMES since going between seconds and samples can be a little dangerous at times (floating point errors) and the frame size is known.
  4. You are opening the file in append mode and that makes sense, but you also probably want to open the file once in write mode before the looping begins to clear out any existing debug files. We never check if debug_file exists (unlike all the other outputs) since it's unlikely to be used for anything important so it's possible the file may already exist.

drubinstein avatar Jun 27 '22 16:06 drubinstein

Any updates on this?

schneiderfelipe avatar May 07 '23 19:05 schneiderfelipe