basic-pitch
Add audio slice prediction function
When using basic-pitch to convert audio to MIDI, prediction on longer audio consumes more memory, which may cause TensorFlow to kill the predict process. Slicing the audio, predicting each slice separately, and then combining the results into a complete prediction greatly reduces memory consumption. This makes basic-pitch more suitable for wider use.
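As a rough illustration of the slice-then-combine idea (not the code in this PR), here is a minimal sketch. `predict_in_slices`, `slice_len`, and `predict_fn` are hypothetical names; `predict_fn` stands in for whatever runs the model on one slice and is assumed to return a list of per-sample outputs.

```python
from typing import Callable, List, Sequence


def predict_in_slices(
    samples: Sequence[float],
    slice_len: int,
    predict_fn: Callable[[Sequence[float]], List[float]],
) -> List[float]:
    """Run predict_fn on consecutive slices and concatenate the results.

    predict_fn is a placeholder for the real model call (an assumption).
    """
    if slice_len <= 0:
        raise ValueError("slice_len must be a positive number of samples")
    combined: List[float] = []
    # Integer slice bounds: only one slice is handed to the model at a time.
    for start in range(0, len(samples), slice_len):
        chunk = samples[start:start + slice_len]  # the last chunk may be shorter
        combined.extend(predict_fn(chunk))
    return combined
```

Plain concatenation assumes slices are independent. A note that spans a slice boundary may need overlapping slices or extra stitching, which this sketch does not attempt.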
Hi @G-haoyu
Good idea, and similar to what we do in the TypeScript version, but I think some changes to this implementation would better fix the issue you are trying to solve:
- You could switch the model call to use `predict`, which may solve your problem entirely. `predict` runs input in batches through a Keras model.
- Instead of loading the entire audio at once with `librosa.load`, you could switch to `librosa.stream`, which will not load the entire file all at once (a possible concern for files on the scale of hours).
- `AUDIO_SLICE_TIME` might be better off as `AUDIO_SLICE_FRAMES`, since going between seconds and samples can be a little dangerous at times (floating point errors) and the frame size is known.
- You are opening the file in append mode and that makes sense, but you also probably want to open the file once in write mode before the looping begins to clear out any existing debug files. We never check if `debug_file` exists (unlike all the other outputs) since it's unlikely to be used for anything important, so it's possible the file may already exist.
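To make the last two points concrete, here is a small sketch, assuming the slice length is an integer count and the debug output is a plain text file. `slice_bounds` and `write_debug` are hypothetical helpers, and the value of `AUDIO_SLICE_FRAMES` is made up for illustration.

```python
from typing import Iterable, List, Tuple

AUDIO_SLICE_FRAMES = 4  # hypothetical value; an integer count, not seconds


def slice_bounds(n_frames: int, slice_len: int) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs using integer arithmetic only."""
    return [
        (start, min(start + slice_len, n_frames))
        for start in range(0, n_frames, slice_len)
    ]


def write_debug(debug_file: str, lines: Iterable[str]) -> None:
    """Clear any stale debug file once, then append one line per slice."""
    # Opening in write mode truncates whatever a previous run left behind.
    with open(debug_file, "w"):
        pass
    for line in lines:
        # Append mode inside the loop, as in the PR.
        with open(debug_file, "a") as f:
            f.write(line + "\n")
```

Because the bounds never pass through seconds, there is no float-to-int rounding step that could drop or duplicate a sample at a slice edge.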
Any updates on this?