DSP.jl icon indicating copy to clipboard operation
DSP.jl copied to clipboard

Initial istft implementation.

Open jfsantos opened this issue 10 years ago • 32 comments

I wrote an istft function to bring a real signal from its per-frame STFT representation to a time-domain signal by using overlap-add. This version is unoptimized as I am having some trouble to create an FFTW.Plan for a backward RFFT. I can add my non-working code to this branch in case you want to take a look.

I understand the PR is in a crude shape right now but I need some help moving forward, and I figured it would be better to submit a PR than try to solve this via e-mail.

Any suggestions for improvement or tips on how I can use the FFTW.Plan interface to do what I want here?

jfsantos avatar Nov 27 '14 17:11 jfsantos

Coverage Status

Coverage remained the same when pulling a29dbbb34f4f17c52864958bd57d388c71c0e8e5 on jfs/istft into 4019eac4c918ae2b1027cc69d799ed187992b111 on master.

coveralls avatar Nov 27 '14 17:11 coveralls

Here is a working example

backward_plan{T<:Union(Float32, Float64)}(X::AbstractArray{Complex{T}}, Y::AbstractArray{T}) =
    FFTW.Plan(X, Y, 1, FFTW.ESTIMATE, FFTW.NO_TIMELIMIT).plan

x=randn(8)
y=rfft(x)  # works also for fft(x)
tmp1=similar(y)
tmp2=similar(x)

plan = backward_plan(tmp1,tmp2)

copy!(tmp1,y)
copy!(tmp2,x)
FFTW.execute(plan, tmp1, tmp2)  # warning: destroys tmp1
scale!(tmp2, FFTW.normalization(tmp2))

I suggest you do a copy to tmp1 from S instead of S[:,k]. Also probably better:

for 1:length(wsum)
    @inbounds wsum[i] != 0 && (out[i] /= wsum[i])
end
# instead of
pos = wsum .!= 0
out[pos] ./= wsum[pos]

gummif avatar Nov 27 '14 18:11 gummif

Thanks! How did you implement backward_plan? I tried the following implementation:

backward_plan{T<:Union(Float32, Float64)}(X::AbstractArray{Complex{T}}, Y::AbstractArray{T}) =
    FFTW.Plan(Y, X, 1, FFTW.BACKWARD, FFTW.ESTIMATE, FFTW.NO_TIMELIMIT).plan

but then if I try to call it I get:

x=randn(8)
y=rfft(x)
tmp1=similar(y)
tmp2=similar(x)

plan = backward_plan(tmp2,tmp1)

ERROR: `Plan{T<:Union(Float64,Float32,Complex{Float32},Complex{Float64})}` has no method matching Plan{T<:Union(Float64,Float32,Complex{Float32},Complex{Float64})}(::Array{Float64,1}, ::Array{Complex{Float64},1}, ::Int64, ::Int32, ::Uint32, ::Float64)
 in backward_plan at none:1

EDIT: I see you posted your backward plan now. So you do not need to use FFTW.BACKWARD in the arguments?

jfsantos avatar Nov 27 '14 19:11 jfsantos

The plan is exactly how it is done in the FFTW module, so FFTW.BACKWARD is apparently not needed.

gummif avatar Nov 27 '14 19:11 gummif

Right, it seems to work perfectly. I'll just test this with different windows and lengths and add tests and documentation, but the function seems to be working well now. Thanks a lot!

jfsantos avatar Nov 27 '14 20:11 jfsantos

To avoid allocation (if you have not not so already):

copy!(tmp1, 1, S, 1+(k-1)*size(S,1), length(tmp1))

gummif avatar Nov 27 '14 20:11 gummif

That's exactly what I did in my last commit. However, execution time and allocation seem to be the same for both implementations, at least for the size of slices I'm using (length 128).

jfsantos avatar Nov 27 '14 20:11 jfsantos

I mean for the copy!(tmp1, S[:,k]) line. There is also allocation for tmp2.*win which could be done with a loop.

gummif avatar Nov 27 '14 20:11 gummif

You are right. I fixed both lines and memory allocation was reduced to 75% of the previous implementation. Execution times are more or less the same (again, maybe because I am using small window sizes).

jfsantos avatar Nov 27 '14 20:11 jfsantos

OK good. A small bug:

1+(k-1)*winc:((k-1)*winc+n
# should be
ix + n
# with 
ix = (k-1)*winc

gummif avatar Nov 27 '14 20:11 gummif

Using PRESERVE_INPUT flag in FFTW.Plan and removing copy! in istft roop make much faster. So the backward plan would be:

backward_plan{T<:Union(Float32, Float64)}(X::AbstractArray{Complex{T}}, Y::AbstractArray{T}) =
    FFTW.Plan(X, Y, 1, FFTW.ESTIMATE | FFTW.PRESERVE_INPUT, FFTW.NO_TIMELIMIT).plan

r9y9 avatar Nov 28 '14 02:11 r9y9

@r9y9 That's a good point. It would however require a bit of pointer hacking like

tmp1 = pointer_to_array(pointer(S, 1+(k-1)*size(S,1)), size(S,1), false)
# replaces
copy!(tmp1, 1, S, 1+(k-1)*size(S,1), length(tmp1))

gummif avatar Nov 28 '14 13:11 gummif

@gummif Thanks for your comment. I was thinking about replacing:

copy!(tmp1, 1, S, 1+(k-1)*size(S,1), length(tmp1))
FFTW.execute(p, tmp1, tmp2)

with

FFTW.execute(p, sub(S,1:size(S,1), k), tmp2)

Though this requires slight allocations in sub, I've confirmed that my implementation works faster for relatively short signals. Unfortunately, it may be slower for long signals (depends on window length, overlap and window type, etc). Below is the testing code:

using DSP, Base.Test

wlen = 2048
overlap = div(wlen, 2)

x1 = rand(2^12)
println("testing with short signal (length: $(length(x1)))")
X1 = stft(x1, wlen, overlap)

# force JIT-compilation
istft(X1, wlen, overlap)
istft2(X1, wlen, overlap)

@time y1 = istft(X1, wlen, overlap)
@time y2 = istft2(X1, wlen, overlap)
@test_approx_eq x1 y1
@test_approx_eq y1 y2

x1 = rand(2^22)
println("testing with long signal (length: $(length(x1)))")

X1 = stft(x1, wlen, overlap)
@time y1 = istft(X1, wlen, overlap)
@time y2 = istft2(X1, wlen, overlap)
@test_approx_eq x1 y1
@test_approx_eq y1 y2

The results using julia v0.3.2:

testing with short signal (length: 4096)
elapsed time: 0.000865683 seconds (131456 bytes allocated)
elapsed time: 0.000283067 seconds (102768 bytes allocated)
testing with long signal (length: 4194304)
elapsed time: 0.106112303 seconds (67160936 bytes allocated)
elapsed time: 0.167759533 seconds (69568720 bytes allocated)

complete code for istft2: https://gist.github.com/r9y9/93d1645275f7bea9569a

I agree the current implementation. I think we can add benchmarks and improve performance then.

r9y9 avatar Nov 28 '14 14:11 r9y9

I tested both the suggested version with the pointer to array and the one with copy. Surprisingly, the one using copy uses slightly less memory (maybe because a new pointer is allocated at each iteration?). I only averaged over 100 executions but the version using copy was also slightly faster.

Since the @time macro measures the time for a single execution of the function, it may not be accurate. For this reason, I usually run the same expression 100/1000 times and average the result of the @elapsed expression instead:

t = zeros(100)
for i=1:100
t[i] = @elapsed y1 = istft(X1, wlen, overlap)
end

When trying both @r9y9 and the pointer_to_array approaches, I get segfaults from a FFTW function call on my machine:

signal (11): Segmentation fault: 11
r2cb_16 at /usr/local/Cellar/fftw/3.3.4/lib/libfftw3.3.dylib (unknown line)
apply at /usr/local/Cellar/fftw/3.3.4/lib/libfftw3.3.dylib (unknown line)
apply at /usr/local/Cellar/fftw/3.3.4/lib/libfftw3.3.dylib (unknown line)
apply_dif_dft at /usr/local/Cellar/fftw/3.3.4/lib/libfftw3.3.dylib (unknown line)
apply_hc2r at /usr/local/Cellar/fftw/3.3.4/lib/libfftw3.3.dylib (unknown line)
istft2 at /Users/jfsantos/.julia/v0.3/DSP/src/util.jl:150
julia_istft2;20180 at  (unknown line)
jlcall_istft2;20180 at  (unknown line)
jl_apply at /private/tmp/julia-MRinr5/src/./julia.h:980
jl_apply at /private/tmp/julia-MRinr5/src/interpreter.c:59
eval at /private/tmp/julia-MRinr5/src/interpreter.c:207
eval at /private/tmp/julia-MRinr5/src/interpreter.c:215
eval_body at /private/tmp/julia-MRinr5/src/interpreter.c:541
jl_interpret_toplevel_thunk_with at /private/tmp/julia-MRinr5/src/interpreter.c:569
jl_toplevel_eval_flex at /private/tmp/julia-MRinr5/src/toplevel.c:511
jl_f_top_eval at /private/tmp/julia-MRinr5/src/builtins.c:399
eval_user_input at REPL.jl:54
jlcall_eval_user_input;19917 at  (unknown line)
jl_apply at /private/tmp/julia-MRinr5/src/./julia.h:980
anonymous at task.jl:96
jl_apply at /private/tmp/julia-MRinr5/src/task.c:427
julia_trampoline at /private/tmp/julia-MRinr5/src/init.c:1007
Segmentation fault: 11

With the istft2 implementation, it segfaults on the first call. Using pointer_to_array, it segfaults after some calls. I am not sure about what's causing this behavior.

jfsantos avatar Nov 28 '14 15:11 jfsantos

Oh, I had the same segfaults. Could you modify backward_plan as follows and try again?

backward_plan2{T<:Union(Float32, Float64)}(X::AbstractArray{Complex{T}}, Y::AbstractArray{T}) =
    FFTW.Plan(X, Y, 1, FFTW.ESTIMATE | FFTW.PRESERVE_INPUT, FFTW.NO_TIMELIMIT).plan

to

backward_plan2{T<:Union(Float32, Float64)}(X::AbstractArray{Complex{T}}, Y::AbstractArray{T}) =
    FFTW.Plan(X, Y, 1, FFTW.ESTIMATE | FFTW.PRESERVE_INPUT, FFTW.NO_TIMELIMIT)

and also

FFTW.execute(p, tmp1, tmp2)

to

FFTW.execute(p.plan, tmp1, tmp2)

I am not sure why segfaults happen but this changes seems to fix the segfaults in my environment. It's weird..

r9y9 avatar Nov 28 '14 15:11 r9y9

I'm not able to produce a significant difference. So the version with copy might be more safe.

julia> @time for i = 1:nn
           y2 = istft2(X1, wlen, overlap)
       end
elapsed time: 0.148178763 seconds (110615752 bytes allocated, 34.14% gc time)

julia> @time for i = 1:nn
           y1 = istft(X1, wlen, overlap)
       end
elapsed time: 0.159816237 seconds (110062040 bytes allocated, 46.83% gc time)

julia> @time for i = 1:nn
           y2 = istft2(X1, wlen, overlap)
       end
elapsed time: 0.148254777 seconds (110615736 bytes allocated, 32.35% gc time)

julia> @time for i = 1:nn
           y1 = istft(X1, wlen, overlap)
       end
elapsed time: 0.13468912 seconds (110061368 bytes allocated, 37.36% gc time)

gummif avatar Nov 28 '14 16:11 gummif

Sorry for my insufficient benchmarks. I agree with @gummif

r9y9 avatar Nov 29 '14 02:11 r9y9

This is a late response, but the segfaults may come from trying to apply a plan that expects an array with a certain alignment to an array with a different alignment, which would be the case for a SubArray with an odd offset. You can use the UNALIGNED flag to tell FFTW not to assume aligned input, but the last time I looked at this, it seemed like an unaligned FFT was measurably slower than an aligned FFT, but like @gummif I found no substantial overhead to using copy!.

simonster avatar Dec 01 '14 16:12 simonster

Alright, so there is no reason not to use copy! then. I just need to add some more tests and docs to this.

jfsantos avatar Dec 08 '14 21:12 jfsantos

Oh, it seems I did something terrible while rebasing. I thought I had squashed everything instead of making this mess. Any tips on how to fix this?

jfsantos avatar Apr 15 '15 15:04 jfsantos

Coverage Status

Coverage decreased (-0.35%) to 97.06% when pulling e5a0d017b098fde6812420785fc86ffc27165772 on jfs/istft into bcca68b5ece29e4cfb6906209d00981cabcbfcd9 on master.

coveralls avatar Apr 15 '15 15:04 coveralls

Coverage Status

Coverage increased (+0.07%) to 97.48% when pulling 3349186429d8758d9cac7e52c55851b01e5c94f9 on jfs/istft into bcca68b5ece29e4cfb6906209d00981cabcbfcd9 on master.

coveralls avatar Apr 23 '15 12:04 coveralls

Not entirely sure what's going on with the history here, but the diff looks right. I guess the nuclear option is:

git diff master > patch.diff
git reset --hard master
patch -p1 < patch.diff
rm patch.diff

and then commit and force-push to this branch.

simonster avatar Apr 23 '15 14:04 simonster

This should do the trick, thanks @simonster! Let's wait until the tests pass and then finally merge this?

jfsantos avatar Jun 06 '15 00:06 jfsantos

Any more comments on this? @simonster @gummif @r9y9

jfsantos avatar Jul 02 '15 22:07 jfsantos

What ever happened to this? This came up in a discourse thread re: istft in DSP.jl: https://discourse.julialang.org/t/should-there-be-an-inverse-stft-in-dsp-jl/6661/2

standarddeviant avatar Oct 31 '17 18:10 standarddeviant

@standarddeviant I ended up never being able to finish this PR and in the meantime, Julia changed a lot and I was not using it on a daily basis anymore. Please feel free to do whatever is needed to get this to work with newer versions of Julia and submit another PR.

jfsantos avatar Oct 31 '17 18:10 jfsantos

I started to write an istft as well, today, until I found out this PR and the MusicProcessing reference mentioned above. The latter turned out to have an alternative implementation of MFCC as well.

It would be useful, indeed, to have this istft functionality added to DSP.

While implementing my own little istft, I noticed that DSP.arraysplit() seems to only keep "whole" segments, dropping the last samples of the source array. If we would strive for perfect reconstruction possibility with istft, arraysplit should probably have an option to augment the input.

While we're at it, it would also be great to have an option to arraysplit in such a way that the overlap is symmetric on the left and right of each frame, even for the first and last frame. It may be best to illustrate this with an example. Suppose I have 1000 samples, and I want frame-shifts of 100 samples and an analysis window of 300 samples. Then the first frame would be samples -100..200, the second frame would be sampes 0..300, etc until the last frame 800..1100 (give or take a Julia base 1 index offset). Negative and >size indexes would refer to augmented data, e.g., zeros.

The advantage would be that such a scheme would have 1000/100 = 10 frames, so that the frames are more easy to align with the original samples (first frame aligned with 0..100, the middle of the analysis window, etc.). The analysis would be symmetric w.r.t. start and end, of the signals, and of the frames themselves.

In Kaldi a modern speech recognition toolkit, this is known as the analysis option snip-edges=false.

davidavdav avatar Jan 19 '18 13:01 davidavdav

It seems a little bit unnatural to me to circularly wrap a signal by default. When I apply stfts to data, its usually because it's non-stationary data, and the first and the last frames end up being quite different from each other.

@davidavdav, would you have any interest in continuing this PR?

galenlynch avatar Oct 28 '18 14:10 galenlynch

gist here Hi, I'd like to help to get the istft implemented in DSP. I am trying to bring back MusicProcessing and I need a working istft and I see others have shown interest in having this in the DSP library. I am trying to familiarise myself with the DSP.jl code and I am sharing the code inspired on this PR and made some changes so it plays nicely with current DSP.jl code. I tried to follow the same approach but I am sure I am missing several details.. help please? :-)

Right now it appears to work ok using a hanning window but wrongly scaled at the beginning when window is "nothing". I am not sure how to properly address this, perhaps the scaling I see in older code?

@test x1 ≈ Y1

fails even with hanning window, I believe because of discrepancies at the the first sample. Once we have the algorithm working I'd be happy to work on the tests, documentation and do a proper PR but happy to hear your thoughts

bafonso avatar Aug 17 '20 13:08 bafonso