
C-style API/Python bindings not working on Windows

Open Letorshillen opened this issue 1 year ago • 2 comments

The C-style API/Python bindings are not working on Windows. On Linux, macOS, and WSL the code snippets from issue #9 work without any trouble, but on Windows the adjusted code simply shuts down, as described in the comment quoted below.

For some reason none of these solutions seem to work now on Windows (both 10 and 11). Every time on multiple machines the Python interpreter exits and shuts down without any error during the invocation of whisper_full. Does anyone have any ideas?

So far I've tried both my Cython extension and the various ctypes approaches (using updated versions of the examples provided here). I've even made sure the compiler is MSVC and that it matches the version used by Python; I am using Python 3.10. With the Cython extension I've tried linking against the dynamic library, the static library, and plain object files, built both with the CMake scripts and with an updated Makefile.

This only happens on Windows; the exact same code works fine on macOS and Linux. Without Windows compatibility this extension becomes useless for its original purpose. If anyone could help I would greatly appreciate it.

The only clue I've got is that one time, when using a (Debug) DLL created by VS 2020 (with a more recent compiler than python3.10!) from a CMake-generated solution, a complaint was given about an rv==0 assert. Presumably this is a threading issue, possibly from not using pthreads; however, all the default CMake options were used to create the solution file. Nothing was edited and the latest version was built as-is, so pthreads should have been used. This was with the latest ctypes solution mentioned. I haven't been able to reproduce it since.

Originally posted by @o4dev in https://github.com/ggerganov/whisper.cpp/issues/9#issuecomment-1360272368

Here is the adjusted code for Windows:

import ctypes
import pathlib

# this is needed to read the WAV file properly
from scipy.io import wavfile

libname = "libwhisper"
fname_model = "models/ggml-tiny.en.bin"
fname_wav = "samples/jfk.wav"

# this needs to match the C struct in whisper.h
class WhisperFullParams(ctypes.Structure):
    _fields_ = [
        ("strategy", ctypes.c_int),
        #
        ("n_max_text_ctx", ctypes.c_int),
        ("n_threads", ctypes.c_int),
        ("offset_ms", ctypes.c_int),
        ("duration_ms", ctypes.c_int),
        #
        ("translate", ctypes.c_bool),
        ("no_context", ctypes.c_bool),
        ("single_segment", ctypes.c_bool),
        ("print_special", ctypes.c_bool),
        ("print_progress", ctypes.c_bool),
        ("print_realtime", ctypes.c_bool),
        ("print_timestamps", ctypes.c_bool),
        #
        ("token_timestamps", ctypes.c_bool),
        ("thold_pt", ctypes.c_float),
        ("thold_ptsum", ctypes.c_float),
        ("max_len", ctypes.c_int),
        ("max_tokens", ctypes.c_int),
        #
        ("speed_up", ctypes.c_bool),
        ("audio_ctx", ctypes.c_int),
        #
        ("prompt_tokens", ctypes.c_void_p),
        ("prompt_n_tokens", ctypes.c_int),
        #
        ("language", ctypes.c_char_p),
        #
        ("suppress_blank", ctypes.c_bool),
        #
        ("temperature_inc", ctypes.c_float),
        ("entropy_thold", ctypes.c_float),
        ("logprob_thold", ctypes.c_float),
        ("no_speech_thold", ctypes.c_float),
        #
        ("greedy", ctypes.c_int * 1),
        ("beam_search", ctypes.c_int * 3),
        #
        ("new_segment_callback", ctypes.c_void_p),
        ("new_segment_callback_user_data", ctypes.c_void_p),
        #
        ("encoder_begin_callback", ctypes.c_void_p),
        ("encoder_begin_callback_user_data", ctypes.c_void_p),
    ]


if __name__ == "__main__":
    # load library and model
    libname = str(pathlib.Path().absolute() / libname)
    whisper = ctypes.WinDLL(libname, winmode=1)

    # tell Python what are the return types of the functions
    whisper.whisper_init_from_file.restype = ctypes.c_void_p
    whisper.whisper_full_default_params.restype = WhisperFullParams
    whisper.whisper_full_get_segment_text.restype = ctypes.c_char_p

    # initialize whisper.cpp context
    ctx = whisper.whisper_init_from_file(fname_model.encode("utf-8"))

    # get default whisper parameters and adjust as needed
    params = whisper.whisper_full_default_params()
    params.print_realtime = True
    params.print_progress = False

    # load WAV file
    samplerate, data = wavfile.read(fname_wav)

    # convert to 32-bit float
    data = data.astype("float32") / 32768.0

    # run the inference
    result = whisper.whisper_full(
        ctypes.c_void_p(ctx),
        params,
        data.ctypes.data_as(ctypes.POINTER(ctypes.c_float)),
        len(data),
    )
    if result != 0:
        print("Error: {}".format(result))
        exit(1)

    # print results from Python
    # print("\nResults from Python:\n")
    n_segments = whisper.whisper_full_n_segments(ctypes.c_void_p(ctx))
    for i in range(n_segments):
        t0 = whisper.whisper_full_get_segment_t0(ctypes.c_void_p(ctx), i)
        t1 = whisper.whisper_full_get_segment_t1(ctypes.c_void_p(ctx), i)
        txt = whisper.whisper_full_get_segment_text(ctypes.c_void_p(ctx), i)

        print(f"{t0/1000.0:.3f} - {t1/1000.0:.3f} : {txt.decode('utf-8')}")

    # free the memory
    whisper.whisper_free(ctypes.c_void_p(ctx))

Letorshillen, Jan 19 '23 10:01
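
When the interpreter exits without any message inside a native call, as reported above for whisper_full, Python's standard faulthandler module can sometimes still print a traceback for fatal errors such as an access violation. The following is only a minimal sketch of that idea, not a confirmed fix for this issue; the helper names `enable_crash_traces` and `run_native_call` are hypothetical and not part of whisper.cpp.

```python
import faulthandler
import sys


def enable_crash_traces():
    """Try to enable faulthandler; return True if it is active afterwards."""
    try:
        # Dumps Python tracebacks of all threads on a fatal error
        # (e.g. an access violation on Windows, SIGSEGV elsewhere).
        faulthandler.enable(all_threads=True)
    except (RuntimeError, AttributeError, ValueError, OSError):
        # sys.stderr may be missing or have no file descriptor
        # (for example when output is captured by a test runner).
        return False
    return faulthandler.is_enabled()


def run_native_call(func, *args):
    """Hypothetical helper: log before and after a call so that a
    silent exit can be pinned to one specific native function."""
    name = getattr(func, "__name__", repr(func))
    print(f"calling {name}", file=sys.stderr, flush=True)
    result = func(*args)
    print(f"{name} returned", file=sys.stderr, flush=True)
    return result
```

In the script above, `enable_crash_traces()` would be called once at startup, and the `whisper.whisper_full(...)` call could be routed through `run_native_call`. The same effect is available without code changes by starting the interpreter with `python -X faulthandler`.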

The whisper_full_params struct has to be mapped precisely. I immediately see that the order of n_threads and n_max_text_ctx is wrong:

[image]

ggerganov, Jan 19 '23 17:01
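
To illustrate why the field order matters, here is a small self-contained sketch that does not load whisper.cpp at all. It assumes, per the reply above, that the C header declares n_threads before n_max_text_ctx; the class names and the numeric values are made up for the demonstration. ctypes reads fields purely by position, so a swapped declaration silently returns the wrong values without raising any error.

```python
import ctypes


class ParamsAsPosted(ctypes.Structure):
    # Field order as written in the Python snippet earlier in the thread.
    _fields_ = [
        ("strategy", ctypes.c_int),
        ("n_max_text_ctx", ctypes.c_int),
        ("n_threads", ctypes.c_int),
    ]


class ParamsAssumedC(ctypes.Structure):
    # Assumed order on the C side (n_threads first), per the reply above.
    _fields_ = [
        ("strategy", ctypes.c_int),
        ("n_threads", ctypes.c_int),
        ("n_max_text_ctx", ctypes.c_int),
    ]


# Pretend the C library filled the struct with these (hypothetical) values.
c_side = ParamsAssumedC(strategy=0, n_threads=4, n_max_text_ctx=16384)

# Reinterpret the very same bytes using the field order from the posted code.
py_side = ParamsAsPosted.from_buffer_copy(bytes(c_side))

# The two fields come back swapped: n_threads now holds the context size.
print("n_threads:", py_side.n_threads)
print("n_max_text_ctx:", py_side.n_max_text_ctx)
```

Because both fields are plain ints, the struct size does not change and nothing crashes at this point; the library would simply be asked to run with the wrong thread count and context size. With fields of different sizes, a wrong order would also shift the offsets of everything that follows.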

Hey, thanks for your quick reply ^^. In my Linux subsystem (WSL) the code was working with the whisper_full_params struct from above. Anyway, I changed the structure in my Windows code and it's still not working. I don't think the code is the problem, but rather the build of the libwhisper file. I used your build commands from issue #9:

# build shared libwhisper.so
gcc -O3 -std=c11   -pthread -mavx -mavx2 -mfma -mf16c -fPIC -c ggml.c
g++ -O3 -std=c++11 -pthread --shared -fPIC -static-libstdc++ whisper.cpp ggml.o -o libwhisper.so 

I also tried it with:

# build shared libwhisper.dll
gcc -O3 -std=c11   -pthread -mavx -mavx2 -mfma -mf16c -fPIC -c ggml.c
g++ -O3 -std=c++11 -pthread --shared -fPIC -static-libstdc++ whisper.cpp ggml.o -o libwhisper.dll 

But I don't have much experience with building C/C++ libraries.

@chidiwilliams I saw in your comment that you also had some problems making it work on Windows. Were you able to fix the errors?

Letorshillen, Jan 20 '23 13:01