MonolithFoundation
@jishengpeng Wow! > Potentially groundbreaking directions. Could you share roughly what this direction is, and whether it will be open-sourced? When can we expect it?
Is there an estimated time for the data release?
Same here
Got a similar error, not sure if it's related: arch=VisionArchitecture.Qwen2VL, AttributeError: type object 'builtins.VisionArchitecture' has no attribute 'Qwen2VL'
Will maxtokens cause the audio to be truncated? I need the output to be exactly the same length as the original audio.
https://github.com/Ming-er/MGA-CLAP/blob/48ca5a5cd22cd34427e118bd8cf332090ec54770/tools/utils.py#L20 Can you take a look? The author used your lib in a very weird way...
Have you looked at the other SAM2 ONNX implementation? It looks like they got everything working, including video tracking.
Hi, what if spk1 and spk2 overlap? I just want code that takes an audio file in and outputs the timestamps.
I'm wondering if there is a function as simple as possible for this, for example `dia_pred(audio_path)`, which returns the timestamps dict. I looked at the train_dia_pred code; it's way too complicated...
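To make the request concrete, here is a rough sketch of the output I have in mind. It is not the repo's API: `frames_to_segments`, the `spkN` keys, the frame length, and the threshold are all my own assumptions. It only covers the post-processing step, assuming the model already gives a per-frame activity score for each speaker. Each speaker is thresholded on its own, so overlapping speech just shows up as overlapping segments.

```python
from typing import Dict, List, Sequence, Tuple


def frames_to_segments(
    activity: Sequence[Sequence[float]],
    frame_sec: float = 0.02,   # assumed frame hop in seconds
    threshold: float = 0.5,    # assumed decision threshold
) -> Dict[str, List[Tuple[float, float]]]:
    """Convert per-frame speaker scores into (start, end) segments in seconds.

    activity[t][k] is the score of speaker k at frame t (hypothetical layout).
    Speakers are handled independently, so segments of different speakers
    may overlap in time.
    """
    if not activity:
        return {}
    n_speakers = len(activity[0])
    result: Dict[str, List[Tuple[float, float]]] = {
        f"spk{k + 1}": [] for k in range(n_speakers)
    }
    for k in range(n_speakers):
        key = f"spk{k + 1}"
        start = None  # index of the first frame of the currently open segment
        for t, frame in enumerate(activity):
            active = frame[k] >= threshold
            if active and start is None:
                start = t
            elif not active and start is not None:
                result[key].append(
                    (round(start * frame_sec, 3), round(t * frame_sec, 3))
                )
                start = None
        if start is not None:
            # Close a segment that runs until the end of the audio.
            result[key].append(
                (round(start * frame_sec, 3), round(len(activity) * frame_sec, 3))
            )
    return result
```

A `dia_pred(audio_path)` wrapper would then only need to load the audio, run the model, and pass the scores through something like this.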
Thank you so much for the consideration! Hoping for a strong base diarization model with overlap support that is easy to use.