MonolithFoundation comments

Results: 91 comments of MonolithFoundation

Scale on data and comparison to SOTA

@jishengpeng Wow! > Potentially groundbreaking directions. Could you share the approximate direction of this and whether it will be open-sourced? When can we expect it?

about train code

Is there an estimated time for the data release?

about train code

Same here

`phi4mm.py` script is failing on macOS M2 (Python 3.13.2)

Got a similar error, not sure if it's related: arch=VisionArchitecture.Qwen2VL, AttributeError: type object 'builtins.VisionArchitecture' has no attribute 'Qwen2VL'

How to forcibly control the output length (duration)?

Will maxtokens cause the audio to be truncated? I need the output length to exactly match the original audio.

ModuleNotFoundError: No module named 'sed_scores_eval.utils.scores'

https://github.com/Ming-er/MGA-CLAP/blob/48ca5a5cd22cd34427e118bd8cf332090ec54770/tools/utils.py#L20 Can you take a look? This author used your lib in a very weird way...

How to get all masks directly?

Have you referenced another SAM2 ONNX implementation? It looks like they got everything working, including video tracking.

About usage

Hi, what if spk1 and spk2 overlap? I just want code that takes a voice recording in and outputs the timestamp results.

About usage

I'm wondering if there is a function that makes this as simple as possible, for example `dia_pred(audio_path)`, which returns the timestamps dict. I looked at the train_dia_pred code, and it's way too complicated...

About usage

Thank you so much for the consideration! Hoping for a strong base diarization model with overlap support that is easy to use.