MonolithFoundation

Results 91 comments of MonolithFoundation

No, I actually built my own pipeline. Silero + Paraformer works well. FSMN is not very good.

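Does it support putting a box in the prompt for region description? For example, given a box, tell the model who this is and ask it to describe the region?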

{'pixel_values': array([[1.9303361, 1.9303361, 1.9303361, ..., 1.9325962, 1.9325962, 1.9325962], [1.9303361, 1.9303361, 1.9303361, ..., 1.9325962, 1.9325962, 1.9325962], [1.8865409, 1.8865409, 1.8865409, ..., 1.8899357, 1.8899357, 1.8899357], ..., [1.9011393, 1.8719424, 1.857344 , ..., 1.3922336, 1.7619553,...

Thank you so much for the consideration!

It has issues.

Does the training need word-level timestamps? 10,000 hours requires a lot of audio data; can data that is not very clean (like the Emilia dataset) meet the requirement?

@maximizemaxwell any updates on this?

idefics2 does not have a lite LLM version, so it has too many parameters. However, I tried enlarging the input ids length, but it crashes. Why?

If it can be trained, that would be helpful for training an MLLM model for OCR and Markdown conversion, like GPT-4o.