MangoFF

Results: 6 issues by MangoFF

Why does it always return nan? Here is my log: (l1_loss): L1Loss() (new_loss): AsymmetricLossOptimized() (bcewithlog_loss): AsymmetricLossOptimized() (iou_loss): IOUloss() ) ) 2022-04-20 13:03:33 | INFO | yolox.core.trainer:202 - ---> start train epoch1 2022-04-20...

Thanks a lot for your guide. Unfortunately, I found that your model's output dim is 1x1x0x7, so when I use the dlc you made, SNPE throws an...

[Attention](https://github.com/NVIDIA/TransformerEngine/blob/bfe21c3d68b0a9951e5716fb520045db53419c5e/transformer_engine/pytorch/attention.py) At line 5199:

```python
if qkv_format == "thd":
    assert (
        "padding" in attn_mask_type
    ), "Attention mask type must be padding or padding_causal for qkv_format=thd!"
```

Why is full attention not supported...

It is quite inconvenient for me that it can't display mathematical formulas correctly. I hope this bug can be fixed, thank you.

Why has DeepSeek-Math-7B-rl already reached 88.2%, while DeepSeek-LLM-67B Chat only reaches 84%? The 67B general-purpose model is worse at math than the 7B math-specialized model.