MangoFF

Results: 6 issues by MangoFF

Why does it always return nan? Here is my log: (l1_loss): L1Loss() (new_loss): AsymmetricLossOptimized() (bcewithlog_loss): AsymmetricLossOptimized() (iou_loss): IOUloss() ) ) 2022-04-20 13:03:33 | INFO | yolox.core.trainer:202 - ---> start train epoch1 2022-04-20...

Thanks a lot for your guide. Unfortunately, I found that your model's output dim is 1x1x0x7, so when I use the dlc you made, SNPE throws an...

[Attention](https://github.com/NVIDIA/TransformerEngine/blob/bfe21c3d68b0a9951e5716fb520045db53419c5e/transformer_engine/pytorch/attention.py) At line 5199:

```python
if qkv_format == "thd":
    assert (
        "padding" in attn_mask_type
    ), "Attention mask type must be padding or padding_causal for qkv_format=thd!"
```

Why is full attention not supported...

It is quite inconvenient for me that it can't display mathematical formulas correctly. I hope this bug can be fixed, thank you.

Why has DeepSeek-Math-7B-rl already reached 88.2%, while DeepSeek-LLM-67B Chat only reaches 84%? The 67B general-purpose model is worse at math than the 7B math-specialized model.