A-transformer

Results: 5 issues of A-transformer

Just fixes a typo.

The check is tied to mDynamicTreeMaxTopK, which is only meaningful in the context of useDynamicTree. While the check could theoretically apply even when useDynamicTree is false, enforcing it here ensures...
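The comment above does not say what the check actually tests. The sketch below only shows the general shape of a validation that runs solely when the dynamic tree is enabled. The class DecodingConfig and the fields use_dynamic_tree and dynamic_tree_max_top_k are hypothetical Python stand-ins for useDynamicTree and mDynamicTreeMaxTopK, and the positivity bound is an assumption chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class DecodingConfig:
    # Hypothetical stand-ins for useDynamicTree / mDynamicTreeMaxTopK.
    use_dynamic_tree: bool = False
    dynamic_tree_max_top_k: int = 0

    def validate(self) -> None:
        # The top-K value only matters in dynamic-tree mode, so the check
        # runs only when that mode is enabled. The "> 0" bound is assumed.
        if self.use_dynamic_tree and self.dynamic_tree_max_top_k <= 0:
            raise ValueError(
                "dynamic_tree_max_top_k must be positive when use_dynamic_tree is enabled"
            )
```

With this layout, DecodingConfig(use_dynamic_tree=False, dynamic_tree_max_top_k=0).validate() passes, while the same value with use_dynamic_tree=True raises ValueError. This matches the comment's point that the bound is only meaningful when the dynamic tree is in use.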

Community wants to contribute

The reference URL (https://github.com/deepseek-ai/FlashMLA) is embedded in the license comment. Move it to a dedicated comment below the license for clarity. I am the DeepSeek contributor.

Community wants to contribute

self.experts holds None for non-local experts, so calling one of those entries directly raises TypeError: 'NoneType' object is not callable.
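One way to avoid the error is to skip None entries before calling them. The sketch below is illustrative only and is not taken from the project. It uses plain Python callables in place of expert modules, and the function run_local_experts and its selected parameter are hypothetical names. It assumes that a non-local expert's contribution is computed on another rank.

```python
from typing import Callable, List, Optional

# Stand-in for an expert module: maps an input to an output.
Expert = Callable[[float], float]


def run_local_experts(
    experts: List[Optional[Expert]], x: float, selected: List[int]
) -> float:
    """Sum the outputs of the selected experts that are local to this rank.

    Non-local experts are stored as None. Calling experts[idx](x) on such an
    entry raises TypeError: 'NoneType' object is not callable, so those
    entries are skipped here.
    """
    total = 0.0
    for idx in selected:
        expert = experts[idx]
        if expert is None:
            # Assumed: this expert's output is produced on another rank.
            continue
        total += expert(x)
    return total
```

For example, with experts = [lambda v: v * 2.0, None, lambda v: v + 1.0], calling run_local_experts(experts, 3.0, [0, 1, 2]) returns 10.0, and the None entry is never called.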

The docstring contains "Set to true if you need to want to compute output logits/loss." The phrase "need to want to" is ungrammatical and unclear. It should read "Set to true if you want to compute output logits/loss."