A-transformer
Just fix the typo.
The check is tied to mDynamicTreeMaxTopK, which is only meaningful in the context of useDynamicTree. While the check could theoretically apply even when useDynamicTree is false, enforcing it here ensures...
The reference URL (https://github.com/deepseek-ai/FlashMLA) is embedded in the license comment. Move it to a dedicated comment below the license for clarity. I am the DeepSeek contributor.
self.experts contains None for non-local experts, so calling one of those entries will raise TypeError: 'NoneType' object is not callable.
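A minimal sketch of the failure and one possible guard. It assumes a hypothetical MoE layer (ToyMoE, forward_unsafe and forward_local are illustrative names, not the real module's API) where self.experts is indexed by global expert id and holds None for experts owned by other ranks:

```python
from typing import Callable, List, Optional

# Hypothetical expert: any callable mapping a float to a float.
Expert = Callable[[float], float]


class ToyMoE:
    """Illustrative stand-in for an MoE layer with expert parallelism.

    Assumption: self.experts is indexed by global expert id, and the
    slots for experts that live on other ranks are None.
    """

    def __init__(self, experts: List[Optional[Expert]]):
        self.experts = experts

    def forward_unsafe(self, x: float) -> List[float]:
        # Calls every slot, so the first None slot raises
        # TypeError: 'NoneType' object is not callable.
        return [expert(x) for expert in self.experts]

    def forward_local(self, x: float) -> List[float]:
        # Only call experts that are materialized on this rank.
        return [expert(x) for expert in self.experts if expert is not None]


if __name__ == "__main__":
    layer = ToyMoE([lambda v: v + 1.0, None, lambda v: v * 2.0, None])
    print(layer.forward_local(3.0))  # [4.0, 6.0]
    try:
        layer.forward_unsafe(3.0)
    except TypeError as err:
        print(err)  # 'NoneType' object is not callable
```

Filtering on "is not None" is the smallest change; iterating over an explicit list of local expert indices would be an equally valid guard if the layer already tracks them.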
The docstring contains "Set to true if you need to want to compute output logits/loss." The phrase "need to want to" is grammatically incorrect and unclear.