mistral-inference
Mixtral sliding window
What sliding window size was used for Mixtral training? Is it 32k or 8k?