Question about Mixtral-8x7B inference
Hello, a quick question about Mixtral-8x7B inference. I've read online that during inference it uses a router network to select just 2 of the 8 experts per token to produce the output, which is why it's fast. I also know there are many techniques for batching requests, such as continuous batching. Would the Hugging Face inference engine's batching functionality still work, given that multiple inputs might trigger different experts in an input-dependent way due to the nature of an SMoE model?
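To make concrete what I mean, here is a rough, simplified sketch of top-2 routing in a sparse MoE layer (not Mixtral's actual code; the class name, shapes, and expert modules are made up for illustration). The point is that routing happens per token, so tokens from different batched requests can land on different experts within the same forward pass:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    """Minimal sketch of a sparse MoE layer with top-2 routing (illustrative only)."""

    def __init__(self, hidden_dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_dim, num_experts, bias=False)
        # Each "expert" here is a stand-in for the model's real feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_dim, 4 * hidden_dim),
                          nn.GELU(),
                          nn.Linear(4 * hidden_dim, hidden_dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden_dim) -- tokens from *all* batched requests, flattened.
        logits = self.router(x)                            # (num_tokens, num_experts)
        weights, chosen = logits.topk(self.top_k, dim=-1)  # per-token top-2 experts
        weights = F.softmax(weights, dim=-1)

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_idx, slot = (chosen == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue  # no token in this batch was routed to expert e
            # Gather every token routed to expert e, regardless of which request it came from.
            out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(x[token_idx])
        return out

# Example: 5 tokens (possibly from different requests) flow through one layer together;
# each token independently picks its own 2 experts.
layer = Top2MoELayer(hidden_dim=16)
y = layer(torch.randn(5, 16))
```

In this sketch the per-expert gather works regardless of which request each token came from, which is why I'm wondering whether batched inference handles this transparently in practice.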