exo
Implementation of DummyInferenceEngine
This PR addresses issue #325 by implementing a DummyInferenceEngine for testing purposes.
Changes
- Created the `DummyInferenceEngine` class in `exo/inference/dummy_inference_engine.py`
- Modified `exo/inference/inference_engine.py` to include the dummy engine option
- Updated `exo/main.py` to support selection of the dummy engine
- Added basic testing for the `DummyInferenceEngine`
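For reviewers unfamiliar with how engine selection could be wired up, here is a minimal sketch of mapping a `--inference-engine` value to an engine instance. The registry `ENGINE_FACTORIES`, the helper names `get_inference_engine` and `parse_args`, the stub classes, and the `"real"` default are all hypothetical illustrations, not the actual code in `exo/inference/inference_engine.py` or `exo/main.py`.

```python
import argparse
from typing import Callable, Dict


class DummyInferenceEngine:
    """Hypothetical stub standing in for the engine added in this PR."""


class RealInferenceEngine:
    """Hypothetical stand-in for a real, model-backed engine."""


# Hypothetical registry: maps each --inference-engine value to a factory.
ENGINE_FACTORIES: Dict[str, Callable[[], object]] = {
    "dummy": DummyInferenceEngine,
    "real": RealInferenceEngine,
}


def get_inference_engine(name: str) -> object:
    """Instantiate the engine registered under `name`, or fail loudly."""
    try:
        return ENGINE_FACTORIES[name]()
    except KeyError:
        raise ValueError(f"Unknown inference engine: {name!r}") from None


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # argparse exposes "--inference-engine" as the attribute "inference_engine".
    parser.add_argument(
        "--inference-engine",
        choices=sorted(ENGINE_FACTORIES),
        default="real",  # hypothetical default
        help="Which inference engine to use",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Explicit argv mirrors: python main.py --inference-engine dummy
    args = parse_args(["--inference-engine", "dummy"])
    engine = get_inference_engine(args.inference_engine)
    print(f"Using {type(engine).__name__}")
```

Restricting `choices` to the registry keys means an unknown engine name is rejected by argparse before any engine is constructed.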
Features
- Simulates inference without loading a real model
- Generates random token outputs
- Simulates latency using `asyncio.sleep`
- Fully asynchronous implementation
- Simulates shard loading
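The features above can be sketched as a small async class. This is a hypothetical illustration, not the code in this PR: the `Shard` dataclass, the method names `ensure_shard` and `infer_prompt`, and the `vocab_size`, `latency_s`, and `seed` parameters are assumptions chosen for clarity.

```python
import asyncio
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Shard:
    # Hypothetical shard descriptor: which layers of which model this node serves.
    model_id: str
    start_layer: int
    end_layer: int
    n_layers: int


class DummyInferenceEngine:
    """Hypothetical sketch: fakes inference with random tokens and simulated latency."""

    def __init__(self, vocab_size: int = 1000, latency_s: float = 0.01,
                 seed: Optional[int] = None) -> None:
        self.vocab_size = vocab_size
        self.latency_s = latency_s
        self.shard: Optional[Shard] = None
        self._rng = random.Random(seed)  # seedable so tests can be reproducible

    async def ensure_shard(self, shard: Shard) -> None:
        # Simulated shard loading: no weights are read; pause once and record it.
        if self.shard == shard:
            return
        await asyncio.sleep(self.latency_s)
        self.shard = shard

    async def infer_prompt(self, shard: Shard, prompt: str,
                           max_tokens: int = 5) -> List[int]:
        # The prompt is ignored; no real model is involved.
        await self.ensure_shard(shard)
        tokens: List[int] = []
        for _ in range(max_tokens):
            # Each fake "forward pass" yields to the event loop, then emits a random id.
            await asyncio.sleep(self.latency_s)
            tokens.append(self._rng.randrange(self.vocab_size))
        return tokens


async def _demo() -> None:
    engine = DummyInferenceEngine(seed=0)
    shard = Shard("dummy-model", 0, 7, 8)
    print(await engine.infer_prompt(shard, "Hello", max_tokens=3))


if __name__ == "__main__":
    asyncio.run(_demo())
```

Because every delay goes through `asyncio.sleep`, several dummy requests can run concurrently on one event loop, which is useful when exercising scheduling code without a GPU.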
Testing
- Added a basic test in `main.py` to verify `DummyInferenceEngine` functionality
- Manually tested with `python main.py --inference-engine dummy`
Feedback and suggestions for improvement are welcome!