Dynamic TFLOPS Calculation
I attempted to implement a dynamic TFLOPS calculation (in response to #243) as a fallback in case the device is not found in the lookup table. I know that PyTorch is not yet a dependency for the project, but I saw some discussion in #139 that it will be soon. Please forgive me if this PR is incorrectly formatted; it's my first time contributing to an open-source project, and I was motivated to do so because I find this project really cool! Also, please let me know if there are any changes I should make to the code.
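For readers following along, here is a minimal sketch of the fallback idea described above: use the lookup table when the device is known, otherwise time a matrix multiplication and convert it to TFLOPS. It is not the code in this PR. `KNOWN_TFLOPS`, `benchmark_tflops`, and `device_tflops` are hypothetical names, and a pure-Python matmul stands in for a real backend so the example has no dependencies.

```python
import time

# Hypothetical lookup table (assumption): device name -> known fp32 TFLOPS.
KNOWN_TFLOPS = {"Example GPU": 10.0}


def matmul(a, b):
    """Naive pure-Python matrix multiply, standing in for a real backend."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def benchmark_tflops(n=64, repeats=3):
    """Time an n x n matmul and convert the fastest run to TFLOPS."""
    a = [[1.0] * n for _ in range(n)]
    b = [[1.0] * n for _ in range(n)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        matmul(a, b)
        best = min(best, time.perf_counter() - start)
    flops = 2 * n ** 3  # about n^3 multiplies plus n^3 additions
    return flops / max(best, 1e-9) / 1e12


def device_tflops(device_name):
    """Use the table when the device is known; otherwise measure."""
    if device_name in KNOWN_TFLOPS:
        return KNOWN_TFLOPS[device_name]
    return benchmark_tflops()
```

Taking the fastest of a few runs is one simple way to reduce noise from warm-up and scheduling; a real benchmark would also need to synchronize with the accelerator before stopping the timer.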
Yes!! I was hoping someone would do this. Great work! I haven't taken a proper look / tested yet, but I love the idea of dynamically calculating FLOPS based on a quick benchmark.
I added a $200 retrospective bounty for this, which will be paid once merged.
One suggestion: I don't want to force torch as a dependency. If we could lean on the existing InferenceEngine infrastructure we have, that would be great. Perhaps each InferenceEngine can implement a benchmark function?
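A rough sketch of what a per-engine benchmark hook could look like, so that no single framework becomes a hard dependency. This is an illustration only: `BenchmarkableEngine`, `matmul_seconds`, and `PurePythonEngine` are hypothetical names and do not reflect the project's actual InferenceEngine interface.

```python
import time
from abc import ABC, abstractmethod


class BenchmarkableEngine(ABC):
    """Hypothetical stand-in for an engine interface with a benchmark hook."""

    @abstractmethod
    def matmul_seconds(self, n: int) -> float:
        """Run one n x n matmul on this backend and return elapsed seconds."""

    def benchmark_tflops(self, n: int = 48, repeats: int = 3) -> float:
        """Shared logic: each backend only supplies the timed matmul."""
        best = min(self.matmul_seconds(n) for _ in range(repeats))
        return (2 * n ** 3) / max(best, 1e-9) / 1e12


class PurePythonEngine(BenchmarkableEngine):
    """Toy backend; a real engine would call its own framework here."""

    def matmul_seconds(self, n: int) -> float:
        a = [[1.0] * n for _ in range(n)]
        cols = list(zip(*a))
        start = time.perf_counter()
        [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]
        return time.perf_counter() - start
```

With this shape, the FLOP accounting lives in one place and each engine imports its framework only inside its own implementation.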
Getting this output now after
exo --inference-engine pytorch --run-model llama-3.1-8b
I'm assuming this comment was meant for #139 ?
you're right. sorry ignore that :)
Just ping me when you want me to review the PR again.
@AlexCheema I added support for benchmarking on the MLX inference engine. Right now it only benchmarks f32 and f16 calculations because MLX doesn't support matrix multiplication for int8. Not sure how I should proceed with that. Please let me know if this is what you're looking for, or if you'd like me to make some changes. I'll move on to Tinygrad afterwards.
@AlexCheema I created a benchmark for tinygrad, cleaned up the MLX benchmark, and attempted to implement the requests from your last review. I also removed the dict from device_capabilities so that all the TFLOPS values are now calculated dynamically.
@AlexCheema DeviceCapabilities are now lazily computed. PTAL
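As an illustration of lazy computation in general (not necessarily how this PR does it), the benchmark can be deferred until the value is first read and then cached. `LazyDeviceCapabilities` and its `benchmark` argument are hypothetical names used only for this sketch.

```python
from functools import cached_property


class LazyDeviceCapabilities:
    """Hypothetical holder that runs the benchmark only on first access."""

    def __init__(self, model, benchmark):
        self.model = model
        self._benchmark = benchmark  # callable returning TFLOPS (assumption)

    @cached_property
    def tflops(self):
        # Runs once; later reads return the cached result.
        return self._benchmark()
```

This keeps startup fast for code paths that never look at the TFLOPS value, at the cost of a one-time delay on first access.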
Please fix merge conflicts.
@AlexCheema resolved merge conflicts.
@AlexCheema PTAL
@AlexCheema PTAL