
Dynamic TFLOPS Calculation

Open snico432 opened this issue 1 year ago • 13 comments

I attempted to implement a dynamic TFLOPS calculation (in response to #243) as a fallback in case the device is not found in the lookup table. I know that PyTorch is not yet a dependency for the project but I saw some discussion in #139 that it will be soon. Please forgive me if this PR is incorrectly formatted, it's my first time attempting to contribute to an open-source project, but I was motivated to because I find this project really cool! Also, please let me know if there are any changes I should make to the code.
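The PR's actual code isn't shown in this thread, but the core idea described here, estimating a device's TFLOPS by timing a benchmark instead of consulting a static lookup table, can be sketched roughly as follows. This is an illustrative sketch only (NumPy on CPU as a stand-in for the PR's PyTorch fallback); the function name and parameters are not exo's actual API.

```python
import time
import numpy as np

def estimate_tflops(n: int = 1024, iters: int = 8) -> float:
    """Estimate sustained TFLOPS by timing repeated square matmuls.

    A dense n x n matmul costs roughly 2 * n**3 floating-point ops,
    so measured ops / elapsed seconds gives an achieved FLOPS figure.
    """
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    a @ b  # warm-up so one-time setup costs don't skew the timing
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    elapsed = time.perf_counter() - start
    total_ops = 2 * n**3 * iters
    return total_ops / elapsed / 1e12

print(f"~{estimate_tflops():.3f} TFLOPS (fp32, CPU)")
```

A real fallback would run this on the accelerator backend (PyTorch/MLX/tinygrad) rather than NumPy, and only when the device is missing from the lookup table.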

snico432 avatar Oct 06 '24 19:10 snico432

Yes!! I was hoping someone would do this. Great work! I haven't taken a proper look / tested yet, but I love the idea of dynamically calculating FLOPS based on a quick benchmark.

I added a $200 retrospective bounty for this which will be paid once merged

AlexCheema avatar Oct 06 '24 20:10 AlexCheema

One suggestion: I don't want to force torch as a dependency. If we could lean on the existing InferenceEngine infrastructure we have, that would be great. Perhaps each InferenceEngine can implement a benchmark function?
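The suggestion of a per-engine benchmark could look something like the sketch below. The class shape and the `benchmark_tflops` method name are hypothetical, not exo's actual `InferenceEngine` interface; the point is just that each backend measures its own throughput, so no single framework becomes a hard dependency.

```python
from abc import ABC, abstractmethod

class InferenceEngine(ABC):
    """Simplified stand-in for exo's InferenceEngine base class."""

    @abstractmethod
    def benchmark_tflops(self) -> float:
        """Return measured TFLOPS for this engine's backend."""

class DummyEngine(InferenceEngine):
    """Toy engine that returns a fixed figure; a real engine
    (MLX, tinygrad, PyTorch) would time a matmul on its backend."""

    def benchmark_tflops(self) -> float:
        return 1.5

print(DummyEngine().benchmark_tflops())
```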

AlexCheema avatar Oct 06 '24 21:10 AlexCheema

[Screenshot "2024-10-07 at 01 05 01" showing the output]

Getting this output now after

```
exo --inference-engine pytorch --run-model llama-3.1-8b
```

AlexCheema avatar Oct 06 '24 21:10 AlexCheema

> [Screenshot "2024-10-07 at 01 05 01"] Getting this output now after
> `exo --inference-engine pytorch --run-model llama-3.1-8b`

I'm assuming this comment was meant for #139 ?

snico432 avatar Oct 06 '24 21:10 snico432

> > [Screenshot "2024-10-07 at 01 05 01"] Getting this output now after `exo --inference-engine pytorch --run-model llama-3.1-8b`
>
> I'm assuming this comment was meant for #139 ?

you're right. sorry ignore that :)

AlexCheema avatar Oct 06 '24 21:10 AlexCheema

Just ping me when you want me to review the PR again.

AlexCheema avatar Oct 07 '24 21:10 AlexCheema

@AlexCheema I added support for benchmarking on the MLX inference engine. Right now it only benchmarks f32 and f16 calculations because MLX doesn't support matrix multiplication for int8. Not sure how I should proceed with that. Please let me know if this is what you're looking for, or if you'd like me to make some changes. I'll move on to tinygrad afterwards.
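Benchmarking per dtype, as described above, generalizes the single-precision timing loop by parameterizing the matrix dtype. The sketch below uses NumPy as a portable stand-in (MLX itself only runs on Apple silicon); names are illustrative, not the PR's code. The int8 gap mentioned above would show up here as that dtype simply being absent from the list.

```python
import time
import numpy as np

def matmul_tflops(dtype, n: int = 512, iters: int = 4) -> float:
    """Time n x n matmuls in the given dtype and report achieved TFLOPS."""
    a = np.random.rand(n, n).astype(dtype)
    b = np.random.rand(n, n).astype(dtype)
    a @ b  # warm-up
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    elapsed = time.perf_counter() - start
    return 2 * n**3 * iters / elapsed / 1e12

# Only the dtypes the backend can matmul; int8 is omitted here to
# mirror the MLX limitation discussed in the thread.
for dt in (np.float32, np.float16):
    print(f"{np.dtype(dt).name}: {matmul_tflops(dt):.4f} TFLOPS")
```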

snico432 avatar Oct 11 '24 03:10 snico432

@AlexCheema I created a benchmark for tinygrad, cleaned up the MLX benchmark, and addressed the requests from your last review. I also removed the dict from device_capabilities and made all the TFLOPS get calculated dynamically.

snico432 avatar Oct 15 '24 03:10 snico432

@AlexCheema DeviceCapabilities are now lazily computed. PTAL

snico432 avatar Nov 10 '24 18:11 snico432

Please fix merge conflicts.

AlexCheema avatar Nov 24 '24 18:11 AlexCheema

@AlexCheema resolved merge conflicts.

snico432 avatar Nov 24 '24 19:11 snico432

@AlexCheema PTAL

snico432 avatar Dec 06 '24 00:12 snico432

@AlexCheema PTAL

snico432 avatar Dec 21 '24 23:12 snico432