
Dynamic TFLOPS Calculation

Open snico432 opened this issue 1 year ago • 13 comments

I attempted to implement a dynamic TFLOPS calculation (in response to #243) as a fallback in case the device is not found in the lookup table. I know that PyTorch is not yet a dependency for the project but I saw some discussion in #139 that it will be soon. Please forgive me if this PR is incorrectly formatted, it's my first time attempting to contribute to an open-source project, but I was motivated to because I find this project really cool! Also, please let me know if there are any changes I should make to the code.
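The PR's actual code isn't shown in this thread, but the core idea described here, estimating a device's TFLOPS by timing a benchmark instead of consulting a static lookup table, can be sketched roughly as follows. This is an illustrative sketch only (NumPy on CPU as a stand-in for the PR's PyTorch fallback); the function name and parameters are not exo's actual API.

```python
import time
import numpy as np

def estimate_tflops(n: int = 1024, iters: int = 8) -> float:
    """Estimate sustained TFLOPS by timing repeated square matmuls.

    A dense n x n matmul costs roughly 2 * n**3 floating-point ops,
    so measured ops / elapsed seconds gives an achieved FLOPS figure.
    """
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    a @ b  # warm-up so one-time setup costs don't skew the timing
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    elapsed = time.perf_counter() - start
    total_ops = 2 * n**3 * iters
    return total_ops / elapsed / 1e12

print(f"~{estimate_tflops():.3f} TFLOPS (fp32, CPU)")
```

A real fallback would run this on the accelerator backend (PyTorch/MLX/tinygrad) rather than NumPy, and only when the device is missing from the lookup table.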

snico432 avatar Oct 06 '24 19:10 snico432

Yes!! I was hoping someone would do this. Great work! I haven't taken a proper look / tested yet, but I love the idea of dynamically calculating FLOPS based on a quick benchmark.

I added a $200 retrospective bounty for this which will be paid once merged

AlexCheema avatar Oct 06 '24 20:10 AlexCheema

One suggestion: I don't want to force torch as a dependency. If we could lean on the existing InferenceEngine infrastructure we have, that would be great. Perhaps each InferenceEngine can implement a benchmark function?
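The suggestion of a per-engine benchmark could look something like the sketch below. The class shape and the `benchmark_tflops` method name are hypothetical, not exo's actual `InferenceEngine` interface; the point is just that each backend measures its own throughput, so no single framework becomes a hard dependency.

```python
from abc import ABC, abstractmethod

class InferenceEngine(ABC):
    """Simplified stand-in for exo's InferenceEngine base class."""

    @abstractmethod
    def benchmark_tflops(self) -> float:
        """Return measured TFLOPS for this engine's backend."""

class DummyEngine(InferenceEngine):
    """Toy engine that returns a fixed figure; a real engine
    (MLX, tinygrad, PyTorch) would time a matmul on its backend."""

    def benchmark_tflops(self) -> float:
        return 1.5

print(DummyEngine().benchmark_tflops())
```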

AlexCheema avatar Oct 06 '24 21:10 AlexCheema

[Screenshot "2024-10-07 at 01 05 01" showing the output]

Getting this output now after

```
exo --inference-engine pytorch --run-model llama-3.1-8b
```

AlexCheema avatar Oct 06 '24 21:10 AlexCheema

> [Screenshot "2024-10-07 at 01 05 01"] Getting this output now after
> `exo --inference-engine pytorch --run-model llama-3.1-8b`

I'm assuming this comment was meant for #139 ?

snico432 avatar Oct 06 '24 21:10 snico432

> > [Screenshot "2024-10-07 at 01 05 01"] Getting this output now after `exo --inference-engine pytorch --run-model llama-3.1-8b`
>
> I'm assuming this comment was meant for #139 ?

you're right. sorry ignore that :)

AlexCheema avatar Oct 06 '24 21:10 AlexCheema

Just ping me when you want me to review the PR again.

AlexCheema avatar Oct 07 '24 21:10 AlexCheema

@AlexCheema I added support for benchmarking on the MLX inference engine. Right now it only benchmarks f32 and f16 calculations because MLX doesn't support matrix multiplication for int8. Not sure how I should proceed with that. Please let me know if this is what you're looking for, or if you'd like me to make some changes. I'll move on to tinygrad afterwards.
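Benchmarking per dtype, as described above, generalizes the single-precision timing loop by parameterizing the matrix dtype. The sketch below uses NumPy as a portable stand-in (MLX itself only runs on Apple silicon); names are illustrative, not the PR's code. The int8 gap mentioned above would show up here as that dtype simply being absent from the list.

```python
import time
import numpy as np

def matmul_tflops(dtype, n: int = 512, iters: int = 4) -> float:
    """Time n x n matmuls in the given dtype and report achieved TFLOPS."""
    a = np.random.rand(n, n).astype(dtype)
    b = np.random.rand(n, n).astype(dtype)
    a @ b  # warm-up
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    elapsed = time.perf_counter() - start
    return 2 * n**3 * iters / elapsed / 1e12

# Only the dtypes the backend can matmul; int8 is omitted here to
# mirror the MLX limitation discussed in the thread.
for dt in (np.float32, np.float16):
    print(f"{np.dtype(dt).name}: {matmul_tflops(dt):.4f} TFLOPS")
```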

snico432 avatar Oct 11 '24 03:10 snico432

@AlexCheema I created a benchmark for tinygrad, cleaned up the MLX benchmark, and addressed the requests from your last review. I also removed the dict from device_capabilities and made all the TFLOPS get calculated dynamically.

snico432 avatar Oct 15 '24 03:10 snico432

@AlexCheema DeviceCapabilities are now lazily computed. PTAL

snico432 avatar Nov 10 '24 18:11 snico432

Please fix merge conflicts.

AlexCheema avatar Nov 24 '24 18:11 AlexCheema

@AlexCheema resolved merge conflicts.

snico432 avatar Nov 24 '24 19:11 snico432

@AlexCheema PTAL

snico432 avatar Dec 06 '24 00:12 snico432

@AlexCheema PTAL

snico432 avatar Dec 21 '24 23:12 snico432