ponytaill comments


Results: 3 comments by ponytaill

Weight int4 quantization, but actually it is int16

> I get it: the weights are fake int4, and the calculation actually uses int16. If it's convenient for you, could you explain it?
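One possible reading of "fake int4" is that the weight values are restricted to the signed 4-bit range but kept in a wider integer container. The sketch below illustrates that reading only; the function names, the symmetric per-tensor scale, and the choice of a 16-bit container are assumptions for illustration, not the behavior of the project discussed in the issue.

```python
from array import array

# Value range of a signed 4-bit integer.
INT4_MIN, INT4_MAX = -8, 7


def fake_quantize_int4(weights):
    """Map floats onto the int4 value range, stored in a 16-bit container.

    Hypothetical helper: a symmetric per-tensor scale is assumed.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to [-8, 7].
    q = [max(INT4_MIN, min(INT4_MAX, round(w / scale))) for w in weights]
    # Typecode "h" is a signed short: the values fit in 4 bits,
    # but each one occupies a wider storage slot.
    return array("h", q), scale


def dequantize(q, scale):
    """Recover approximate float weights from the quantized values."""
    return [v * scale for v in q]


weights = [0.7, -0.3, 0.1, -0.7]
q, scale = fake_quantize_int4(weights)
restored = dequantize(q, scale)
print(list(q), q.itemsize, restored)
```

Under this reading, the accuracy loss is that of 4-bit quantization, while the storage and arithmetic cost is that of the wider type.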

[Question] Llama3: How to solve GPU Out of Memory Error on Pixel 8 Pro?

> It seems that this level of hardware configuration is not yet able to drive models of 8B and above. Currently, I have found that only models around 1.5B can...
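A rough weights-only estimate shows why model size matters here. The sketch below is back-of-the-envelope arithmetic, not a measurement: the bit widths are assumed for illustration, and it ignores the KV cache, activations, and runtime overhead, all of which add to the real footprint.

```python
GIB = 1024 ** 3


def weight_bytes(num_params, bits_per_weight):
    """Bytes needed to store the weights alone (hypothetical helper)."""
    return num_params * bits_per_weight // 8


# Parameter counts taken from the model sizes named in the comment;
# the 16-bit and 4-bit widths are assumptions for illustration.
for name, params in [("8B", 8_000_000_000), ("1.5B", 1_500_000_000)]:
    for bits in (16, 4):
        size_gib = weight_bytes(params, bits) / GIB
        print(f"{name} model at {bits}-bit: {size_gib:.2f} GiB of weights")
```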

[Bug] How to accurately measure the real memory usage on Android?

I used the Android Studio memory profiling tools and got the result below, which seems to be correct. The left and right peaks correspond to two models of different sizes. ![c393468577ea85cd54aaa77057f39dc7](https://github.com/user-attachments/assets/e0824fcf-e7a5-4aed-b89d-5a443c109fdc)