alphaRGB
I followed the "README.md" and trained the model on **sample data**. First: `python pre_process.py`. Second: `python train_gpt2.py --num-layers=8 --embedding-size=768 --batch-size=32`. Then the training begins; here is the **Loss** and **Accuracy** during...
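The repo's `train_gpt2.py` internals aren't shown in this excerpt; the sketch below is a hypothetical illustration (with a made-up helper name, `step_metrics`) of how per-step loss and token-level accuracy are commonly computed in a GPT-2 training loop.

```python
# Hypothetical sketch, not the repo's train_gpt2.py: computing the two
# metrics the issue mentions (loss and accuracy) from model logits.
import torch
import torch.nn.functional as F

def step_metrics(logits: torch.Tensor, targets: torch.Tensor):
    """logits: (batch, seq, vocab); targets: (batch, seq) token ids."""
    # Cross-entropy over the flattened (batch * seq) token positions.
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
    # Token-level accuracy: fraction of positions where argmax matches.
    acc = (logits.argmax(dim=-1) == targets).float().mean()
    return loss, acc

# Toy example: batch 32, sequence length 16, GPT-2 vocab size 50257.
logits = torch.randn(32, 16, 50257)
targets = torch.randint(0, 50257, (32, 16))
loss, acc = step_metrics(logits, targets)
print(f"loss={loss.item():.4f} acc={acc.item():.4f}")
```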
Thanks for your great work on int8 quantization for ViT. I have some questions about the quantization of ViT's SelfAttention. In the transformer attention: 1) attn_score = Q * K^T...
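For context on the `attn_score = Q * K^T` step the question refers to, here is a minimal sketch (assuming per-tensor symmetric quantization; this is not the repo's actual code) of how that matmul is typically done in int8: quantize Q and K, accumulate the product in int32, then dequantize with the product of the two scales.

```python
# Hedged sketch of int8 Q*K^T: symmetric per-tensor quantization,
# int32 accumulation, dequantization by s_q * s_k.
import numpy as np

def quantize_sym(x: np.ndarray):
    """Symmetric per-tensor int8 quantization; returns (int8 tensor, scale)."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 64)).astype(np.float32)  # toy query matrix
K = rng.standard_normal((4, 64)).astype(np.float32)  # toy key matrix

q_q, s_q = quantize_sym(Q)
k_q, s_k = quantize_sym(K)

# Integer matmul accumulates in int32; dequantize afterwards.
attn_int32 = q_q.astype(np.int32) @ k_q.astype(np.int32).T
attn_score = attn_int32.astype(np.float32) * (s_q * s_k)

print(np.abs(attn_score - Q @ K.T).max())  # quantization error vs. FP32
```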
Hello, I am curious about the FP4 data format. I have seen the binary interchange format of FP4 in the OCP MXFP4 definition ([OCP Microscaling Formats (MX) Specification Version 1.0](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf)). For FP4 in...
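As a reference point for the format being asked about: the MX spec's FP4 is E2M1 (1 sign bit, 2 exponent bits with bias 1, 1 mantissa bit, no Inf/NaN encodings), giving the value set ±{0, 0.5, 1, 1.5, 2, 3, 4, 6}. A small sketch decoding all 16 encodings:

```python
# Decode an OCP MX FP4 (E2M1) nibble to a Python float,
# per the MX v1.0 spec layout: [sign | exp(2) | mantissa(1)].
def decode_fp4_e2m1(nibble: int) -> float:
    assert 0 <= nibble <= 0xF
    sign = -1.0 if (nibble >> 3) & 1 else 1.0
    exp = (nibble >> 1) & 0b11
    man = nibble & 1
    if exp == 0:
        # Subnormal: 2^(1 - bias) * (0 + man/2) = 0.5 * man
        return sign * 0.5 * man
    # Normal: 2^(exp - bias) * (1 + man/2), bias = 1
    return sign * (2.0 ** (exp - 1)) * (1.0 + 0.5 * man)

# Prints ±{0, 0.5, 1, 1.5, 2, 3, 4, 6}
print([decode_fp4_e2m1(i) for i in range(16)])
```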
As the document tested the BERT models and got good results, one question is whether these nn_pruning methods can be applied to other Transformer models, like Google ViT, Swin Transformer, and...
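To make the question concrete (this is NOT the nn_pruning API, just a generic magnitude-pruning illustration using PyTorch's built-in `torch.nn.utils.prune` on a torchvision ViT), a BERT-style sparsification recipe carries over to vision Transformers because both are stacks of linear projections:

```python
# Generic L1 magnitude pruning on a ViT's linear layers; swapped-in
# technique for illustration, not the nn_pruning library's method.
import torch
import torch.nn.utils.prune as prune
from torchvision.models import vit_b_16

model = vit_b_16()  # randomly initialized; no weight download needed

for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        # Zero out the 50% smallest-magnitude weights in each layer.
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the pruning permanent

total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"overall sparsity: {zeros / total:.2%}")
```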
MLPerf inference results are very useful for researchers and companies to compare their H/W and S/W architectures. Many Chinese researchers and organizations have participated in MLPerf, but the inference results...