牛宇霖

Results: 2 issues by 牛宇霖

Hi, I'm testing the attention mechanisms on a Kaggle TPU VM v3-8. It fails with the message below:

```
pallas_flash is Failed : Mosaic kernels cannot be automatically partitioned. Please wrap the call in...
```
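The error itself points at the fix: Mosaic (Pallas) kernels are not partitioned automatically across the eight cores of a v3-8, and the full JAX message asks to wrap the call in a `shard_map`. A minimal sketch of that wrapping, with a plain-`jnp` stand-in `attention_kernel` where the actual `pallas_flash` kernel would go (the shapes and mesh layout are assumptions, not from the report):

```python
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, PartitionSpec as P
from jax.experimental.shard_map import shard_map

# Stand-in for the Mosaic/Pallas flash-attention kernel from the issue.
# Plain jnp here; the real pallas_flash call would take its place.
def attention_kernel(q, k, v):
    scores = jnp.einsum("bqhd,bkhd->bhqk", q, k) / jnp.sqrt(q.shape[-1])
    return jnp.einsum("bhqk,bkhd->bqhd", jax.nn.softmax(scores, axis=-1), v)

# One mesh axis over all local devices (8 cores on a TPU v3-8).
mesh = Mesh(jax.devices(), axis_names=("data",))

# shard_map runs the kernel per shard: each core sees only its slice of
# the batch axis, so the Mosaic kernel never has to be auto-partitioned.
sharded_attention = shard_map(
    attention_kernel,
    mesh=mesh,
    in_specs=(P("data"), P("data"), P("data")),
    out_specs=P("data"),
)

# (batch, seq, heads, head_dim); batch must divide evenly across cores.
q = k = v = jnp.ones((8, 128, 16, 64), dtype=jnp.bfloat16)
out = jax.jit(sharded_attention)(q, k, v)
```

The same wrapping applies unchanged when the body is a real `pallas_call` kernel; only the per-shard block shapes need to match what the kernel expects.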

I can load the model with the code below:

```
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "/root/private_data/models/Meta-Llama-3.1-70B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    load_in_4bit=True,
    attn_implementation="flash_attention_2",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```

However, when I try to...
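Nothing in the preview shows where this issue actually fails, but one note on the snippet itself: recent transformers releases deprecate `load_in_4bit=True` as a direct `from_pretrained` argument in favor of passing a `BitsAndBytesConfig`. A sketch of the equivalent load (same local model path as in the issue; the NF4 quant type and bfloat16 compute dtype are assumed choices, not taken from the report):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "/root/private_data/models/Meta-Llama-3.1-70B-Instruct"

# 4-bit quantization via bitsandbytes; NF4 + bf16 compute are common
# choices for Llama-family checkpoints (assumed, not from the issue).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```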

fixed - pending confirmation