Murtadha Ahmed

2 comments by Murtadha Ahmed

Adding `flash_attention_2` works for me:

```
model_args['attn_implementation'] = 'flash_attention_2'
model = LlamaForCausalLM.from_pretrained(model_name, **model_args).eval()
```
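For reference, a minimal self-contained sketch of the same setup. The model name, dtype, device map, and prompt are my own assumptions; `flash_attention_2` also requires the `flash-attn` package and loading the model in fp16/bf16 on a supported GPU.

```
import torch
from transformers import AutoTokenizer, LlamaForCausalLM

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint; use whichever Llama model you load

model_args = {
    "torch_dtype": torch.float16,              # flash-attention kernels need fp16/bf16
    "device_map": "auto",
    "attn_implementation": "flash_attention_2",
}

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = LlamaForCausalLM.from_pretrained(model_name, **model_args).eval()

inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```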

I tried this code, but it doesn't work:

```
if prefix_parallel and prefix_parallel > 1:
    key_length_ = ((key_length - query_length) // prefix_parallel) + query_length
    causal_mask = self.bias[:, :, key_length_...
```
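The slice the snippet modifies looks like the GPT-2-style causal bias buffer in transformers (`self.bias[:, :, key_length - query_length : key_length, :key_length]`). For comparison, here is a minimal standalone sketch of that unmodified pattern; the buffer construction, shapes, and lengths are all assumptions for illustration, not the actual module:

```
import torch

# Sketch of the GPT-2-style causal mask slicing; max_positions and lengths are assumed.
max_positions = 16
bias = torch.tril(torch.ones(max_positions, max_positions, dtype=torch.bool)).view(
    1, 1, max_positions, max_positions
)

query_length, key_length = 4, 12  # e.g. 4 new tokens attending over 12 cached + new keys

# Unmodified slicing: keep the last `query_length` rows, up to `key_length` columns.
causal_mask = bias[:, :, key_length - query_length : key_length, :key_length]
print(causal_mask.shape)  # torch.Size([1, 1, 4, 12])
```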