santacoder-finetuning
Why is inference speed with fp16 or bf16 similar to fp32?
Is there a specific configuration needed? I load the model like this:

```python
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True, torch_dtype=torch.float16)
```
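For reference, a minimal sketch of half-precision loading with the model placed on a GPU, since fp16/bf16 typically only yields a speedup on CUDA hardware (on CPU, timings usually stay close to fp32). The checkpoint name and prompt below are placeholders, not from the original question:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: base SantaCoder checkpoint; substitute your fine-tuned model path.
checkpoint = "bigcode/santacoder"

tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    trust_remote_code=True,
    torch_dtype=torch.float16,  # load weights in half precision
).to("cuda")  # half-precision kernels generally need a GPU to be faster than fp32
model.eval()

# Inputs must live on the same device as the model.
inputs = tokenizer("def hello_world():", return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```

If the model stays on CPU, or if timings include the one-time weight loading and CUDA warm-up, fp16/bf16 and fp32 can look nearly identical even though the dtype was set correctly.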