drxmy
> I think Tim is working on the 4-bit inference kernel, which hopefully will be available in the coming weeks During inference, will the model also convert between fp16 and...
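As background for the fp16 question above, here is a rough, self-contained sketch of what a 4-bit weight path typically involves (this is an assumed illustration, not Tim's actual kernel): weights are stored as 4-bit codes plus a per-block fp16 scale, and are dequantized back to floating point on the fly before the matmul.

```python
# Hypothetical 4-bit block quantization sketch; names and scheme are
# illustrative, not the bitsandbytes implementation.

def quantize_4bit(block, levels=15):
    """Map a block of floats onto 4-bit codes (0..15) with one shared scale."""
    scale = max(abs(x) for x in block) / (levels / 2) or 1.0
    codes = [min(levels, max(0, round(x / scale + levels / 2))) for x in block]
    return codes, scale

def dequantize_4bit(codes, scale, levels=15):
    """Recover approximate float values from the 4-bit codes."""
    return [(c - levels / 2) * scale for c in codes]

block = [0.5, -1.0, 0.25, 1.0]
codes, scale = quantize_4bit(block)
restored = dequantize_4bit(codes, scale)
```

At inference time the dequantized values would feed an fp16 matmul, so the 4-bit form is a storage format rather than a compute format.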
For example, my model can do many different tasks like rewriting, GEC, or some NLU. In the simplest case, these tasks are solved by different prompts which are not visible...
I will give it a try first. Thank you!
LLaMA please
> For LLAMA or other generative AI model needs, you may check out HippoML: https://blog.hippoml.com/large-language-model-inference-from-datacenter-to-edge-ed2f94da4a81 > > @drxmy @dhawalkp Thank you! I just joined the waitlist. Is this another open...
I used AdamW with transformers' Trainer class (Hugging Face). It printed a trainable parameter count. The number was much smaller with LoRA.
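The smaller count is expected: with LoRA only the low-rank adapter matrices are trainable while the base weights stay frozen. A minimal sketch of the arithmetic (illustrative helper, not the transformers/peft API):

```python
# Hypothetical helper showing why the Trainer reports far fewer trainable
# parameters under LoRA: for a frozen d_out x d_in linear layer, only the
# adapters A (rank x d_in) and B (d_out x rank) are trained.

def lora_param_counts(d_in, d_out, rank):
    """Return (trainable, total) parameter counts for one LoRA-adapted layer."""
    base = d_out * d_in                      # frozen base weight
    adapters = rank * d_in + d_out * rank    # trainable LoRA factors
    return adapters, base + adapters

trainable, total = lora_param_counts(4096, 4096, 8)
print(f"trainable: {trainable} / total: {total} ({100 * trainable / total:.2f}%)")
```

For a 4096x4096 layer at rank 8, well under 1% of the parameters are trainable, which matches the much smaller number printed by the Trainer.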
Did you figure it out? I also only see a causal mask for training. Inference has padding, but the attention mask computed by get_ltor_masks_and_position_ids does not consider padding.
> > Did you figure it out? I also only see a causal mask for training. Inference has padding, but the attention mask computed by get_ltor_masks_and_position_ids does not consider padding. >...
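To make the masking concern above concrete, here is a small sketch (assumed semantics, not the get_ltor_masks_and_position_ids implementation) of combining a causal mask with a padding mask, so that padded positions cannot be attended to:

```python
# Illustrative mask construction: mask[i][j] is True when query position i
# may attend to key position j. A purely causal mask still lets real tokens
# attend to padded positions; combining it with a padding mask fixes that.

def causal_mask(seq_len):
    """Lower-triangular mask: each position sees itself and earlier ones."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

def combined_mask(seq_len, pad_positions):
    """Causal mask with padded key positions masked out entirely."""
    causal = causal_mask(seq_len)
    return [[causal[i][j] and j not in pad_positions
             for j in range(seq_len)] for i in range(seq_len)]

# Example: length-4 sequence where position 3 is padding.
m = combined_mask(4, pad_positions={3})
```

With the purely causal mask alone, position 3 would be a visible key for itself, which is the gap being discussed for padded inference batches.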