
An innovative library for efficient LLM inference via low-bit quantization

40 neural-speed issues

## Type of Change Stability AI open-sourced StableLM-2-12B, which has a different architecture from its 1.6B and 3B counterparts. This PR adds support for these models: stabilityai/stablelm-2-12b and stabilityai/stablelm-2-12b-chat. ##...

```python
from transformers import AutoTokenizer
from neural_speed import Model

model_name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = Model()
model.init(model_name, use_quant=True, weight_dtype="int4", compute_dtype="int8")
tokens = tokenizer("What's your favorite animal?", return_tensors='pt').input_ids
outputs = model.generate(tokens, num_beams=2, do_sample=False, max_new_tokens=10)
text = ...
```
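The `generate` call above requests beam search (`num_beams=2`) with sampling disabled (`do_sample=False`). As background on what those flags mean, here is a toy beam search over a fixed table of per-step log-probabilities (all values hypothetical, unrelated to any real model):

```python
import math

# Toy per-step log-probabilities over a 3-token vocabulary (hypothetical values,
# just to illustrate how num_beams=2 expands and then prunes candidates).
step_logprobs = [
    {"a": math.log(0.6), "b": math.log(0.3), "c": math.log(0.1)},
    {"a": math.log(0.2), "b": math.log(0.5), "c": math.log(0.3)},
]

def beam_search(steps, num_beams=2):
    # Each beam is (token_sequence, cumulative_logprob).
    beams = [((), 0.0)]
    for dist in steps:
        # Expand every beam by every vocabulary token...
        candidates = [
            (seq + (tok,), score + lp)
            for seq, score in beams
            for tok, lp in dist.items()
        ]
        # ...then keep only the num_beams highest-scoring candidates.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
    return beams

best_seq, best_score = beam_search(step_logprobs)[0]
print(best_seq)  # the highest-probability 2-token sequence
```

With `do_sample=False` no randomness is involved: the result is fully determined by the scores, which is why the snippet above produces reproducible output.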

Please add support for the phi-3-mini-128k (128k context length) model in neural-speed.

When I load the "meta-llama/Meta-Llama-3-8B-Instruct" model like this:

```python
from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"  # Hugging Face model_id or local model
tokenizer = AutoTokenizer.from_pretrained(model_name, ...
```

I'm not well versed with Python: where do I put the downloaded llama-2-7b-chat.Q4_0.gguf file? I can get llama.cpp working easily on my laptop, but I can't seem to...

## Type of Change: feature or bug fix or documentation or others; API changed or not. ## Description: detail description. Issues: xxx. ## Expected Behavior & Potential Risk: the expected...

I’ve discovered a performance gap between the Neural Speed Matmul operator and the Llama.cpp operator in the Neural-Speed repository. This issue was identified while running a benchmark with the ONNXRuntime-GenAI...
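When reporting a gap like this, a best-of-N wall-clock harness is the usual way to make the measurement reproducible. A pure-Python sketch (the naive matmul below is only a stand-in for whichever operator is actually being benchmarked):

```python
import random
import time

def naive_matmul(a, b):
    # a: m x k, b: k x n, both plain lists of lists.
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for p in range(k):
            aip = a[i][p]
            row_b = b[p]
            row_o = out[i]
            for j in range(n):
                row_o[j] += aip * row_b[j]
    return out

def bench(fn, *args, repeats=5):
    # Best-of-repeats wall time, the usual micro-benchmark convention:
    # it filters out one-off scheduler noise and cold-cache effects.
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best

random.seed(0)
a = [[random.random() for _ in range(64)] for _ in range(64)]
b = [[random.random() for _ in range(64)] for _ in range(64)]
print(f"naive matmul 64x64: {bench(naive_matmul, a, b):.4f}s")
```

Swapping the two operators under comparison into `bench` with identical inputs is what makes the reported numbers directly comparable.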

enhancement

## Type of Change: feature or bug fix or documentation or others; API changed: not. ## Description: Add a llama2 model accuracy UT for the PIQA task, llama2 RTN quant...

CI

I understand this is an Intel repo, but curious: will AMD work as well, or... what kind of architecture / Intel chipset is best used with this repo? About to...

An example of TP (tensor parallelism) is provided in the Neural Speed documentation:

```shell
mpirun -np 2 -bind-to=socket ./build/bin/main_gptj -m ne-q4_0.bin --seed 1234 -t 56 -c 68 -n 32 -p "Once upon a...
```
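The `mpirun -np 2` launch above splits the model across two ranks. Conceptually, each rank owns a shard of each weight matrix, computes a partial result, and the partials are gathered; a toy pure-Python sketch of that row-sharded layout (not Neural Speed's actual MPI implementation):

```python
def matvec(w, x):
    # w: n_out x n_in (each row produces one output element), x: length n_in.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def row_shards(w, num_ranks):
    # Each rank owns a contiguous block of output rows.
    per = (len(w) + num_ranks - 1) // num_ranks
    return [w[i * per:(i + 1) * per] for i in range(num_ranks)]

w = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 1]

shards = row_shards(w, 2)                      # what -np 2 distributes
partials = [matvec(s, x) for s in shards]      # each "rank" computes its rows
gathered = [y for part in partials for y in part]  # the all-gather step

assert gathered == matvec(w, x)                # same result as the unsharded matvec
```

The `-bind-to=socket` flag in the command pins each rank to one CPU socket so that each shard's memory traffic stays local to that socket.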