GLEN BERTULFO
Observed this issue when attempting to run Llama2-7B 32x32 token inference on a Flex170 x8 DUT. For reference, this DUT is accessible following the instructions here -- [Welcome to the...
I followed the steps from this GitHub link -- https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/Deepspeed-AutoTP/README.md -- and attempted to verify 2-GPU inference runs on these token combinations: 1) initial run using the default script with sym-int4...
Using the merged change from https://github.com/intel-analytics/ipex-llm/pull/10558, I retried the 2-GPU execution on an ATSM1 x8-card system (specs listed here -- https://wiki.ith.intel.com/display/MediaWiki/Flex-170x8+%28Inspur+-+ICX%29+Qualification). I git cloned https://github.com/intel-analytics/ipex-llm and ran the default...