TriDefender

21 comments by TriDefender

Try adding this after loading the model: `model = model.to('xpu')`. Otherwise the model will remain on the CPU and will not move to the GPU.

Minimum code:

```
from transformers import AutoTokenizer
import torch

Res = './glm-4-9b-chat/'
tokenizer = AutoTokenizer.from_pretrained(Res, trust_remote_code=True, encode_special_tokens=True)

from ipex_llm.transformers import AutoModel
model = AutoModel.from_pretrained(Res, load_in_4bit=True, trust_remote_code=True, optimize_model=False)
model = model.to('xpu')

# Preheat (borrowed from official example)
question...
```
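
For reference, a minimal sketch of what the truncated warm-up step might look like, assuming the standard Hugging Face `generate` API and that the GLM-4 tokenizer ships a chat template; the prompt text and `max_new_tokens` value are placeholders, not the original code.

```
# Hypothetical warm-up call, continuing from the snippet above.
question = "Hello"  # placeholder prompt
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": question}],
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
).to('xpu')  # inputs must sit on the same device as the model
with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```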

Hi @lzivan, thanks for your advice; it solved most of the problems. I will do more extensive testing to see whether anything happens after a longer wait period.

Still crashed after a longer wait; I left the program idle.

```
User: What should I do if I can't sleep?
ChatGLM: Traceback (most recent call last):
  File "", line 198, in _run_module_as_main
  File "", line 88, in _run_code
  File...
```

Hi @qiuxin2012, I have 32 gigabytes of physical memory in a 16 + 16 configuration; in addition, Windows has configured 20 gigabytes of virtual memory. After loading the model and doing the...

Hi @bibekyess, if you are concerned about a VRAM issue, you can try adding `cpu_embedding=True` when loading the model (a sketch follows below). The copy of the model in RAM should be unloaded when it is moved to the XPU, but...
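
A minimal sketch of that load, reusing the parameters from the minimum-code snippet above; `cpu_embedding=True` is the only addition, and the checkpoint path is the same local one as before.

```
from ipex_llm.transformers import AutoModel

# Keep the embedding table in system RAM to reduce VRAM pressure;
# the rest of the 4-bit model still moves to the XPU below.
model = AutoModel.from_pretrained(
    './glm-4-9b-chat/',
    load_in_4bit=True,
    trust_remote_code=True,
    optimize_model=False,
    cpu_embedding=True,
)
model = model.to('xpu')
```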

Memory usage will increase after a few inferences because of all the cached tokens; try clearing the history and trying again (see the sketch below). The same situation happened here too; perhaps waiting a...
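
A rough sketch of that workaround, assuming the chat loop keeps past turns in a plain `history` list as in the official GLM examples; the `torch.xpu.empty_cache()` call assumes the XPU backend is available through ipex-llm, both of which are assumptions rather than confirmed details of the original script.

```
import torch

history = []  # drop the accumulated turns so old tokens are no longer re-fed
if hasattr(torch, "xpu"):
    torch.xpu.empty_cache()  # hand freed blocks back to the XPU allocator
```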

Hi @qiuxin2012, I tried your workaround, but it was not helpful. Here are the logs:

```
Special tokens have been added in the vocabulary, make sure the associated word embeddings...
```

This shouldn't be an OOM issue, because I do have enough physical RAM and swap file space.