Isotr0py

7 comments of Isotr0py

### Test Result

```python
from vllm import LLM
from vllm import SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM("meta-llama/Llama-2-7b-hf", enable_lora=True)
sql_lora_path = "yard1/llama-2-7b-sql-lora-test"
prompts = [
    "[user] Write a SQL...
```
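For reference, a minimal self-contained sketch of how such a LoRA test is typically completed with vLLM's public API. The prompt text, sampling settings, adapter name, and int id below are illustrative assumptions, not the original comment's exact (truncated) values:

```python
from huggingface_hub import snapshot_download
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM("meta-llama/Llama-2-7b-hf", enable_lora=True)
# Download the adapter weights locally so LoRARequest gets a filesystem path.
sql_lora_path = snapshot_download(repo_id="yard1/llama-2-7b-sql-lora-test")

# Illustrative prompt and sampling settings (the original comment's are truncated).
prompts = ["[user] Write a SQL query to list all tables. [/user]"]
sampling_params = SamplingParams(temperature=0.0, max_tokens=64)

# Attach the adapter per request; the name and int id are arbitrary labels.
outputs = llm.generate(
    prompts,
    sampling_params,
    lora_request=LoRARequest("sql_adapter", 1, sql_lora_path),
)
for output in outputs:
    print(output.outputs[0].text)
```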

Maybe these 2 lines in `train_network.py` cause this problem:

```python
# unnecessary, but works on low-RAM devices
text_encoder.to("cuda")
unet.to("cuda")
```

This code is just used to reduce RAM usage in low-RAM environments...
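If the early move to the GPU is the culprit, one option is to gate it behind a flag. A minimal sketch, assuming a hypothetical `low_ram` command-line option (not necessarily the script's real flag name):

```python
# Hypothetical sketch: only move the models to the GPU early when the user
# explicitly asks for low host-RAM behavior; `args.low_ram` is an assumed flag.
if args.low_ram:
    text_encoder.to("cuda")
    unet.to("cuda")
```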

@zhouyuan I have rebased the code. The native LoRA kernel should work again.

Generally, I agree with @DarkLight1337's opinion about moving the processing logic out of `Engine` to avoid modifying core code frequently. However, I think it's difficult to keep the processing logic fully...

> @Isotr0py Perhaps we could follow a registry pattern and have each model separately register how to preprocess the inputs? If the model does not do so, then the default...
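A minimal sketch of the registry pattern suggested above, assuming hypothetical names (`register_input_processor`, `_INPUT_PROCESSORS`) rather than vLLM's actual API:

```python
from typing import Callable, Dict

# Hypothetical registry mapping model type -> input preprocessing function.
_INPUT_PROCESSORS: Dict[str, Callable[[dict], dict]] = {}

def register_input_processor(model_type: str):
    """Register a preprocessing function for one model type (illustrative API)."""
    def wrapper(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        _INPUT_PROCESSORS[model_type] = fn
        return fn
    return wrapper

def default_input_processor(inputs: dict) -> dict:
    # Default behavior when a model registers nothing: pass inputs through.
    return inputs

def process_inputs(model_type: str, inputs: dict) -> dict:
    # Use the model's own processor if registered, else fall back to the default.
    return _INPUT_PROCESSORS.get(model_type, default_input_processor)(inputs)

@register_input_processor("llava")
def _process_llava(inputs: dict) -> dict:
    # A model-specific hook would transform multi-modal inputs here.
    return inputs
```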

Regarding #4228, I think there may be a situation where some MM models don't have a `Processor` implemented.

> In this case, we would have to refactor the computation of attention...

> How should we ensure that our implementation is loaded instead of the HuggingFace one?

I think we can refer to `get_config()` in `transformers_utils/config.py`, but search the registered processors first, then...
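A minimal sketch of that lookup order, assuming a hypothetical `_PROCESSOR_REGISTRY` mapping rather than vLLM's real data structure; only `AutoProcessor.from_pretrained` is an actual `transformers` call:

```python
from transformers import AutoProcessor

# Hypothetical registry: model_type -> our own processor class.
_PROCESSOR_REGISTRY: dict = {}

def get_processor(model_name_or_path: str, model_type: str):
    # Prefer our registered implementation when one exists...
    processor_cls = _PROCESSOR_REGISTRY.get(model_type)
    if processor_cls is not None:
        return processor_cls.from_pretrained(model_name_or_path)
    # ...otherwise fall back to the HuggingFace implementation.
    return AutoProcessor.from_pretrained(model_name_or_path)
```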