Carlos Mocholí
Can you share the complete error stacktrace? Are you using `torch==2.0`?
Can you share the output of `pip list | grep torch` and `python -c 'import torch; print(torch.__version__)'`? You might have a non-release version that doesn't include that file. Reinstalling torch by...
Another option would be a conversion to HF format (already requested in https://github.com/Lightning-AI/lit-llama/issues/150) since the `ggml` conversion supports it already: https://github.com/ggerganov/llama.cpp/blob/ac7876ac20124a15a44fd6317721ff1aa2538806/convert.py#L594
The format is defined by the nn.Module definition. Since we provide our own implementation, the keys are different.
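As a minimal illustration (these class and attribute names are made up, not the actual Lit-LLaMA or Hugging Face definitions), the `state_dict` keys follow directly from the attribute names in the `nn.Module` definition, so two implementations of the same architecture produce checkpoints with different keys:

```python
import torch.nn as nn

# hypothetical "HF-style" block: its weight key becomes "layers.0.self_attn.weight"
class HFStyleBlock(nn.Module):
    def __init__(self, n_embd: int = 8):
        super().__init__()
        self.self_attn = nn.Linear(n_embd, 3 * n_embd, bias=False)

class HFStyleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([HFStyleBlock()])

# hypothetical "Lit-style" block holding the same weight: key becomes "transformer.h.0.attn.weight"
class LitStyleBlock(nn.Module):
    def __init__(self, n_embd: int = 8):
        super().__init__()
        self.attn = nn.Linear(n_embd, 3 * n_embd, bias=False)

class LitStyleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.transformer = nn.ModuleDict({"h": nn.ModuleList([LitStyleBlock()])})

print(list(HFStyleModel().state_dict()))   # ['layers.0.self_attn.weight']
print(list(LitStyleModel().state_dict()))  # ['transformer.h.0.attn.weight']
```

That mismatch is why a conversion script is needed to map one set of keys onto the other.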
This has been fixed in lit-gpt: https://github.com/Lightning-AI/lit-gpt
I implemented one in https://github.com/Lightning-AI/lit-stablelm/blob/main/chat.py. It could be copied over to this repository.
@timothylimyl Lit-Parrot supports this via FSDP, added in https://github.com/Lightning-AI/lit-parrot/commit/248d691f06d68c7e92d3230260eda0055f7dc163. Support for this could be easily ported to Lit-LLaMA.
Yes, but it would be better if you or somebody else from the community worked on the port. The sharding is configured via the `auto_wrap_policy` function used in the commit...
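For reference, a minimal sketch of what configuring that policy could look like with Fabric, loosely following the Lit-Parrot commit (the exact arguments, the precision setting, and the `Block` import are assumptions here, not copied from that commit):

```python
from functools import partial

from lightning import Fabric
from lightning.fabric.strategies import FSDPStrategy
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy

from lit_llama.model import Block  # the transformer block class to shard (assumed import path)

# shard the model at the granularity of each transformer Block
auto_wrap_policy = partial(transformer_auto_wrap_policy, transformer_layer_cls={Block})
strategy = FSDPStrategy(auto_wrap_policy=auto_wrap_policy)

fabric = Fabric(devices=4, precision="bf16-mixed", strategy=strategy)
fabric.launch()
# model = fabric.setup_module(model)  # parameters get sharded across the 4 devices
```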
You can call `reset_cache` after generation. Lit-GPT does it: https://github.com/Lightning-AI/lit-gpt/blob/main/generate/base.py#L180
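In context it looks roughly like this (the `generate` call is a placeholder for the repository's generation function):

```python
# run generation with the KV cache enabled ...
output = generate(model, encoded_prompt, max_returned_tokens=100)

# ... then clear the cache so the next call (e.g. with a different sequence length)
# does not reuse stale buffers
model.reset_cache()
```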
You can read about the KV cache here: https://kipp.ly/transformer-inference-arithmetic/ It depends on the sequence length, so if it changes it needs to be reset. When you do inference with a...
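To make the dependence on the sequence length concrete, here is a toy sketch of a KV cache (illustrative only, not the Lit-GPT implementation): the key/value buffers are preallocated for a fixed maximum sequence length, so changing that length means the buffers must be recreated, i.e. the cache must be reset.

```python
import torch

n_head, head_size, max_seq_length = 8, 64, 256

# buffers sized for a fixed maximum sequence length
k_cache = torch.zeros(1, n_head, max_seq_length, head_size)
v_cache = torch.zeros(1, n_head, max_seq_length, head_size)

def update_cache(k_new: torch.Tensor, v_new: torch.Tensor, input_pos: torch.Tensor):
    """Write the keys/values for the newly processed positions into the cache."""
    k_cache.index_copy_(2, input_pos, k_new)
    v_cache.index_copy_(2, input_pos, v_new)
    return k_cache, v_cache

# generating with a maximum sequence length longer than 256 would overflow these
# buffers, so they have to be re-created (reset) whenever that length changes
```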