Matthias Reso comments

Results: 46 comments of Matthias Reso

peft_method works fine with lora, but pops error when using prefix and llama_adapter

Closing this issue due to inactivity, feel free to reopen if there are further questions.

No output folder

@Tejaswgupta thanks for flagging this. We need to revisit the saving logic. You selected run_validation: bool=False in your config which effectively disables saving of the result. I'll try to create...

No output folder

Hi @BugmakerCC can you check your eval loss and post the log of your training run? We've seen the eval loss turning to Inf which prevents a checkpoint from being...

No output folder

Yes, your eval loss is NaN, so no checkpoint gets saved: ``` evaluating Epoch: 100%|██████████| 100/100 [01:28

No output folder

This can have many reasons. Are you using the original alpaca json or a modification? Did you figure out why some weights are not initialized?

NCCL watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Hi, I've seen this error message in different places and it seems to be a side effect rather than the actual cause of the crash. Can you elaborate a bit...

Model Checkpoint NOT saved, Eval Loss "Inf"

Hi, the eval loss being inf will prevent saving the checkpoint, as we compare against an initial best eval loss of inf. The comparison is [here](https://github.com/facebookresearch/llama-recipes/blob/322522e9a272c60df7c07ff738a464676ba4c086/utils/train_utils.py#L148C29-L148C29) and the initial best eval value is [here](https://github.com/facebookresearch/llama-recipes/blob/322522e9a272c60df7c07ff738a464676ba4c086/utils/train_utils.py#L80)...
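To illustrate why an inf (or NaN) eval loss never triggers a save, here is a minimal sketch of a strict "less than best" check. It is not the code from `train_utils.py`; the function name `should_save_checkpoint` and the variable `best_val_loss` are assumptions made up for this example, and the only thing taken from the comment above is that the best loss starts at inf.

```python
def should_save_checkpoint(eval_loss: float, best_val_loss: float) -> bool:
    """Hypothetical helper: save only if the new loss is strictly better."""
    return eval_loss < best_val_loss


# Assumed starting point, as described in the comment: best loss is inf.
best_val_loss = float("inf")

# inf < inf is False, and any comparison with NaN is False,
# so neither value would ever lead to a checkpoint being written.
for eval_loss in (float("inf"), float("nan"), 1.25):
    print(eval_loss, should_save_checkpoint(eval_loss, best_val_loss))
```

Only a finite eval loss passes the check, which matches the behaviour reported in these threads: once the eval loss is inf or NaN, no checkpoint gets saved.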

Use weights_only for load

Seems like weights_only is not working in this test: ``` ## Registering my_text_classifier_scripted_v3 model 2024-04-04T17:35:44,573 [DEBUG] epollEventLoopGroup-3-8 org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model my_text_classifier_scripted_v3 2024-04-04T17:35:44,573 [DEBUG] epollEventLoopGroup-3-8 org.pytorch.serve.wlm.ModelVersionedRefs...

Update cpp/llamacpp to Llama 3

Currently blocked by [6819](https://github.com/ggerganov/llama.cpp/issues/6819)

If micro_batch_size of micro-batch is set to 1, then model inference is still batch processing?

Hi @pengxin233 yes, it will still aggregate 10 requests (or wait until max batch delay) to perform the inference. The inference method of the handler will only see a single...
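As a rough illustration of the general idea of micro-batching, the sketch below splits one aggregated batch into chunks of `micro_batch_size`. This is not TorchServe's implementation; the helper `split_into_micro_batches` and the request strings are hypothetical, and the batch size of 10 is simply the number mentioned in the comment above.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def split_into_micro_batches(
    batch: Sequence[T], micro_batch_size: int
) -> Iterator[List[T]]:
    """Hypothetical helper: yield consecutive chunks of an aggregated batch."""
    if micro_batch_size < 1:
        raise ValueError("micro_batch_size must be at least 1")
    for start in range(0, len(batch), micro_batch_size):
        yield list(batch[start:start + micro_batch_size])


# Assumed example: 10 requests aggregated by the frontend, micro_batch_size=1.
aggregated = [f"request-{i}" for i in range(10)]
micro_batches = list(split_into_micro_batches(aggregated, 1))
print(len(micro_batches), micro_batches[0])
```

Under these assumptions the aggregation step is unchanged; only the chunk size handed to each processing step shrinks.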