namespace-Pt
This is a weird issue. I'll try it myself very soon. Could you please share your training log? Does the loss look normal when training for more than one epoch?
I think it would work. Please report your result here if you give it a try :)
Hi, thanks for your interest.
- LLM-Embedder is able to encode the conversation context as the input query because it has been trained on QReCC, a conversational search dataset. You...
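For illustration, here is a minimal sketch of encoding a multi-turn conversation as a single retrieval query with a bi-encoder. The "role: text" flattening and the CLS pooling below are assumptions for the sketch, not the exact LLM-Embedder usage (check the model card for the official query instructions):

```python
# Sketch: embed a conversation as one query with a generic HF bi-encoder.
# The conversation-flattening format and CLS pooling are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/llm-embedder")
model = AutoModel.from_pretrained("BAAI/llm-embedder")
model.eval()

# Flatten the conversation history into one query string (hypothetical format).
conversation = [
    ("user", "Who proposed the QReCC dataset?"),
    ("assistant", "It is a conversational search benchmark."),
    ("user", "Can I use it to train a retriever?"),
]
query = " ".join(f"{role}: {text}" for role, text in conversation)

inputs = tokenizer(query, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state
# CLS pooling + L2 normalization, a common choice for BERT-style retrievers.
query_embedding = torch.nn.functional.normalize(hidden[:, 0], dim=-1)
print(query_embedding.shape)  # (1, hidden_size)
```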
Hi, maybe you need to further fine-tune the model to retrieve based on the user input.
Hi, that's odd; unsloth was able to use DDP two months ago :anguished:. Maybe you should wait for a newer version of unsloth, or use an LLM framework like Megatron for...
Try the following?
```bash
CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node 1 -m main.train \
  --data_root /data/long-llm \
  --output_dir data/outputs/$output_name \
  --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
  --train_data long-llm:gpt/one_detail_book.train.64K.json \
    long-llm:gpt/one_detail_paper.train.64K.json \
    long-llm:gpt/multi_detail_book.train.json \
    long-llm:gpt/multi_detail_paper_short.train.json \
    long-llm:gpt/multi_detail_paper_long.train.json \
    long-llm:gpt/bio_book.train.json \
    long-llm:longalpaca/train.json \
    long-llm:redpajama/train.json[5000] \
    ...
```
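Note that `CUDA_VISIBLE_DEVICES=0` together with `--nproc_per_node 1` launches the run on a single GPU, which sidesteps the DDP issue above; if you later want data-parallel training, expose more devices and raise `--nproc_per_node` to match (standard torchrun behavior, independent of this repo).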
Currently no. You can try using [easy_context](https://github.com/jzhang38/EasyContext) with our training data.
Hi, those two checkpoints were deleted from the server a while ago and we never retrained them. You can train them yourself directly with the script~
Hi,
- All books are from the books3 subset of the Pile.
- All papers are from the arxiv subset of the Pile.
Hi, not at the moment. You can directly evaluate the checkpoint we released. That said, we will update the paper and model soon, with better results.