LiuShen

Results: 13 issues by LiuShen

https://github.com/SanghunYun/UDA_pytorch/blob/0ba5cf8d8a6f698e19a295119f084a17dfa7a1e3/main.py#L88 If all entries of "larger_than_threshold" are True, then "loss_mask" is all zeros, and dividing sup_loss by 0 produces a "nan" loss.
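A minimal sketch of the kind of guard this implies, assuming sup_loss is normalized by loss_mask.sum(); the function and variable names here are illustrative, not the repo's actual code:

```python
import torch

def masked_sup_loss(per_example_loss: torch.Tensor, loss_mask: torch.Tensor) -> torch.Tensor:
    # When TSA masks out every example, loss_mask.sum() is 0 and the
    # normalized supervised loss becomes 0/0 = nan; clamping the
    # denominator to at least 1 avoids the division by zero.
    denom = torch.clamp(loss_mask.sum(), min=1.0)
    return (per_example_loss * loss_mask).sum() / denom
```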

Hello, I downloaded the CNN and Daily Mail stories and followed steps 1-5 of the README.md. I found that CNN has 90k+ stories and Daily Mail has 200k+ stories, yet only 140+ .train.bert.pt files were generated? That is surprising.
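If, as in BertSum-style preprocessing, each .train.bert.pt file is a shard holding many examples rather than a single story, the counts can be reconciled by summing examples across shards. A hedged sketch, with the glob path as a placeholder:

```python
import glob
import torch

# Each shard is typically a list of example dicts, so the story count should
# be compared against the total examples across shards, not the file count.
total = 0
for path in glob.glob("bert_data/*.train.bert.pt"):
    shard = torch.load(path)
    total += len(shard)
print(f"{total} training examples across all shards")
```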

This is very nice work. I have a small question I'd like to ask: as the title says, is the Anima33B adapter model merged with the original LLaMA to obtain Anima33B merged?
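If so, a merge along these lines with the PEFT API would produce the combined checkpoint. This is a hedged sketch; the paths are placeholders and the project's actual merge procedure is an assumption here:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base LLaMA weights, attach the LoRA adapter, then fold the
# adapter deltas into the base weights and save a standalone model.
base = AutoModelForCausalLM.from_pretrained("path/to/llama-33b")       # placeholder path
model = PeftModel.from_pretrained(base, "path/to/anima-33b-adapter")   # placeholder path
merged = model.merge_and_unload()
merged.save_pretrained("anima-33b-merged")
```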

Hello, may I ask whether Yi has completed the large language model regulatory filing (备案)?

Hi, when I run incremental CPT (continued pretraining) with baichuan2-13b, the loss is always 0. It is 0 whether I use my own dataset or CNEWsum.jsonl. ![image](https://github.com/yangjianxin1/Firefly-LLaMA2-Chinese/assets/58279305/986a79ea-eaf4-4f3f-82a6-0ce3c67d1a0b)
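One possible cause worth ruling out (an assumption, not a confirmed diagnosis): if sequence truncation or prompt masking turns every label into -100, the cross-entropy has nothing to average over. A quick check on a tokenized batch:

```python
import torch

def check_labels(batch: dict) -> None:
    # Count label tokens that actually contribute to the loss; -100 is the
    # conventional ignore_index for CrossEntropyLoss.
    labels = batch["labels"]
    n_valid = (labels != -100).sum().item()
    print(f"valid label tokens in batch: {n_valid}")
    assert n_valid > 0, "all labels are masked; the loss is degenerate"
```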

training_scripts/inpainting_example.sh sets export INSTANCE_DIR="./data/data_captioned". Would it be possible to open-source the example dataset referenced in the code? Thanks.

How should I use the FasterTransformer Triton backend to deploy a custom model, such as one that adds other structures after BERT? Assuming my model structure is defined like this: ```python class HfClassModel(): def...
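A hedged sketch of the kind of model this describes; the issue's own definition is truncated, so HfClassModel below is hypothetical, with an illustrative classification head after BERT:

```python
import torch
from transformers import BertModel

class HfClassModel(torch.nn.Module):
    def __init__(self, num_labels: int = 2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # Extra structure appended after BERT: a linear classification head.
        self.classifier = torch.nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return self.classifier(out.pooler_output)
```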

I implemented it step by step according to the tutorial at [docs/bert_guide.md](https://github.com/triton-inference-server/fastertransformer_backend/blob/main/docs/bert_guide.md):

```shell
git clone https://github.com/triton-inference-server/fastertransformer_backend.git
cd fastertransformer_backend
export WORKSPACE=$(pwd)
export CONTAINER_VERSION=22.12
export TRITON_DOCKER_IMAGE=triton_with_ft:${CONTAINER_VERSION}
python3 docker/create_dockerfile_and_build.py --triton-version 22.12
docker run...
```

Hello, I'd like to ask why the model saved locally after expansion with the LLaMA Pro expand.sh script becomes smaller. Here the 1.2G file is the original model, the 1.1G one corresponds to num_expand=8, and the 946M one to num_expand=2. ![image](https://github.com/hiyouga/LLaMA-Factory/assets/58279305/30b306a9-d6dc-4242-8637-c74c72ea3636)
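A hedged diagnostic sketch, with placeholder paths: a model that grows in layer count but shrinks on disk usually points to a dtype or serialization change rather than lost weights, which comparing parameter counts and dtypes would reveal:

```python
from transformers import AutoModelForCausalLM

# Compare raw parameter counts and storage dtypes between checkpoints;
# an fp32 -> fp16/bf16 save halves file size even as parameters increase.
for path in ["path/to/original", "path/to/expanded_num_expand_8"]:
    model = AutoModelForCausalLM.from_pretrained(path)
    n_params = sum(p.numel() for p in model.parameters())
    dtypes = {str(p.dtype) for p in model.parameters()}
    print(path, f"{n_params:,} params", dtypes)
```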

**General Question** Before asking a question, make sure you have:

* Searched the tutorial on the modelscope [doc-site](https://modelscope.cn/docs)
* Googled your question.
* Searched related issues but cannot get the expected...
