VatsaDev comments

Results: 88 comments of VatsaDev

The queries generated a lot of repetitions. Possible to provide 1T again "fix"?

I don't think the data-loading bug affected the 0.b and 1T; that's more of a tiny-model thing, along with bad settings. Increase rep penalty, along with temp, and...
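As a rough illustration of the two settings mentioned above, here is a minimal, dependency-free sketch of how a repetition penalty and a temperature reshape next-token scores before sampling. It assumes the CTRL-style penalty rule (divide positive logits, multiply negative ones); the function names and the default values `penalty=1.2` and `temperature=0.8` are hypothetical and are not settings taken from the thread.

```python
import math
import random


def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Down-weight tokens that already appear in the output (CTRL-style rule)."""
    adjusted = list(logits)
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        # Positive scores are divided and negative scores multiplied, so a
        # previously seen token always becomes less likely when penalty > 1.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, generated_ids, penalty=1.2, temperature=0.8, rng=random):
    """Apply the penalty, then the temperature, then sample one token id."""
    penalized = apply_repetition_penalty(logits, generated_ids, penalty)
    probs = softmax_with_temperature(penalized, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Raising `penalty` makes already-generated tokens less attractive, and raising `temperature` spreads probability over more candidates; both tend to break repetition loops, at the cost of less predictable output.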

The queries generated a lot of repetitions. Possible to provide 1T again "fix"?

Well, my main experience is with pythia-350m, GPT-2 medium, and GPT-2 XL, and I don't have specific prompts to show off, but I do have a couple of screenshots in my...

the dataset selection sft on OpenAssistant

They perform a regular finetune on A40s with OASST ChatML. There are also some DPO versions.

How do you plan on dealing with hallucinations due to knowledge compression?

RAG would definitely help, but have you considered training the model on data similar to the SQuAD dataset, for familiarity with pulling factual answers from a context, so it would...
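To make the SQuAD-style idea concrete, here is a small sketch of turning one extractive question-answering record into a training string. It assumes the record layout used by the public SQuAD release (`context`, `question`, and `answers` with parallel `text` and `answer_start` lists); the prompt template, the function names, and the toy record in the usage note are hypothetical, not something proposed in the thread.

```python
def answer_is_span(record):
    """Check that every answer is a literal span of the context."""
    context = record["context"]
    texts = record["answers"]["text"]
    starts = record["answers"]["answer_start"]
    return all(context[s:s + len(t)] == t for t, s in zip(texts, starts))


def format_squad_example(record):
    """Render a record as 'Context / Question / Answer' training text."""
    answers = record["answers"]["text"]
    # SQuAD 2.0 style records can have no answer; mark those explicitly.
    answer = answers[0] if answers else "unanswerable"
    return (
        f"Context: {record['context'].strip()}\n"
        f"Question: {record['question'].strip()}\n"
        f"Answer: {answer}"
    )
```

A toy record such as `{"context": "The Eiffel Tower is in Paris.", "question": "Where is the Eiffel Tower?", "answers": {"text": ["Paris"], "answer_start": [23]}}` passes `answer_is_span` and formats to three lines ending in `Answer: Paris`. Because the answer is always copied from the context, this kind of data rewards reading the context rather than recalling facts from the weights.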

How do you plan on dealing with hallucinations due to knowledge compression?

RAG involves getting text data from documents or vector embeddings, which is great, but it won't work well for the basic text generation model this is right now. When you make...

How do you plan on dealing with hallucinations due to knowledge compression?

@walking-octopus Toolformer, in the way you suggest it, might work, but what do you mean by special tokens? The steps are - it gets a natural language instruction - it makes an...

How do you plan on dealing with hallucinations due to knowledge compression?

@artnoage I read a paper on arXiv, but unfortunately I can't find the link. Sorry if I come across as certain; I am referring to it in a similar way to the...

How do you plan on dealing with hallucinations due to knowledge compression?

@xiaoyunwu, instruction tuning seems to be good, but one of the main features of TinyLlama is the context size, which I believe is 2048. That probably makes the model a...

How do you plan on dealing with hallucinations due to knowledge compression?

@xiaoyunwu Looking at the dataset, I see that it's there.

How do you plan on dealing with hallucinations due to knowledge compression?

@Luoyingfeng8 I already responded to this for artnoage, and I made this claim several months ago. Since then, I've seen several instances of more training tokens leading to better models.