VatsaDev

Results 88 comments of VatsaDev

I don't think the data-loading bug affected the 0.b and 1t; that's more of a tiny-model thing, along with bad settings. Increase rep penalty, along with temp, and...
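For readers unfamiliar with the two settings mentioned above, here is a minimal sketch of what a repetition penalty and a temperature do to next-token logits. It assumes the common CTRL-style penalty (divide positive logits, multiply negative ones); the function names `adjust_logits` and `softmax` and the default values are illustrative, not settings taken from the comment.

```python
import math

def adjust_logits(logits, generated_ids, repetition_penalty=1.2, temperature=0.8):
    """Apply a CTRL-style repetition penalty, then temperature scaling.

    All names and default values here are hypothetical, for illustration only.
    """
    adjusted = list(logits)
    for tok in set(generated_ids):
        # Tokens already generated become less likely:
        # positive logits shrink, negative logits move further down.
        if adjusted[tok] > 0:
            adjusted[tok] /= repetition_penalty
        else:
            adjusted[tok] *= repetition_penalty
    # Temperature above 1 flattens the distribution; below 1 sharpens it.
    return [x / temperature for x in adjusted]

def softmax(logits):
    """Convert logits to probabilities, subtracting the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Raising `repetition_penalty` discourages loops of repeated tokens, while raising `temperature` spreads probability mass over more candidates.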

Well, my main experience is with pythia-350m, GPT-2 medium, and GPT-2 XL, and I don't have specific prompts to show off, but I do have a couple of screenshots in my...

They perform a regular finetune on A40s with OASST ChatML. There are also some DPO versions.

RAG would definitely help, but have you considered training the model on data similar to the SQuAD dataset, for familiarity with pulling factual answers from a context, so it would...
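As a rough illustration of the SQuAD-style idea above, a context/question/answer triple could be flattened into a single training string as sketched below. The template and the function name `format_squad_example` are assumptions made for this example, not a format the comment specifies.

```python
def format_squad_example(context, question, answer):
    """Flatten one SQuAD-style record into a training string.

    The "Context / Question / Answer" template is a hypothetical choice;
    any consistent layout would serve the same purpose.
    """
    return (
        f"Context: {context.strip()}\n"
        f"Question: {question.strip()}\n"
        f"Answer: {answer.strip()}"
    )

# Usage: the answer appears verbatim in the context, as in extractive QA.
example = format_squad_example(
    "TinyLlama is a 1.1B parameter language model.",
    "How many parameters does TinyLlama have?",
    "1.1B",
)
```

Training on many such strings is meant to teach the model to copy the answer span out of the supplied context rather than rely on memorized facts.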

RAG involves getting text data from documents or vector embeddings, which is great, but it won't work well for the basic text-generation model this is right now. When you make...
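To make the retrieval step described above concrete, here is a toy sketch that ranks documents against a query and prepends the best match to a prompt. It uses bag-of-words cosine similarity as a stand-in for real vector embeddings, and every name in it (`bow`, `cosine`, `retrieve`, `build_prompt`) is hypothetical.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words vector: lowercase token counts (a stand-in for embeddings)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = bow(query)
    ranked = sorted(documents, key=lambda d: cosine(q, bow(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents, k=1):
    """Prepend the retrieved text to the question before generation."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

The retrieval itself is model-independent; whether the generator actually uses the prepended context depends on the model, which is the concern raised in the comment.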

@walking-octopus Toolformer in the way you suggest it might work, but what do you mean by special tokens? The steps are - it gets a natural language instruction - it makes an...

@artnoage I read a paper on arXiv, but can't find the link, unfortunately. Sorry if I come across as certain, I am referring to it in a similar way to the...

@xiaoyunwu, Instruction tuning seems to be good, but one of the main features of TinyLlama is the context size, which I believe is 2048. That probably makes the model a...

@xiaoyunwu Looking at the dataset, I see that it's there.

@Luoyingfeng8 I already responded to this for artnoage, and I made this claim several months ago. Since then, I've seen several instances of more trained tokens making for better models.