Request for Ollama reload of models + chapters not reaching word counts
I have been using this for about a week now and I'm (almost) loving it; more people should know about this program. I currently have two problems with it.
-
I set a prompt like this: "Please write a story set in modern times, the story should contain 10 chapters of 1000-1500 words in each chapter." Then I add the story details. I have noticed that if you use a single model for all steps, it hits the context limit really quickly (I initially thought that none of my models worked, as they would all start looping). Then I tried copying the model file in Ollama to a new name, but it must have recognized it was the same file, as it didn't reload. Is there a way to add reloading of a model in Ollama for each model stage in config.py? Nothing popped out at me in the Ollama library (I only know very basic Python). This would reset the context for the model at each stage and clear up part of the problem.
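For reference, the closest thing I could find in the Ollama Python client is the keep_alive parameter: setting it to 0 seems to make Ollama unload the model, so the next call loads a fresh instance. This is just a sketch of the kind of thing I mean, not something I've actually wired into the program:

```python
import ollama

def force_reload(model: str) -> None:
    """Ask Ollama to unload a model by setting keep_alive to 0.

    The next generate/chat call against this model should then
    load a fresh instance instead of reusing the resident one.
    """
    ollama.generate(model=model, prompt='', keep_alive=0)

# e.g. between pipeline stages:
# force_reload('qwen2.5:7b')
```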
-
I can also see that Ollama uses a context of only 2k for all models, but looking at the Ollama Python library I saw a reference to num_ctx in _types.py, under class Options(TypedDict, total=False), listed as a load-time option (num_ctx: int). I think it could be set somewhere in your code, but I have no idea where it would go. (1000 words should only be around 1400 tokens plus overhead, but with a 2k context it's not going far.)
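If it helps, this is how num_ctx appears to be passed with the Ollama Python client (untested against this project, just from reading the library; the value 8192 is only an example):

```python
import ollama

response = ollama.chat(
    model='qwen2.5:7b',
    messages=[{'role': 'user', 'content': 'Write the chapter outline.'}],
    # num_ctx is a load-time option from ollama's Options TypedDict;
    # 8192 here is just an example value, not the project's setting.
    options={'num_ctx': 8192},
)
print(response['message']['content'])
```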
Thank you
[edit] Just found that num_ctx is already listed in wrapper.py but not implemented. Not sure how to implement it.
You can pass it as a model parameter, but I'll add a default of 8192 for it, which should be a healthy amount. Watch out for the extra VRAM usage (~+200MB for qwen2.5:7b).
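Roughly, the default would be merged under any caller-supplied options, something like this (the names here are illustrative, not the actual wrapper.py code):

```python
import ollama

DEFAULT_NUM_CTX = 8192  # healthy default; costs ~+200MB VRAM on qwen2.5:7b

def chat_with_defaults(model: str, messages: list, options: dict | None = None):
    # Caller-supplied options take precedence, so a per-model
    # num_ctx parameter can still override the default.
    merged = {'num_ctx': DEFAULT_NUM_CTX, **(options or {})}
    return ollama.chat(model=model, messages=messages, options=merged)
```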
Great, thanks for that. Any thoughts on reloading the models? Even with 8k I don't think it would make it through an entire run with a single model. (qwen2.5:7b is OK for testing, but I'll move on to eva.qwen2.5:72b, which I find is really good at novels.)
They shouldn't fill the context up. It doesn't pass the whole history down on each turn; it should only pass the required information for each step. You can also see the prompts here:
https://github.com/datacrystals/AIStoryWriter/blob/main/Writer/Prompts.py
The reason I ask is that if you look at my log, by the time it gets to the end of the chapter outline for just 10 chapters, the context has risen to (Warning, Detected High Token Context Length of est. ~71265.2tok). That's roughly 7k tokens per chapter, so a 30-chapter outline for a novel could take it to over 200K. Most of the time, when it starts climbing towards that number, I start getting JSON Error during parsing: Expecting value: line 1 column 1 (char 0) looping, no matter which model I use.
Edit: Just did another test run using only command-r, and this is what happened.
As you can see, once it hit 170k it just started writing gibberish, and it had only gotten to the start of the chapter 4 outline.
Thanks for pointing that out. Will investigate further.
I have the same issue with OpenRouter and llama.cpp; it's as if there is no limit on the context, and the more chapters there are, the bigger the issue gets.
Hmm, I've been thinking of doing a rewrite of the whole generation system, so I'd imagine that might be the only real way to properly address this. Will add it as something to keep an eye on.