andreapiso
This would be super helpful. We are in a protected environment (thanks, IT!) where we can only install CUDA via conda, and the conda CUDA package does not come with `cuda.h` because of...
@okhat is there any way to extract the compiled prompts before the execution? e.g. still with their template keywords {question}, {search_query} etc... it would be useful to be able to...
Thanks a lot @okhat! I guess the `program.save` does not contain the prompt itself, but all the components that can be used to reconstruct it, while `lm.history` contains the prompt...
Interesting! Are the candidate programs right now iterating only through different combinations of demonstrations, or are there modifiers on the prompt itself? (instructions, descriptions, etc...)
@michaelfeil is this related? Yes, vLLM supports continuous batching, but I'm looking to understand whether CTranslate2 can be extended to support it without using vLLM.
Yes, buffering incoming requests and sending them together is what I meant by static batching. Is 1. not possible today because of a difference in architecture between CT2 and HF...
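For anyone following along, here is a minimal sketch of the static-batching idea described above: buffer incoming requests and flush them together once the batch is full or a timeout elapses. This is an illustration only, not CTranslate2 or vLLM code; the function name and parameters are hypothetical.

```python
import queue
import time

def collect_batch(q, max_batch=4, timeout=0.05):
    """Buffer requests from `q` into one batch (hypothetical helper).

    Blocks for the first request, then keeps accepting requests until
    the batch is full or `timeout` seconds have passed since the first
    request arrived. The resulting batch would then be submitted to the
    model in a single forward pass.
    """
    batch = [q.get()]                      # block until at least one request
    deadline = time.monotonic() + timeout
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break                          # timeout elapsed: flush what we have
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break                          # no more requests arrived in time
    return batch

# Example: three requests are already queued, so they come back as one batch.
requests = queue.Queue()
for i in range(3):
    requests.put(f"req-{i}")
print(collect_batch(requests))  # ['req-0', 'req-1', 'req-2']
```

Continuous batching differs in that finished sequences leave the batch and new requests join mid-generation, which is the architectural question being asked here.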
+1 for the feature! I need to have some "type" criteria to determine whether an edge is valid or not. @MrMugame would you be able to share with the community...
> Wow, great job! Love how in-depth the tutorial is. > > One thing I would point out that you might want to change in your tutorial: > > ```js...
Does the problem still persist?
Any progress on solving this issue? We are encountering the same problem when feeding many records to `executemany`.
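A common workaround for `executemany` choking on very large inputs is to split the records into smaller chunks. Here is a hedged sketch using sqlite3 from the standard library; the table, column names, and chunk size are made up for illustration, and the same pattern applies to other DB-API drivers.

```python
import sqlite3

def insert_in_chunks(conn, rows, chunk_size=500):
    """Insert `rows` via executemany in chunks (hypothetical helper).

    Splitting a huge record list into fixed-size chunks keeps each
    executemany call small, which can sidestep driver limits on
    parameter counts or memory use.
    """
    cur = conn.cursor()
    for start in range(0, len(rows), chunk_size):
        cur.executemany(
            "INSERT INTO items (name, qty) VALUES (?, ?)",  # illustrative table
            rows[start:start + chunk_size],
        )
    conn.commit()

# Example usage against an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, qty INTEGER)")
insert_in_chunks(conn, [("a", 1), ("b", 2), ("c", 3)], chunk_size=2)
print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # 3
```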