LiangxuanZhao
Hi @ruihan0495, I ran into the same problem in stage 2. I am using bloomz with my own dataset. Have you figured out how to solve it?
> Thanks @LuciusMos! The 2nd solution you proposed looks better to me. We can make the `Installation and Setup` part better structured. I also think we can support environment...
> Hi @LuciusMos, welcome. If I understand your question correctly, you are describing a code style problem rather than a bug, right? I think this design is not on purpose....
> The main reason for putting assistant_msg before is that it is the input for the next iteration, while the user_msg is not mandatory for looping, so that is why...
Hi @juney-nvidia, just checking in on whether there has been any progress on prefix caching for the DS R1 model or even `trtllm-serve`. Thanks!
@zhhuang-nv Thanks for the update! This feature is really critical in my scenario :)
Hi @Wendong-Fan, I was unable to get access to the Cerebras API at that time, so there was no substantial progress. Recently I haven't had enough bandwidth to work on it....