Tianle Cai
> It looks like a lot of the groundwork is being laid out here with the parallel decoding implementation: [ggerganov/llama.cpp#3228](https://github.com/ggerganov/llama.cpp/pull/3228)

Yeah, that's also what I thought. The tree attention implementation...
> I would like to help with finalizing the support for this. Is there any place where I can contact the group behind this project and ask questions?

Hi @kalomaze...
Hi Alex, thanks for your interest! Extending SGConv to 2D filters is a good idea, but we didn't try it because we focused on long sequence modeling...
@Tylersuard Hi Tyler, we just pushed standalone SGConv code, so you can give it a try now! We tried running on a sequence of 1M tokens with model dimension 256,...
Hi Jiayu, Thanks for your interest! For your questions, first, d_state was used in the original S4 code, and we hadn't fully cleaned it, so please ignore it. Second, the...
Could you please try updating your PyTorch version? This may be related to the incompatibility between nn.DataParallel and nn.ParameterList (e.g., https://github.com/pytorch/pytorch/issues/36035). Also, please use x = torch.randn(4, 256,...
Thanks for your interest! The basic function of determining which existing tool to use and invoking ToolMaker to make new tools when needed is implemented by the Dispatcher (Please see...
Hi Madhur, sorry for the delay, I've been traveling a lot recently... As for your question, we just added a preview branch containing the whole repo (it hasn't been completely double-checked, and...