llmexperiment
> @tridao `selective_scan_fn(u, delta, A, B, C, D)` resulted in a speedup, but it's still significantly slower for N=16. > > Hi @apoorv2904, are you able to reproduce...
> We calculate FLOPs based on the ref code, though the result is very different from the real speed in practice. > > ```python > def flops_selective_scan_ref(B=1, L=256, D=768, N=16, with_D=True,...
> The formula we used is `9 * d_state * d_model` (times batch size times sequence length). This is for a forward pass, so triple that for forward + backward...
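For anyone who wants to plug numbers into the formula quoted above, here is a minimal sketch. The function name `selective_scan_flops` and its arguments are hypothetical (they are not part of any library), and the sizes in the usage lines are illustrative only. It simply multiplies out `9 * d_state * d_model` by batch size and sequence length, and triples the result when the backward pass is included.

```python
def selective_scan_flops(batch, seqlen, d_model, d_state, include_backward=False):
    """Rough FLOP estimate from the quoted rule: 9 * d_state * d_model per token.

    Hypothetical helper for illustration; not taken from any library.
    """
    # Forward pass: per-token cost times the number of tokens in the batch.
    forward = 9 * d_state * d_model * batch * seqlen
    # Forward + backward is estimated as three times the forward cost.
    return 3 * forward if include_backward else forward


# Illustrative sizes (assumed, not measured): batch 1, 256 tokens,
# model width 768, state size 16.
fwd = selective_scan_flops(batch=1, seqlen=256, d_model=768, d_state=16)
total = selective_scan_flops(batch=1, seqlen=256, d_model=768, d_state=16,
                             include_backward=True)
print(f"forward: {fwd:,} FLOPs")
print(f"forward + backward: {total:,} FLOPs")
```

This only counts the scan itself under the stated rule. It says nothing about projections, convolutions, or wall-clock speed, which the thread notes can differ a lot from FLOP counts.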
> We have no experience with ONNX. Do you have ideas on how to generate ONNX for custom operations? If so, would you like to contribute? Thanks @tridao! I...
> Thanks for your work! I wonder if this code will run on Windows? Set up a venv and it works.
> The appendix contains the details. We follow GPT3 specs (e.g. for 7B, hidden dim = 4096). @tridao @albertfgu Thanks for the response! The Appendix contains specs up to 1.3B...