llmexperiment

Results 6 comments of llmexperiment

> @tridao `selective_scan_fn(u, delta, A, B, C, D)` resulted in a speedup, but it's still significantly slower for N=16.
>
> [image]

Hi @apoorv2904, are you able to reproduce...

> We calculate FLOPs based on the reference code, though the result is very different from the real speed in practice.
>
> ```python
> def flops_selective_scan_ref(B=1, L=256, D=768, N=16, with_D=True,...
> ```

> The formula we used is `9 * d_state * d_model` (times batch size times sequence length). This is for a forward pass, so triple that for forward + backward...
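
A minimal sketch of the quoted formula, for anyone who wants to plug in numbers. The function name `selective_scan_flops`, its parameter names, and the example sizes are hypothetical and not taken from the repository. It only encodes what the quote states: `9 * d_state * d_model` per token, scaled by batch size and sequence length, and tripled for forward + backward.

```python
def selective_scan_flops(batch, seqlen, d_model, d_state, include_backward=False):
    """Estimate selective-scan FLOPs from the quoted rule of thumb.

    Hypothetical helper: forward cost is 9 * d_state * d_model per token,
    multiplied by batch size and sequence length.
    """
    forward = 9 * d_state * d_model * batch * seqlen
    # Per the quote, forward + backward is roughly triple the forward cost.
    return 3 * forward if include_backward else forward


if __name__ == "__main__":
    # Illustrative sizes only: batch=1, seqlen=256, d_model=768, d_state=16.
    print(selective_scan_flops(1, 256, 768, 16))                         # 28311552
    print(selective_scan_flops(1, 256, 768, 16, include_backward=True))  # 84934656
```

This is an analytical estimate, so it says nothing about measured wall-clock speed.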

> We have no experience with ONNX. Do you have ideas on how to generate ONNX for custom operations? If so, would you like to contribute?

Thanks @tridao! I...

> Thanks for your work! I wonder if this code will run on Windows?

Set up a venv and it works.

> The appendix contains the details. We follow GPT-3 specs (e.g. for 7B, hidden dim = 4096).

@tridao @albertfgu Thanks for the response! The appendix contains specs up to 1.3B...