neural-speed: developer_document.md needs elaboration on determining buffer sizes?


Open • hpcpony opened this issue 8 months ago • 1 comment

In the example for adding to gptneox_mem_req, I see that n_layers comes from num_hidden_layers in the config.json file, but where do the 512, 512, and 1024 come from? Maybe a comment in the document would help.

I was looking to extend the existing bloom capability to handle https://huggingface.co/bigscience/bloom, but it's not obvious to me how to choose the right scratch sizes from the config.json.
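
For context, the part that is clear is the layer-count lookup. Below is a minimal Python sketch of that step only, assuming the config.json has already been downloaded locally. The helper names `read_n_layers` and `load_config` are hypothetical, and the fallback key `n_layer` is an assumption about how some model families name the field. It says nothing about how the scratch sizes are derived, which is the open question here.

```python
import json

# Key names to try, in order. "num_hidden_layers" is the one mentioned above;
# "n_layer" is an assumed alternative spelling used by some model configs.
LAYER_KEYS = ("num_hidden_layers", "n_layer")


def read_n_layers(config: dict) -> int:
    """Return the layer count from a parsed config.json dictionary."""
    for key in LAYER_KEYS:
        if key in config:
            return int(config[key])
    raise KeyError(f"none of {LAYER_KEYS} found in config")


def load_config(path: str) -> dict:
    """Parse a local config.json file into a dictionary."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    # Inline example values, not taken from any real model.
    print(read_n_layers({"num_hidden_layers": 4}))
```

With a real checkpoint this would be called as `read_n_layers(load_config("config.json"))`.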

Jun 09 '24 21:06 hpcpony
