neural-speed
neural-speed copied to clipboard
developer_document.md need elaboration on determining buffer sizes?
In the example for adding to gptneox_mem_req I see that n_layers comes from the num_hidden_layers in the config.json file, but where does the 512, 512, and 1024 come from? Maybe a comment in the document would help.
I was looking to extend the existing bloom capability to handle https://huggingface.co/bigscience/bloom but it's not obvious to me how chose the right scratch sizes from the config.json.