Qingyou Meng

Results 47 comments of Qingyou Meng

Thanks @slaren > * In llama.cpp, to avoid allocations in every eval, the work buffer memory could be stored as a `std::vector` in `llama_context`. Just `resize` it to the `work_size`,...

https://github.com/ggerganov/llama.cpp/pull/1999/commits/bf63002af94bc8fe39bca1a44e1344fad420f96b is the latest commit per @slaren's suggestion. TODO: `The sugar is used by baby-llama. It calls ggml_graph_compute again and again. I suggest we delay it to the next PR.`...

> I've refactored the changes: Looks good. Verified `main` and `test-grad0`. > A positive side-effect is that the user can now control the number of tasks for each op. -...

Even if `pthread_cancel` or a user signal works, those are asynchronous ways to force-stop running threads. [Async is not easy to control](https://stackoverflow.com/questions/11271616/pthread-cancel-function-failed-to-terminate-a-thread#:~:text=The%20default%20cancel%20method%20of%20pthread_cancel%20%28%29%20is,to%20set%20another%20cancel%20method%20to%20your%20thread); think about the case where not all threads are killed at...

> I see, `pthread_cancel` seems like a bad idea. Earlier today I changed it so that every thread calls the callback and immediately returns. This might be graceful? Me too,...

> I think using a callback for aborting is pretty standard Sure, a callback is more versatile and convenient to use; this is why I said `it depends`. Overall, if we...