Daniel Bulhosa Solorzano

Results: 5 comments by Daniel Bulhosa Solorzano

Hey @giladgd, thanks for all your work on the library. I have a couple of questions (some of them related to this) and I didn't know a better way...

> There is a way to overwrite the code itself and allow input_embeds to be passed, but it'll be a bit of custom code - another way is to save...

Does this mean that continuous batching is not supported in llama-cpp-python? I assume this is the type of batching under consideration in this issue.
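For readers unfamiliar with the term: continuous batching means new requests join the running batch at each decode step and finished sequences free their slot immediately, rather than the whole batch waiting for its slowest member. The toy scheduler below is a sketch of that idea only; it is not llama-cpp-python's API, and `continuous_batching`, `max_slots`, and the request tuples are made-up names for illustration.

```python
# Toy illustration of continuous batching (NOT llama-cpp-python's API):
# new requests join the running batch each step, and finished sequences
# free their slot immediately instead of waiting for the whole batch.
from collections import deque

def continuous_batching(requests, max_slots=2):
    queue = deque(requests)          # waiting requests: (name, tokens_to_generate)
    slots = {}                       # active slot -> [name, tokens_remaining]
    finished = []                    # (name, step_at_which_it_finished)
    step = 0
    while queue or slots:
        # Fill any free slots from the waiting queue before this step.
        for s in range(max_slots):
            if s not in slots and queue:
                name, n = queue.popleft()
                slots[s] = [name, n]
        # One decode step advances every active sequence by one token.
        for s in list(slots):
            slots[s][1] -= 1
            if slots[s][1] == 0:
                finished.append((slots[s][0], step))
                del slots[s]         # slot is reusable on the very next step
        step += 1
    return finished

# With requests a(1 token), b(3 tokens), c(1 token) and 2 slots, c starts
# as soon as a finishes, so everything completes in 3 steps instead of the
# 4 that static batching (wait for a+b, then run c) would take.
print(continuous_batching([("a", 1), ("b", 3), ("c", 1)]))
```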

Downgrading also worked for me. I was getting the error `AssertionError: It is illegal to call Engine.step() inside no_sync context manager` with stage 1.
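For context on that assertion: it fires because an optimizer step must not run while gradient synchronization is suspended, since the gradients would be incomplete across ranks. The snippet below is a simplified, hypothetical model of that guard pattern, not DeepSpeed's actual implementation; the `Engine` class and flag names are invented for illustration.

```python
# Simplified model of a no_sync()/step() guard (hypothetical, not DeepSpeed
# source): no_sync() sets a flag while gradient all-reduce is suspended,
# and step() asserts that flag is clear, because stepping on unsynced
# gradients would silently diverge the ranks.
from contextlib import contextmanager

class Engine:
    def __init__(self):
        self.inside_no_sync = False
        self.steps = 0

    @contextmanager
    def no_sync(self):
        # Gradient synchronization is skipped inside this block.
        self.inside_no_sync = True
        try:
            yield
        finally:
            self.inside_no_sync = False

    def step(self):
        assert not self.inside_no_sync, (
            "It is illegal to call Engine.step() inside no_sync context manager"
        )
        self.steps += 1

engine = Engine()
with engine.no_sync():
    pass          # accumulate gradients here, but do not call step()
engine.step()     # stepping after the block exits is fine
```

Under this model, the fix is to move the `step()` call outside the `no_sync()` block (or onto the final micro-batch of an accumulation cycle, where synchronization is re-enabled).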