Philip Meier

359 comments by Philip Meier

All for it. The message (or the chunks, in case of streaming) that we return always has a role attached to it: https://github.com/Quansight/ragna/blob/3cef0f7da1f2ed90e5d0618bcad82f824d00dc5a/ragna/deploy/_api/schemas.py#L54-L57 Currently, `MessageRole.ASSISTANT` is hardcoded for this: https://github.com/Quansight/ragna/blob/3cef0f7da1f2ed90e5d0618bcad82f824d00dc5a/ragna/core/_rag.py#L223-L227...

> 1. Are we only supporting PDFs right now, or do we want to include all supported types? It doesn't seem too difficult to just send any blob to the...

Afterthought to 1.: in #487 we decided to use `None` as sentinel for the default corpus as decided by the source storage. Not sure how this can work through the...
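To make the `None` sentinel concrete, here is a minimal sketch of a storage that resolves `None` to whatever it considers its default corpus. `DemoSourceStorage`, `DEFAULT_CORPUS`, and the `store` / `retrieve` signatures are hypothetical names for illustration only and are not taken from ragna.

```python
from typing import Optional


class DemoSourceStorage:
    """Hypothetical source storage; not ragna's actual class."""

    # Assumption: each storage picks its own default corpus name.
    DEFAULT_CORPUS = "default"

    def __init__(self) -> None:
        self._corpuses: dict[str, list[str]] = {}

    def _resolve(self, corpus_name: Optional[str]) -> str:
        # None is the sentinel meaning "let the storage decide".
        return self.DEFAULT_CORPUS if corpus_name is None else corpus_name

    def store(self, documents: list[str], corpus_name: Optional[str] = None) -> None:
        self._corpuses.setdefault(self._resolve(corpus_name), []).extend(documents)

    def retrieve(self, corpus_name: Optional[str] = None) -> list[str]:
        return list(self._corpuses.get(self._resolve(corpus_name), []))
```

The caller never needs to know the default name; it only passes `None`. How that sentinel survives a trip through a URL path or query parameter is a separate question that this sketch does not answer.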

`GET /corpuses/{name}/metadata` is not going to cut it as we potentially have multiple source storages with corpuses. Thus, `{name}` is not unique. I see two options 1. Switch to `GET...
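A small sketch of why `{name}` alone is ambiguous: the helper below reports every corpus name that appears in more than one source storage. The function name and the storage labels are made up for illustration.

```python
def find_ambiguous_names(corpuses_by_storage: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each corpus name that occurs in several storages to those storages."""
    owners: dict[str, list[str]] = {}
    for storage, names in corpuses_by_storage.items():
        for name in names:
            owners.setdefault(name, []).append(storage)
    # Keep only the names that more than one storage claims.
    return {name: storages for name, storages in owners.items() if len(storages) > 1}


# Hypothetical setup: two storages that both expose a "default" corpus.
ambiguous = find_ambiguous_names(
    {"storage_a": ["default", "papers"], "storage_b": ["default"]}
)
```

With this input, a request for the metadata of `default` cannot be routed without also knowing which source storage is meant.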

> I like the idea of raising `NotImplemented` on `source_storage.list_corpuses()` in case the user decides to ignore corpuses altogether. On second thought, we should probably raise a descriptive `RagnaException` instead....
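A rough sketch of what a descriptive error could look like. `CorpusNotSupportedError` is a hypothetical stand-in for a `RagnaException`, and the base class and message wording here are assumptions, not ragna's actual code.

```python
from typing import Optional


class CorpusNotSupportedError(Exception):
    """Hypothetical stand-in for a descriptive RagnaException."""


class SourceStorage:
    """Simplified base class, for illustration only."""

    def list_corpuses(self) -> list[str]:
        # Default behavior for storages that ignore corpuses altogether:
        # fail with a message that names the class and the missing method.
        raise CorpusNotSupportedError(
            f"{type(self).__name__} does not support corpuses. "
            "Override list_corpuses() to enable them."
        )


class InMemoryStorage(SourceStorage):
    """A storage whose author chose not to implement corpuses."""


def describe_error(storage: SourceStorage) -> Optional[str]:
    try:
        storage.list_corpuses()
    except CorpusNotSupportedError as exc:
        return str(exc)
    return None
```

Unlike a bare `NotImplementedError`, the message tells the user which storage is affected and what to override.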

> Finally, should I add the actual UI changes as part of this issue, or will that be a separate issue? Up to you. When using two PRs we can...

> * We simplified things a little bit by passing a file that contains a list of S3 prefixes (`s3://my-bucket/foo/bar/spam.pdf, ...`) as an additional argument to our CLI tool. If...

Before I go over the individual proposals, one thing up front: although we use `ragna.core.LocalDocument` by default, the user is free to use any subclass of `ragna.core.Document`: https://github.com/Quansight/ragna/blob/7071cf4fdaae03b89c837f6034dbac217dd81d72/ragna/deploy/_config.py#L146 https://github.com/Quansight/ragna/blob/7071cf4fdaae03b89c837f6034dbac217dd81d72/ragna/core/_document.py#L27 https://github.com/Quansight/ragna/blob/7071cf4fdaae03b89c837f6034dbac217dd81d72/ragna/core/_document.py#L82 The...

> * filesize use case : It might be useful in order to keep, for example, "all the PDFs big enough to have images" Let's start with adding that to...

We need a way to get the document content to the UI. Thus, I propose three new API endpoints: - `GET /documents` / `GET /documents/{id}`: Get all [documents](https://github.com/Quansight/ragna/blob/19d326ba4b6d8142b6e70f3f626e14267fb25ad1/ragna/deploy/_api/schemas.py#L18-L20) or one...