Add tags to documents
Feature description
Add an optional tags attribute, perhaps of type list[str], to the Document class and its corresponding schema.
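As a rough illustration of the shape this could take, here is a minimal sketch using a plain dataclass. `TaggedDocument` is a hypothetical stand-in, not Ragna's actual Document class or schema; the only point is that `tags` defaults to an empty list, so existing callers that never pass tags would be unaffected.

```python
from dataclasses import dataclass, field
import uuid


@dataclass
class TaggedDocument:
    """Hypothetical stand-in for a document; not Ragna's real Document class."""

    name: str
    # Optional attribute: defaults to an empty list when no tags are given.
    tags: list[str] = field(default_factory=list)
    # Each instance gets its own random ID.
    id: uuid.UUID = field(default_factory=uuid.uuid4)


untagged = TaggedDocument(name="plain.txt")
tagged = TaggedDocument(name="my_doc.txt", tags=["my_corpus"])
```

Using `default_factory` rather than a literal `[]` avoids sharing one mutable list between instances.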
When files are uploaded via the API, tags could then be associated with them:
    import requests

    document_upload = requests.post(
        "http://localhost:31476/document",
        json={
            "name": "my_doc.txt",
            # Proposed new field; not part of the current request body.
            "tags": ["my_corpus"],
        },
    ).json()
Then, when creating a chat, one or more tags could be specified either instead of or alongside documents:
    chat = requests.post(
        "http://localhost:31476/chats",
        json={
            "name": "My Chat",
            "documents": [],
            "tags": ["my_corpus"],
            "source_storage": "Ragna/DemoSourceStorage",
            "assistant": "Ragna/DemoAssistant",
            "params": {},
        },
    ).json()
The resulting chat would cover all of the documents specified in the API call as well as any documents tagged with one or more of the given tags at the time of chat creation.
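The resolution rule described above could be sketched roughly as follows. This is only an illustration under assumptions: `resolve_chat_documents` and the `tag_index` mapping (document ID to set of tags) are hypothetical names, and a real implementation would query Ragna's database instead of a dict.

```python
def resolve_chat_documents(explicit_ids, requested_tags, tag_index):
    """Return the document IDs a new chat would cover.

    explicit_ids:   IDs listed under "documents" in the request.
    requested_tags: tags listed under "tags" in the request.
    tag_index:      hypothetical mapping of document ID -> set of tags.
    """
    wanted = set(requested_tags)
    # Start with the explicit documents, dropping duplicates but keeping order.
    resolved = list(dict.fromkeys(explicit_ids))
    seen = set(resolved)
    # Add every document that carries at least one of the requested tags.
    for doc_id, tags in tag_index.items():
        if doc_id not in seen and wanted & set(tags):
            resolved.append(doc_id)
            seen.add(doc_id)
    return resolved


tag_index = {
    "a": {"my_corpus"},
    "b": {"other"},
    "c": {"my_corpus", "other"},
}
chat_documents = resolve_chat_documents(["b"], ["my_corpus"], tag_index)

# A document tagged after chat creation does not change the existing chat,
# because the result above is a snapshot rather than a live query.
tag_index["d"] = {"my_corpus"}
```

Resolving tags once, at creation time, keeps a chat's document set fixed, which matches the "at the time of chat creation" wording.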
Value and/or benefit
Introducing tags would be a step in the direction of making Ragna aware of corpora without the need for a heavyweight Corpus class. Having some concept of corpora in Ragna would be beneficial for a couple of reasons:
- Breaks the tight coupling of chat creation and file uploading in the UI. Right now, if you want to upload a set of files and create multiple chats from them, you need to use the API as a hack to set up the UI for end users ahead of time, as per #176 (comment). Being able to upload tagged files and reference them later during chat creation would empower users of the UI.
- Reduces the burden of interacting with the API when dealing with a large number of files. With thousands of files, certain API calls can be quite laborious and slow. Being able to refer to a tag in such circumstances would make code more readable and decrease the amount of data transmitted over the wire.
Anything else?
To implement tags, changes would probably be limited to the request bodies of the POST /document and POST /chats endpoints. Documents carrying the same tag as those in an existing chat could be uploaded after that chat was created, and they would not become part of it. It therefore probably makes little sense to change or expose the tags in any of the API's current response bodies.
The addition of a GET /documents/list endpoint and a GET /tags/list endpoint might also be necessary.
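As a sketch of what these two listing endpoints might return, the helpers below operate on an in-memory mapping. Everything here is an assumption for illustration: `list_tags`, `list_documents`, the optional `tag` filter, and the `tag_index` dict (document ID to set of tags) are hypothetical and do not reflect an existing Ragna API.

```python
def list_tags(tag_index):
    """Return every distinct tag in use, sorted for stable output."""
    return sorted({tag for tags in tag_index.values() for tag in tags})


def list_documents(tag_index, tag=None):
    """Return document IDs, optionally restricted to those carrying `tag`."""
    return sorted(
        doc_id
        for doc_id, tags in tag_index.items()
        if tag is None or tag in tags
    )


tag_index = {
    "doc-1": {"my_corpus"},
    "doc-2": {"my_corpus", "archive"},
    "doc-3": set(),
}
all_tags = list_tags(tag_index)
corpus_documents = list_documents(tag_index, tag="my_corpus")
```

A UI could call the tag listing to populate a picker at chat creation, and the document listing to preview what a given tag would pull in.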