
Chat needs an ability to attach images/files

Open kodjima33 opened this issue 11 months ago • 22 comments

Implementation

(Thinh's note)

Backend (2):

  1. New API route POST /files to upload files. Input: file. Logic: upload the file directly to OpenAI (/files) and create a new doc at users>uid>files with {id, name, thumbnail, mime_type, openai_file_id}. Output: {id, name, thumbnail}.
  2. POST /messages with a new body param file_ids: [str]. Logic: chat with files using OpenAI threads (https://platform.openai.com/docs/api-reference/threads); use the best OpenAI model, o1 (or gpt-4o if o1 does not work with files yet).
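
A minimal sketch of the data shapes these two routes imply, using only the fields listed above. The names FileRecord, to_upload_response and MessageRequest are hypothetical, the text field of the message body is an assumption, and the actual upload to OpenAI and the Firestore write are left out.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class FileRecord:
    """Document stored at users > uid > files (hypothetical class name)."""
    id: str
    name: str
    thumbnail: str
    mime_type: str
    openai_file_id: str


def to_upload_response(record: FileRecord) -> dict:
    """Shape of the POST /files output: only id, name and thumbnail are returned."""
    doc = asdict(record)
    return {key: doc[key] for key in ("id", "name", "thumbnail")}


@dataclass
class MessageRequest:
    """Body of POST /messages; file_ids is the new field and defaults to empty."""
    text: str  # assumption: the existing body carries the message text under this name
    file_ids: List[str] = field(default_factory=list)


record = FileRecord(
    id="f1",
    name="notes.pdf",
    thumbnail="https://example.com/thumb.png",  # placeholder URL
    mime_type="application/pdf",
    openai_file_id="file-abc",  # placeholder OpenAI file id
)
response = to_upload_response(record)
request = MessageRequest(text="Summarize this file", file_ids=[response["id"]])
```

Defaulting file_ids to an empty list keeps existing clients, which send no attachments, valid against the same body shape.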

App (4):

  1. Chat > Message box: add options to attach photos, take a photo, and attach files.
  2. Chat > Upload photos and files to /files before submitting the message.
  3. Chat > Submit the message to /messages with the new body field file_ids.
  4. Chat > Message list: render the message with its attachments (photo, file).
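
The upload-then-submit order in steps 2 and 3 can be sketched as follows. This is an illustration in Python rather than the app's own code; send_with_attachments, fake_upload and fake_post are hypothetical names, and the two callables stand in for the real HTTP calls to /files and /messages.

```python
from typing import Callable, Dict, List


def send_with_attachments(
    text: str,
    paths: List[str],
    upload_file: Callable[[str], Dict[str, str]],
    post_message: Callable[[Dict[str, object]], Dict[str, object]],
) -> Dict[str, object]:
    """Upload every attachment first, then submit one message that references them."""
    # Step 2: each upload returns the /files output, which carries the file id.
    file_ids = [upload_file(path)["id"] for path in paths]
    # Step 3: the message body carries the ids, not the file contents.
    return post_message({"text": text, "file_ids": file_ids})


# Stand-ins for the real HTTP calls, so the flow can be run without a server.
def fake_upload(path: str) -> Dict[str, str]:
    return {"id": "id-" + path, "name": path, "thumbnail": ""}


def fake_post(body: Dict[str, object]) -> Dict[str, object]:
    return {"sent": body}


result = send_with_attachments(
    "What is in these?", ["a.png", "b.pdf"], fake_upload, fake_post
)
```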

Please note:

The current chat feature: ensure the new chat works seamlessly with it.

Keep the UI simple: we can use the OpenAI app as the reference product.

Thread and end-thread option: the best implementation would detect whether the user is asking a question that needs context from a file (and which file). If that is too complicated for now, let's go with either:

  1. having an option to end the thread
  2. or just using Clear chat to force-end the thread
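
A small sketch of how the two fallback options could behave, assuming a file thread opens when a message carries file_ids and stays open until it is ended. ChatSession and its methods are hypothetical helpers for illustration, not the app's or backend's code.

```python
from typing import List, Optional


class ChatSession:
    """Tracks whether a file thread is open (hypothetical helper)."""

    def __init__(self) -> None:
        self.thread_file_ids: Optional[List[str]] = None  # None means no open thread
        self.messages: List[str] = []

    def send(self, text: str, file_ids: Optional[List[str]] = None) -> str:
        """Record a message and report which mode handles it."""
        if file_ids:
            # Assumption: attaching files opens a thread, replacing any current one.
            self.thread_file_ids = list(file_ids)
        self.messages.append(text)
        return "file_chat" if self.thread_file_ids is not None else "normal_chat"

    def end_thread(self) -> None:
        """Option 1: an explicit control that closes the file thread only."""
        self.thread_file_ids = None

    def clear_chat(self) -> None:
        """Option 2: Clear chat wipes the history and force-ends the thread."""
        self.messages.clear()
        self.end_thread()


session = ChatSession()
first = session.send("Summarize this", file_ids=["f1"])
follow_up = session.send("And the second page?")
session.end_thread()
after_end = session.send("What's the weather?")
```

With option 1 the history survives and only the file context is dropped; with option 2 both go away, which is simpler to build but costs the user their chat history.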

Maybe useful

  • Chat with files example: https://github.com/BasedHardware/omi/issues/1617#issuecomment-2567425580

kodjima33 avatar Dec 31 '24 03:12 kodjima33

#1573

beastoin avatar Jan 02 '25 02:01 beastoin

Implementation: same plan as in the issue description above (Backend (2), App (4), and the notes on the current chat feature, UI, and threads).

beastoin avatar Jan 02 '25 08:01 beastoin

import os
from dotenv import load_dotenv
import openai

class FileChat:
    def __init__(self):
        load_dotenv()
        openai.api_key = os.getenv("OPENAI_API_KEY")
        self.thread = None
        self.file_id = None
        self.assistant = None

    def load_document(self, file_path):
        """Upload a document to OpenAI and create a thread"""
        # Upload the file to OpenAI
        with open(file_path, 'rb') as file:
            response = openai.files.create(
                file=file,
                purpose='assistants'
            )
            self.file_id = response.id

        # Create an assistant with file search capability
        self.assistant = openai.beta.assistants.create(
            name="File Reader",
            instructions="You are a helpful assistant that answers questions about the provided file. Use the file_search tool to search the file contents when needed.",
            model="gpt-4o",
            tools=[{"type": "file_search"}]
        )
        
        # Create a thread and attach the file
        self.thread = openai.beta.threads.create()
        openai.beta.threads.messages.create(
            thread_id=self.thread.id,
            role="user",
            content="Please help me answer questions about the attached file.",
            attachments=[{
                "file_id": self.file_id,
                "tools": [{"type": "file_search"}]
            }]
        )

    def ask(self, question):
        """Ask a question about the loaded document"""
        if not self.thread or not self.file_id:
            return "Please load a document first using load_document(file_path)"

        # Add the question to the thread
        openai.beta.threads.messages.create(
            thread_id=self.thread.id,
            role="user",
            content=question
        )

        # Create a run and block until it reaches a terminal state.
        # create_and_poll waits between status checks, so this does not busy-loop,
        # and it returns on failure too instead of spinning forever.
        run = openai.beta.threads.runs.create_and_poll(
            thread_id=self.thread.id,
            assistant_id=self.assistant.id
        )
        if run.status != 'completed':
            # e.g. failed, cancelled, expired, incomplete
            return f"Run ended with status: {run.status}"

        # Get the messages
        messages = openai.beta.threads.messages.list(
            thread_id=self.thread.id
        )

        # Return the latest assistant response
        return messages.data[0].content[0].text.value

    def cleanup(self):
        """Clean up resources"""
        if self.file_id:
            # Delete the file from OpenAI
            openai.files.delete(self.file_id)
            self.file_id = None
        if self.assistant:
            # Delete the assistant
            openai.beta.assistants.delete(self.assistant.id)
            self.assistant = None
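        if self.thread:
            # Delete the thread so it does not linger on OpenAI
            openai.beta.threads.delete(self.thread.id)
            self.thread = None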

def main():
    # Initialize the chat system
    chat = FileChat()

    print("Welcome to File Chat!")
    print("First, please provide the path to your text file.")

    try:
        while True:
            file_path = input("\nEnter file path (or 'quit' to exit): ")

            if file_path.lower() == 'quit':
                break

            try:
                chat.load_document(file_path)
                print(f"\nFile loaded successfully! You can now ask questions about {file_path}")

                while True:
                    question = input("\nAsk a question (or 'new' for new file, 'quit' to exit): ")

                    if question.lower() == 'quit':
                        chat.cleanup()
                        return
                    elif question.lower() == 'new':
                        chat.cleanup()
                        break

                    answer = chat.ask(question)
                    print("\nAnswer:", answer)

            except Exception as e:
                print(f"Error: {str(e)}")
    finally:
        # Ensure cleanup happens even if there's an error
        chat.cleanup()

if __name__ == "__main__":
    main()

beastoin avatar Jan 02 '25 08:01 beastoin

@mdmohsin7 man, pls read this ticket's description and feel free to ask me anything. if everything is clear and you're excited about this feature, drop your UI/UX proposal then go ahead.

@nquang29 said that he is also excited about this ticket, so you can ask him whether he can help on the backend side / Quang's Discord @windtran_

beastoin avatar Jan 02 '25 08:01 beastoin

Got it! As mentioned we can simply follow the UX of ChatGPT or even iMessage

Screenshot_20250102-182331~2

Screenshot_20250102-182350~2

Chat > Upload photo,files to /files, before submitting messages
Chat > Submit message to /messages with the new body field file_ids

What if we upload the file right after the user selects it? Similar to how the ChatGPT app does it

So if I understand correctly, @nquang29 will be working on the backend and I'll have to make the app side changes?

@beastoin

mdmohsin7 avatar Jan 02 '25 13:01 mdmohsin7

yes i mean uploading right after selecting the image / file.

use our figma and draft the design pls sir

you can do both, or just ask Quang to see if he could help so that we can speed up the progress.

@mdmohsin7

beastoin avatar Jan 02 '25 13:01 beastoin

The designs in our figma are very old and are not the ones that are being followed currently. I'll quickly code the design without the functionality and will share the image with you

Alright I'll message Quang on discord

@beastoin

mdmohsin7 avatar Jan 02 '25 13:01 mdmohsin7

Progress:

IMG_AF3DDE7D6CF4-1

IMG_0984249CA139-1

mdmohsin7 avatar Jan 02 '25 15:01 mdmohsin7

Are we going to allow multiple file uploads?

mdmohsin7 avatar Jan 02 '25 18:01 mdmohsin7

multiple file uploads - yes

when you use figma, your mind focuses completely on design (ui/ux), not code. that's why, if you want to create great ux, you need to draft your ideas somewhere away from your code editor.

@mdmohsin7

beastoin avatar Jan 03 '25 01:01 beastoin

What is the max limit on the number of files? And also any max limit on the file size?

Since we don't have the current UI designs in Figma, it would have taken more time to design the new UI so I just went with code itself for now. Pls check the video in #1629, that should give you an idea of how the UI will look. The app side part is almost done (will have to modify it a bit to support multiple files), just need to connect to the backend

@beastoin

mdmohsin7 avatar Jan 03 '25 05:01 mdmohsin7

just follow what chatgpt did

@mdmohsin7 ^

beastoin avatar Jan 04 '25 03:01 beastoin

just follow what chatgpt did

ChatGPT only allows 3-4 files on free plan

I've asked Quang on Discord for help with the backend; he seems interested, and I'm waiting for his response

mdmohsin7 avatar Jan 07 '25 13:01 mdmohsin7

@mdmohsin7 could you share the latest demo video here so that we can get feedback easier ?

@mdmohsin7 @nquang29 we should finish the feature in the next 3 days, so i will be pushing a little. be ready pls :)

beastoin avatar Feb 12 '25 06:02 beastoin

@beastoin latest demo video (pls excuse my slow internet)

https://github.com/user-attachments/assets/8349752a-6825-402a-a322-37a0a74e4167

mdmohsin7 avatar Feb 12 '25 08:02 mdmohsin7

Deploy plan

  • [x] Create new gcp bucket for chat thumbnails, public read <x>_chat_files
  • [x] Set new env var BUCKET_CHAT_FILES=<x>_chat_files to backend
  • [x] Create Firestore index on messages collection; fields: chat_session_id Ascending, deleted Ascending, plugin_id Ascending, created_at Descending, __name__ / creation link
  • [x] Deploy backend / https://github.com/BasedHardware/omi/actions/runs/13353001675
  • [x] Deploy app https://github.com/BasedHardware/omi/releases/tag/v1.0.54%2B223-mobile-cm
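
For reference, the composite index from the third step could be written in the Firebase CLI's firestore.indexes.json layout, generated here from Python as a sketch. The field names and directions come from the plan above; the COLLECTION query scope is an assumption, and __name__ is left out on the assumption that Firestore appends it implicitly.

```python
import json

# Field order and directions are taken from the deploy plan above.
INDEX_FIELDS = [
    ("chat_session_id", "ASCENDING"),
    ("deleted", "ASCENDING"),
    ("plugin_id", "ASCENDING"),
    ("created_at", "DESCENDING"),
]

index_config = {
    "indexes": [
        {
            "collectionGroup": "messages",
            "queryScope": "COLLECTION",  # assumption: the plan does not state the scope
            "fields": [
                {"fieldPath": path, "order": order} for path, order in INDEX_FIELDS
            ],
        }
    ],
    "fieldOverrides": [],
}

index_json = json.dumps(index_config, indent=2)
print(index_json)
```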

beastoin avatar Feb 16 '25 04:02 beastoin

product change logs

  1. the feature is ready on Testflight / Internal Test

please keep monitoring and improving the feature closely over the next 3 weeks

congratulations @mdmohsin7 @nquang29 @kodjima33 🚀

beastoin avatar Feb 16 '25 08:02 beastoin

Doesn't work @mdmohsin7

@beastoin poor review

Image

kodjima33 avatar Feb 18 '25 02:02 kodjima33

Doesn't work @mdmohsin7

@beastoin poor review

https://github.com/user-attachments/assets/eb929b8e-d8f0-4e4d-8be7-ff694d9c43b7

I didn't do the backend changes; I'll test once again to see whether the frontend is missing something on its side

mdmohsin7 avatar Feb 18 '25 04:02 mdmohsin7

@nquang29 please push the fixes 🌚

beastoin avatar Feb 23 '25 02:02 beastoin

product change logs

  1. the fixes are ready on testflight/internal test 🚀

@nquang29 pls check it / @mdmohsin7 @kodjima33 fyi ~

beastoin avatar Feb 24 '25 08:02 beastoin

product change logs

  1. reverted ↘️

cause: the bad fix in #1866:

It navigates users to the file chat any time they ask: "What's this?"

@nquang29 pls fix it :)
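
One conservative guard against this kind of regression, as a sketch: route to file chat only when the message itself carries file_ids or a file thread is already open, never on the wording of the question alone. The function name and parameters are hypothetical; how #1866 actually decided the routing is not shown in this thread.

```python
from typing import Optional, Sequence


def should_use_file_chat(
    text: str,
    file_ids: Optional[Sequence[str]] = None,
    open_thread_file_ids: Optional[Sequence[str]] = None,
) -> bool:
    """Decide the route from attachment state; `text` is deliberately ignored."""
    return bool(file_ids) or bool(open_thread_file_ids)


plain_question = should_use_file_chat("What's this?")
with_attachment = should_use_file_chat("What's this?", file_ids=["f1"])
in_open_thread = should_use_file_chat("What's this?", open_thread_file_ids=["f1"])
```

This trades away the "detect whether the question needs file context" idea from the issue description in exchange for predictable behavior.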

beastoin avatar Feb 24 '25 09:02 beastoin