cohere-toolkit

backend: Add better support for file content parsing with Python Interpreter

Open · tianjing-li opened this issue 4 months ago · 1 comment

I tried several ways to get the Python Interpreter to work with files, including sharing Docker volumes between the backend and terrarium services, only to learn from the cohere-terrarium README (https://github.com/cohere-ai/cohere-terrarium?tab=readme-ov-file#sandbox-design) that the Python sandbox does not support filesystem access.

As a workaround, this PR changes the tool instructions so that the model first reads a file with the read-file tool and then passes its content directly to the Python Interpreter.
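The workaround's data flow can be sketched as follows, assuming the read-file tool returns a file's text on the backend and the sandbox only ever receives a code string. `read_file_tool` and `build_interpreter_code` are hypothetical helpers for illustration, not the toolkit's actual API, and `exec()` merely stands in for sending code to the terrarium sandbox.

```python
from pathlib import Path


def read_file_tool(path: str) -> str:
    # Hypothetical stand-in for the read_file tool: the backend, not the sandbox,
    # reads the file from disk.
    return Path(path).read_text(encoding="utf-8")


def build_interpreter_code(file_content: str, analysis: str) -> str:
    # Hypothetical helper: embed the content as a Python string literal
    # (repr() of a str is always a valid literal), so the generated code
    # needs no filesystem access inside the sandbox.
    return f"content = {file_content!r}\n{analysis}"


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "scores.csv"
        path.write_text("name,score\nalice,3\nbob,5\n", encoding="utf-8")
        code = build_interpreter_code(
            read_file_tool(str(path)),
            "rows = content.strip().splitlines()[1:]\n"
            "print(sum(int(r.split(',')[1]) for r in rows))",
        )

    # Locally, exec() stands in for sending `code` to the sandbox.
    exec(code)  # prints 8
```

Inlining the content trades sandbox isolation for payload size: large files make the generated code large, which is one reason a model-facing instruction to read first and pass content explicitly is only a workaround.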

AI Description

This PR introduces several changes to the codebase, primarily focused on file handling and tool configuration.

Summary

The PR reworks the file handling system, rewriting the functions that read different file formats. It also renames the read_document tool to read_file, keeping the naming consistent across the codebase, and updates the Python interpreter tool's description. Additionally, it removes the Python interpreter's references to LangChain, a framework for building applications with language models, and updates the default model for chat requests.

Changes

  • Makefile: Adds a new target, exec-terrarium, which executes a command in the cohere-toolkit-terrarium-1 container as the root user.
  • docker-compose.yml: Removes the volume mount for the src/backend/data directory, which had been used to sync uploaded files.
  • src/backend/chat/custom/tool_calls.py: Renames the TIMEOUT variable to TIMEOUT_SECONDS and sets it to 60. This is the timeout value passed to asyncio.wait_for.
  • src/backend/config/configuration.template.yaml: Renames the tool read_document to read_file.
  • src/backend/config/tools.py: Modifies the tool description under the ToolName class to clarify that the Python interpreter runs without internet access and to add guidelines for file handling.
  • src/backend/schemas/chat.py: Removes the user_id field from the BaseChatRequest class, which was previously used to store the conversation under a specific user.
  • src/backend/schemas/file.py: Adds a user_id field to the ConversationFilePublic class, allowing for user-specific file handling.
  • src/backend/services/file.py: Replaces the read_excel, read_docx, and read_parquet functions with new functions of the same names that have updated argument names and return types.
  • src/backend/tools/files.py: Renames the NAME attribute of the ReadFileTool class from read_document to read_file.
  • src/backend/tools/python_interpreter.py: Removes the LangchainPythonInterpreterToolInput class and the langchain_call and to_langchain_tool methods.
  • src/interfaces/assistants_web/src/cohere-client/generated/schemas.gen.ts: Updates the default model for chat requests to 'command-r-plus' and adds a user_id field to the $ConversationFilePublic type.
  • src/interfaces/assistants_web/src/cohere-client/generated/types.gen.ts: Adds a user_id field to the ConversationFilePublic type.
  • src/interfaces/assistants_web/src/constants/tools.ts: Renames the TOOL_READ_DOCUMENT_ID constant to TOOL_READ_FILE_ID.
  • src/interfaces/coral_web/src/constants.ts: Renames the TOOL_READ_DOCUMENT_ID constant to TOOL_READ_FILE_ID.
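The TIMEOUT_SECONDS change in tool_calls.py can be illustrated with a minimal sketch, assuming each tool call is an awaitable. `call_tool_with_timeout` and its error-dict return value are hypothetical, not the toolkit's actual code; only the constant name, its value of 60, and the use of `asyncio.wait_for` come from the PR description.

```python
import asyncio

# Constant name and value as described in the PR; the unit is now explicit in the name.
TIMEOUT_SECONDS = 60


async def call_tool_with_timeout(tool_call, timeout: float = TIMEOUT_SECONDS):
    """Hypothetical wrapper: await a tool coroutine, giving up after `timeout` seconds."""
    try:
        # wait_for cancels the underlying task if the timeout expires.
        return await asyncio.wait_for(tool_call, timeout=timeout)
    except asyncio.TimeoutError:
        # Assumption: report the timeout as a result instead of raising.
        return {"error": f"Tool call timed out after {timeout} seconds"}


if __name__ == "__main__":
    async def slow_tool():
        await asyncio.sleep(0.2)
        return {"result": "done"}

    print(asyncio.run(call_tool_with_timeout(slow_tool(), timeout=0.05)))
    # {'error': 'Tool call timed out after 0.05 seconds'}
    print(asyncio.run(call_tool_with_timeout(slow_tool(), timeout=1)))
    # {'result': 'done'}
```

A generous 60-second default leaves room for interpreter runs that process inlined file content, while still bounding how long a stuck tool can block a chat turn.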

tianjing-li · Oct 08 '24 17:10