
Large TypeScript files are silently not indexed

breynolds3 opened this issue 1 year ago • 0 comments


Relevant environment info

- OS: Windows 10
- Continue: v0.9.199-vscode
- IDE: VSCode 1.92.2
- Model: Any model besides llama3
- config.json:
  
{
  "models": [
    {
      "title": "Llama 3",
      "provider": "ollama",
      "model": "llama3"
    },
    {
      "title": "Ollama",
      "provider": "ollama",
      "model": "AUTODETECT"
    },
    {
      "title": "GPT4",
      "model": "gpt-4",
      "apiBase": "x",
      "apiKey": "x",
      "systemMessage": "You are an expert software developer. You give helpful and concise responses.",
      "useLegacyCompletionsEndpoint": false,
      "completionOptions": {
        "maxTokens": 4096,
        "temperature": 0.5,
        "topP": 0.8
      },
      "contextLength": 128000,
      "provider": "openai"
    },
    {
      "title": "Code Llama",
      "model": "phind-codellama-34b-v2",
      "apiBase": "x",
      "apiKey": "x",
      "useLegacyCompletionsEndpoint": false,
      "completionOptions": {
        "maxTokens": 4096,
        "temperature": 0.5,
        "topP": 0.8
      },
      "contextLength": 128000,
      "provider": "openai"
    },
    {
      "title": "Llama3-70b",
      "model": "llama3-70b",
      "apiBase": "x",
      "apiKey": "x",
      "useLegacyCompletionsEndpoint": false,
      "contextLength": 128000,
      "provider": "openai"
    },
    {
      "title": "Llama3-8b",
      "model": "llama3-8b",
      "apiBase": "x",
      "apiKey": "x",
      "useLegacyCompletionsEndpoint": false,
      "completionOptions": {
        "maxTokens": 2048,
        "temperature": 0.5,
        "topP": 0.8,
        "stop": [
          "<|start_header_id|>",
          "<|end_header_id|>",
          "<|eot_id|>"
        ]
      },
      "contextLength": 128000,
      "provider": "openai"
    }
  ],
  "customCommands": [
    {
      "name": "test",
      "prompt": "{{{ input }}}\n\nWrite a comprehensive set of unit tests for the selected code. It should setup, run tests that check for correctness including important edge cases, and teardown. Ensure that the tests are complete and sophisticated. Give the tests just as chat output, don't edit any file.",
      "description": "Write unit tests for highlighted code"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Starcoder 3b",
    "provider": "ollama",
    "model": "starcoder2:3b"
  },
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "BAAI/bge-small-en-v1.5:latest",
    "apiBase": "http://localhost:11434"
  },
  "allowAnonymousTelemetry": false,
  "docs": []
}

Description

Larger TypeScript files produce a chunk that exceeds the max chunk size. CodeChunker passes these oversized chunks along, they are rejected, and the file never makes it into the index. getSmartCollapsedChunks / tree-sitter doesn't seem to split these files properly. One workaround is to fall back to basicChunker whenever codeChunker produces an oversized chunk. This change at least allowed me to retrieve context from the larger files and greatly improved the context provided. https://github.com/breynolds3/continue/commit/230dbd84967d8f42ee43eaa9d6cd1989cf0d0a64
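The fallback idea can be sketched roughly as follows. This is a hedged illustration, not Continue's actual code (the real fix is TypeScript, in the linked commit): `basic_chunks` stands in for basicChunker and `smart_chunker` for the tree-sitter based codeChunker; names and signatures here are hypothetical.

```python
# Hypothetical sketch of "fall back to a basic chunker if the smart
# chunker emits an oversized chunk" — not Continue's actual API.

def basic_chunks(text: str, max_size: int) -> list[str]:
    """Split on line boundaries, keeping every chunk within max_size
    (assumes no single line exceeds the limit)."""
    chunks: list[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        if current and len(current) + len(line) > max_size:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks


def chunk_with_fallback(smart_chunker, text: str, max_size: int) -> list[str]:
    """Prefer the smart (AST-based) chunker, but if any chunk it emits
    is oversized — which the indexer would reject, silently dropping the
    whole file — fall back to the line-based splitter so the file still
    gets indexed."""
    smart = smart_chunker(text)
    if any(len(c) > max_size for c in smart):
        return basic_chunks(text, max_size)
    return smart
```

A degenerate smart chunker that returns the whole file as a single chunk would trigger the fallback here, which mirrors what happens with very large TypeScript files.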

To reproduce

  1. Copy the following file into a folder https://github.com/microsoft/vscode/blob/main/src/vs/workbench/browser/workbench.contribution.ts
  2. Open the folder in VSCode
  3. Index the project
  4. Close VSCode
  5. Open ~/.continue/index.sqlite with sqlite3
  6. Run select distinct path from chunks;
  7. Observe the file is not present in the table.
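Steps 5–7 can also be scripted. A minimal sketch, assuming only the schema observed in the repro (a `chunks` table with a `path` column); the demo below runs against a throwaway database so it is self-contained — point `db` at ~/.continue/index.sqlite (with VSCode closed) to inspect the real index.

```python
import os
import sqlite3
import tempfile

def indexed_paths(db_path: str) -> set[str]:
    """Distinct file paths recorded in the chunk index."""
    conn = sqlite3.connect(db_path)
    try:
        return {row[0] for row in conn.execute("select distinct path from chunks")}
    finally:
        conn.close()

# Demo against a throwaway db mirroring the assumed schema.
fd, db = tempfile.mkstemp(suffix=".sqlite")
os.close(fd)
conn = sqlite3.connect(db)
conn.execute("create table chunks(path text)")
conn.executemany("insert into chunks values (?)",
                 [("src/a.ts",), ("src/a.ts",), ("src/b.ts",)])
conn.commit()
conn.close()
paths = indexed_paths(db)  # a silently skipped file simply won't appear here
os.unlink(db)
```

Against the real index, a large file such as workbench.contribution.ts not appearing in the returned set is the symptom described above.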

Log output

No response

breynolds3 avatar Aug 25 '24 20:08 breynolds3