Large TypeScript files are silently not indexed
Before submitting your bug report
- [X] I believe this is a bug. I'll try to join the Continue Discord for questions
- [X] I'm not able to find an open issue that reports the same bug
- [X] I've seen the troubleshooting guide on the Continue Docs
Relevant environment info
- OS: Windows 10
- Continue: v0.9.199-vscode
- IDE: VSCode 1.92.2
- Model: Any besides llama3
- config.json:
  
```json
{
  "models": [
    {
      "title": "Llama 3",
      "provider": "ollama",
      "model": "llama3"
    },
    {
      "title": "Ollama",
      "provider": "ollama",
      "model": "AUTODETECT"
    },
    {
      "title": "GPT4",
      "model": "gpt-4",
      "apiBase": "x",
      "apiKey": "x",
      "systemMessage": "You are an expert software developer. You give helpful and concise responses.",
      "useLegacyCompletionsEndpoint": false,
      "completionOptions": {
        "maxTokens": 4096,
        "temperature": 0.5,
        "topP": 0.8
      },
      "contextLength": 128000,
      "provider": "openai"
    },
    {
      "title": "Code Llama",
      "model": "phind-codellama-34b-v2",
      "apiBase": "x",
      "apiKey": "x",
      "useLegacyCompletionsEndpoint": false,
      "completionOptions": {
        "maxTokens": 4096,
        "temperature": 0.5,
        "topP": 0.8
      },
      "contextLength": 128000,
      "provider": "openai"
    },
    {
      "title": "Llama3-70b",
      "model": "llama3-70b",
      "apiBase": "x",
      "apiKey": "x",
      "useLegacyCompletionsEndpoint": false,
      "contextLength": 128000,
      "provider": "openai"
    },
    {
      "title": "Llama3-8b",
      "model": "llama3-8b",
      "apiBase": "x",
      "apiKey": "x",
      "useLegacyCompletionsEndpoint": false,
      "completionOptions": {
        "maxTokens": 2048,
        "temperature": 0.5,
        "topP": 0.8,
        "stop": [
          "<|start_header_id|>",
          "<|end_header_id|>",
          "<|eot_id|>"
        ]
      },
      "contextLength": 128000,
      "provider": "openai"
    }
  ],
  "customCommands": [
    {
      "name": "test",
      "prompt": "{{{ input }}}\n\nWrite a comprehensive set of unit tests for the selected code. It should setup, run tests that check for correctness including important edge cases, and teardown. Ensure that the tests are complete and sophisticated. Give the tests just as chat output, don't edit any file.",
      "description": "Write unit tests for highlighted code"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Starcoder 3b",
    "provider": "ollama",
    "model": "starcoder2:3b"
  },
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "BAAI/bge-small-en-v1.5:latest",
    "apiBase": "http://localhost:11434"
  },
  "allowAnonymousTelemetry": false,
  "docs": []
}
```
Description
Larger TypeScript files can produce a chunk that exceeds the maximum chunk size. `CodeChunker` passes these oversized chunks along and they are rejected downstream, so the file ends up with no indexed chunks at all. `getSmartCollapsedChunks` / tree-sitter doesn't seem to split these files properly. One workaround is to fall back to `basicChunker` whenever `codeChunker` produces an oversized chunk. This change at least allowed me to retrieve context from the larger files and greatly improved the context provided. https://github.com/breynolds3/continue/commit/230dbd84967d8f42ee43eaa9d6cd1989cf0d0a64
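A minimal sketch of the fallback idea described above. The names `codeChunker`/`basicChunker` follow the issue's description; the real Continue implementations (token counting, chunk shape, streaming) differ, so this only illustrates the control flow: if the smart chunker yields any chunk over the size limit, discard its output and re-chunk the file with a simple line-based splitter.

```typescript
interface Chunk {
  content: string;
  startLine: number;
  endLine: number;
}

// Basic fallback: greedily pack whole lines into chunks no larger than
// maxChars (a stand-in for Continue's token-based size limit).
function basicChunker(lines: string[], maxChars: number): Chunk[] {
  const chunks: Chunk[] = [];
  let start = 0;
  let buf: string[] = [];
  for (let i = 0; i < lines.length; i++) {
    const candidate = [...buf, lines[i]].join("\n");
    if (buf.length > 0 && candidate.length > maxChars) {
      // Current buffer is full; emit it and start a new chunk.
      chunks.push({ content: buf.join("\n"), startLine: start, endLine: i - 1 });
      buf = [lines[i]];
      start = i;
    } else {
      buf.push(lines[i]);
    }
  }
  if (buf.length > 0) {
    chunks.push({ content: buf.join("\n"), startLine: start, endLine: lines.length - 1 });
  }
  return chunks;
}

// Workaround: instead of letting oversized smart chunks be silently
// rejected downstream (leaving the file unindexed), re-chunk the whole
// file with the basic splitter.
function chunkWithFallback(
  smartChunks: Chunk[],
  fileLines: string[],
  maxChars: number,
): Chunk[] {
  const oversized = smartChunks.some((c) => c.content.length > maxChars);
  return oversized ? basicChunker(fileLines, maxChars) : smartChunks;
}
```

The basic splitter loses the syntax-aware boundaries that tree-sitter provides, but some context from a large file beats none at all, which matches the improvement reported above.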
To reproduce
- Copy the following file into a folder https://github.com/microsoft/vscode/blob/main/src/vs/workbench/browser/workbench.contribution.ts
- Open the folder in VSCode
- Index the project
- Close VSCode
- Open `~/.continue/index.sqlite` with `sqlite3`
- Run `select distinct path from chunks;`
- Observe the file is not present in the table.
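The last three steps can be collapsed into one command. This is a hypothetical one-liner assuming the default index location used in the steps above (override `DB` if your index lives elsewhere):

```shell
# Query the chunk index directly; no matching lines means no chunks from
# the file made it into the index.
DB="${DB:-$HOME/.continue/index.sqlite}"
sqlite3 "$DB" "select distinct path from chunks;" | grep workbench.contribution.ts
```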
Log output
No response