
Error thrown when using @codebase in the prompt: Cannot read properties of undefined (reading 'sort')

CallMeLaNN opened this issue 1 year ago • 7 comments


Relevant environment info

- OS: macOS 12.6
- Continue: v0.8.43
- IDE: VSCode 1.91.1 (Universal)
- Model: Any
- config.json:
  
  {
    "models": [...],
    ...,
    "embeddingsProvider": {
      ... // Any provider I tried, transformer.js, ollama and openai (voyage)
    },
    "reranker": {
      "name": "voyage",
      "params": {
        "model": "rerank-1",
        "contextLength": 8000,
        "apiKey": "abcd"
      }
    },
    "contextProviders": [
      ...,
      {
        "name": "codebase",
        "params": {
          "nRetrieve": 25,
          "nFinal": 5,
          "useReranking": true
        }
      }
    ]
  }

Description

I'm evaluating different embeddings models and configs. I didn't have any embeddings issues before, but since yesterday I've been getting this error. I've tried changing settings, switching providers, and going back to the old one, but it no longer works.

I haven't included the embeddings config above because I tried every provider and none of them work. It was working before. Let me know if you still need an example of mine.

To reproduce

  1. Update the embeddings config.
  2. In the Continue chat, click the green dot beside the model dropdown to force a reindex.
  3. Clear the Continue log in the VSCode OUTPUT tab.
  4. Start a Continue chat with "@codebase ".
  5. A VSCode notification and the DevTools console show the error above.
  6. Observe the Continue log: the relevant context is not added for the LLM.
  7. The LLM gives a general answer, as it doesn't know about the code.
  8. Restart VSCode and repeat, trying different embeddings models, providers, and configs. Same result.

Log output

==========================================================================
==========================================================================
Settings:
contextLength: 4096
model: meta-llama/llama-3.1-8b-instruct:free
maxTokens: 1024
log: undefined

############################################

<user>
where is the relevant code and files that set the theming for this website?

==========================================================================
==========================================================================
Completion:

Unfortunately, I don't have the capability to browse the internet or access any specific website's code or files. However, I can give you some general information about where to look for theming-related code on a website.

Most modern websites use a combination of HTML, CSS, and JavaScript to display their content and apply their theme. Here are some places you might look to find the relevant code and files:

...

Keep in mind that the location and naming conventions may vary depending on the website's architecture and technology stack. To find the relevant code and files, you can use a combination of search engines, developer tools, and DNS reconnaissance techniques.

EDIT: Before this happened, I know my embeddings were working fine; I saw the log produce relevant context from my codebase.

CallMeLaNN avatar Jul 27 '24 07:07 CallMeLaNN

[screenshot of the error notification]

This is the notification.

I tried removing ~/.continue/index in case it got messed up, but even with a fresh index cache the codebase retrieval error still appears.

I tried a few projects (web and Python) in case it was project-related; still not working. However, it is fine in a new project with one file.


So maybe there's a project-related cache. I'm not aware of anywhere other than ~/.continue/index/lancedb/{project path}..., but I already cleared that for a fresh index.

Hopefully someone can tell me where else I can clear the cache.

CallMeLaNN avatar Jul 27 '24 12:07 CallMeLaNN

What happens if you remove the re-ranker definition from your config.json file?

fry69 avatar Jul 27 '24 12:07 fry69

> What happens if you remove the re-ranker definition from your config.json file?

You are right. I removed it and it's working fine in my project.

Is there any way to get a log or investigate further? So far the embeddings results contain some irrelevant info, and some LLMs can't answer correctly.

CallMeLaNN avatar Jul 27 '24 12:07 CallMeLaNN

See here for logs etc -> https://docs.continue.dev/troubleshooting#llm-prompt-logs

fry69 avatar Jul 27 '24 13:07 fry69

That log doesn't include anything reranker-related. Continue only logs I/O for the LLM; it doesn't log I/O for the embeddings and reranker models before the request is sent to the LLM, as in my log above.

I mean I can't use the reranker for my projects, and I'm not sure how to investigate further without any log.

... but wait, I get it now: the reranker returns an unexpected result. I waited a while in case I had hit the rate limit.

It turns out I hit the TPM rate limit: roughly the embeddings chunk size times the codebase nRetrieve param. I can avoid the error by lowering nRetrieve; see the config sketch just below. I'm not sure if I can adjust the chunk size.
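For reference, something along these lines in the codebase provider params avoided it for me (the value is illustrative; the right number depends on your chunk sizes and the reranker's batch limit):

  {
    "name": "codebase",
    "params": {
      "nRetrieve": 10, // illustrative: lowered from 25 to stay under the per-batch token limit
      "nFinal": 5,
      "useReranking": true
    }
  }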

So maybe an improvement could be made to catch and log unexpected responses (something like the sketch below). I never expected it was due to an invalid response or rate limiting.
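A hypothetical sketch of what I mean (not Continue's actual code; rerank stands in for whatever the reranker client exposes):

    // Hypothetical sketch: guard the reranker call so an error body or
    // rate-limit response gets logged instead of crashing later with
    // "Cannot read properties of undefined (reading 'sort')".
    type RerankFn = (query: string, documents: string[]) => Promise<number[]>;

    async function rerankSafely(
      rerank: RerankFn,
      query: string,
      documents: string[],
    ): Promise<number[] | undefined> {
      try {
        const scores = await rerank(query, documents);
        if (!Array.isArray(scores)) {
          console.warn("Reranker returned an unexpected response:", scores);
          return undefined; // caller can fall back to the embeddings ordering
        }
        return scores;
      } catch (err) {
        console.warn("Reranker request failed:", err);
        return undefined;
      }
    }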

CallMeLaNN avatar Jul 27 '24 16:07 CallMeLaNN

I'm running into the same issue. It's because core/context/retrieval/pipelines/RerankerRetrievalPipeline.ts is sending too much data to the reranker in one batch:

"Request to model 'rerank-1' failed. The max allowed tokens per submitted batch is 100000. Your batch has 145736 tokens after truncation. Please lower the number of tokens in the batch."

This is with Voyage's rerank-1 model, but not with rerank-lite-1, since it has a higher token limit:

"What is the total number of tokens for the rerankers? We define the total number of tokens as the “(number of query tokens × the number of documents) + sum of the number of tokens in all documents". This cannot exceed 300K for rerank-lite-1 and 100K for rerank-1. However, if you are latency-sensitive, we recommend no more than 200K total tokens per request for rerank-lite-1."

RerankerRetrievalPipeline is sending quite a bit of data:

    retrievalResults.push(
      ...recentlyEditedFilesChunks,
      ...ftsChunks,
      ...embeddingsChunks,
      ...repoMapChunks,
    );

So it's not even the embeddings themselves that are filling up the batch. I guess we need some way to limit the search results we send to the reranker; one possible sketch follows.
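As a rough sketch under my own assumptions (the 4-chars-per-token heuristic and the budget value are guesses, not Continue internals):

    // Hypothetical sketch: cap the combined retrieval results by an
    // estimated token budget before handing them to the reranker.
    interface Chunk {
      content: string;
    }

    const APPROX_CHARS_PER_TOKEN = 4; // crude heuristic, not a real tokenizer

    function limitByTokenBudget(chunks: Chunk[], maxTokens: number): Chunk[] {
      const kept: Chunk[] = [];
      let used = 0;
      for (const chunk of chunks) {
        const estimate = Math.ceil(chunk.content.length / APPROX_CHARS_PER_TOKEN);
        if (used + estimate > maxTokens) {
          break; // stop before the batch would exceed the model's limit
        }
        kept.push(chunk);
        used += estimate;
      }
      return kept;
    }

    // e.g. with rerank-1's 100K cap, leave headroom for the query tokens:
    // const toRerank = limitByTokenBudget(retrievalResults, 90_000);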

FallDownTheSystem avatar Sep 21 '24 17:09 FallDownTheSystem

I am getting the same issue. Do I understand correctly that the workaround is to remove the reranker from config.json? At the moment I have:

  "reranker": {
    "name": "voyage",
    "params": {
      "apiKey": "pa-xxx"
    }
  },

pmatos avatar Sep 27 '24 09:09 pmatos

Hi all, thanks for all the details here. Sharing another example of a user running into this issue: https://discord.com/channels/1108621136150929458/1108621136830398496/1296138969288937522

Will circle back to this thread when we address the issue.

Patrick-Erichsen avatar Oct 17 '24 17:10 Patrick-Erichsen

This issue hasn't been updated in 90 days and will be closed after an additional 10 days without activity. If it's still important, please leave a comment and share any new information that would help us address the issue.

github-actions[bot] avatar Mar 03 '25 04:03 github-actions[bot]

SAME ISSUE...

Legend-parth avatar Mar 08 '25 11:03 Legend-parth

I also have this issue, but with a slightly different message: [screenshot]

wilstdu avatar Jul 09 '25 05:07 wilstdu

This issue hasn't been updated in 90 days and will be closed after an additional 10 days without activity. If it's still important, please leave a comment and share any new information that would help us address the issue.

github-actions[bot] avatar Oct 08 '25 02:10 github-actions[bot]

This issue was closed because it wasn't updated for 10 days after being marked stale. If it's still important, please reopen + comment and we'll gladly take another look!

github-actions[bot] avatar Oct 19 '25 02:10 github-actions[bot]