
Using Elysia with pre-chunked Weaviate collections

Open • omugwane opened this issue 5 months ago • 1 comment

  • Context: My Weaviate collection contains pre-chunked documents (each chunk is stored as a separate object with its own vector). Elysia's docs focus on "chunk-on-demand": storing full documents and chunking them dynamically at query time.

  • Questions:

    1. How does Elysia handle pre-chunked collections? Does it use existing chunks or ignore them?
    2. Can I disable chunk-on-demand to avoid latency and use my pre-chunked data directly?
    3. What's the recommended configuration for pre-chunked collections?
  • Current behavior: Elysia's chunk-on-demand adds latency on the first query because chunks are created dynamically.

  • Desired outcome: Use existing pre-chunked collection directly to eliminate post-chunking latency.

  • Environment: Weaviate + Ollama (gpt-oss-20b) + Elysia

omugwane • Oct 08 '25 15:10

This should work:

  1. How does Elysia handle pre-chunked collections?

    It doesn't know whether a collection is chunked, as there's no way to tell for arbitrary data. The "auto chunker" should only kick in if it detects that the average token length of the text in your collection is too long for effective retrieval. This threshold is currently hard-coded at 400 tokens.

  2. To disable chunk-on-demand

    Hmm, this should probably go in the collection config so you can adjust it to your liking (or turn it off entirely). In the meantime, there are two ways to remove this functionality:

    1. Remove the document display type from the analysis of your data: go to data -> your collection -> metadata -> edit display mappings and remove the document type. In the current version, only documents are chunked. The side effect is that your data will no longer be displayed as documents in the app; it will use a table or whichever other display types are possible instead.
    2. Change the threshold in the code of your Elysia installation, here: https://github.com/weaviate/elysia/blob/25e08ea31d7d222d7db3ede40ecf193bcdb89a33/elysia/tools/retrieval/query.py#L144 You only need to change the 400 to a much larger value, so there isn't much effort involved. If you installed Elysia via pip install elysia-ai, this file will be inside your virtual environment (or wherever your Python packages are installed). If you installed from GitHub or source, it will be under your elysia directory.
  3. Recommended configuration for pre-chunked collections

Mostly see above: for now, you'll have to configure your collection manually so that chunk-on-demand isn't used. However, this raises an important issue: whether a collection is already chunked should be evaluated during the analysis of the data. So I'm going to add this as an open feature: a user-toggleable flag to enable or disable chunk-on-demand for specific collections.
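The heuristic from answer 1 can be sketched roughly as follows, for example to check whether the objects in a pre-chunked collection stay under the 400-token threshold. This is a hypothetical illustration, not Elysia's actual code: the function names, the `chunk_on_demand` override (modelling the proposed per-collection flag), and the "about four characters per token" estimate are all assumptions, and Elysia's real tokenizer and logic in query.py may differ.

```python
from statistics import mean
from typing import Iterable, Optional

# Threshold described above (currently hard-coded in Elysia).
CHUNK_THRESHOLD_TOKENS = 400


def estimate_tokens(text: str) -> int:
    """Rough token count. Assumption: ~4 characters per token.

    Elysia's real tokenizer may count differently.
    """
    return max(1, len(text) // 4) if text else 0


def should_chunk_on_demand(
    texts: Iterable[str],
    threshold: int = CHUNK_THRESHOLD_TOKENS,
    chunk_on_demand: Optional[bool] = None,
) -> bool:
    """Hypothetical decision: chunk only if the average token length exceeds the threshold.

    `chunk_on_demand` models the proposed per-collection flag (hypothetical):
    True/False forces the decision, None falls back to the length heuristic.
    """
    if chunk_on_demand is not None:
        return chunk_on_demand
    lengths = [estimate_tokens(t) for t in texts]
    if not lengths:
        return False  # nothing to measure, so nothing to chunk
    return mean(lengths) > threshold


if __name__ == "__main__":
    # Short, pre-chunked objects average well below 400 estimated tokens.
    chunks = ["word " * 200, "word " * 150]
    print(should_chunk_on_demand(chunks))  # False
    # A single long document (~1000 estimated tokens) would trigger chunking.
    print(should_chunk_on_demand(["x" * 4000]))  # True
    # The hypothetical flag overrides the heuristic.
    print(should_chunk_on_demand(["x" * 4000], chunk_on_demand=False))  # False
```

Raising `threshold` has the same effect as option 2 above (editing the 400 in the installed code), while the override shows how an explicit per-collection flag would sidestep the heuristic entirely.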
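To find the installed copy of query.py mentioned in option 2 without guessing your virtual-environment path, you can ask Python where the module lives. This sketch uses the standard-library `importlib.util.find_spec`; the helper name `locate_module_file` is hypothetical, and the module path `elysia.tools.retrieval.query` is inferred from the GitHub link above, so it may differ between Elysia versions.

```python
import importlib.util
from typing import Optional


def locate_module_file(module_name: str) -> Optional[str]:
    """Return the file path of an installed module, or None if it can't be found.

    Note: for dotted names, find_spec imports the parent packages first.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # A parent package (e.g. "elysia") is not installed.
        return None
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Assumption: module path inferred from elysia/tools/retrieval/query.py.
    path = locate_module_file("elysia.tools.retrieval.query")
    print(path or "elysia.tools.retrieval.query not found; is elysia-ai installed?")
```

Run it with the same Python interpreter (or activated virtual environment) that Elysia uses; otherwise it will look in a different package location.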

Let me know if this helps!

dannyjameswilliams • Oct 09 '25 13:10