Dominik Weckmüller

Results 43 issues of Dominik Weckmüller

### Problem I've been working on legal documents lately and indexing 300k documents. Everything is going perfectly fine with normal-sized docs (dozens of pages). However, when documents become very large...

I am using the Python bindings and noticed that text-splitter is running on one core only. I think it would be great to allow for an option to use all...

enhancement
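Until such an option exists, one possible workaround is to parallelize at the document level from the calling code. The sketch below is only an illustration under stated assumptions: `split_fixed` is a hypothetical stand-in chunker (fixed-size character windows), not the text-splitter API, and `chunk_all` and its parameters are made-up names. It relies only on the standard library's `concurrent.futures`.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional


def split_fixed(text: str, size: int = 1000) -> List[str]:
    """Hypothetical stand-in chunker: cut text into fixed-size character windows."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunk_all(
    docs: Iterable[str],
    chunker: Callable[[str], List[str]] = split_fixed,
    workers: Optional[int] = None,
    executor_cls=ProcessPoolExecutor,
) -> List[List[str]]:
    """Chunk each document in a separate worker and return results in input order.

    With ProcessPoolExecutor, `chunker` must be a picklable top-level function.
    workers=None lets the executor pick its own default worker count.
    """
    with executor_cls(max_workers=workers) as pool:
        # map() preserves the order of `docs` in its results.
        return list(pool.map(chunker, docs))
```

This only helps when there are many documents; a single very large document is still processed by one worker. When calling it with the default process pool from a script, the call should sit under an `if __name__ == "__main__":` guard. Passing `executor_cls=ThreadPoolExecutor` is an alternative that may be sufficient if the underlying chunker releases the GIL, which is an assumption to verify for any given library.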

Hey Ben, I would love to use a wasm version of text-splitter in the web application https://github.com/do-me/SemanticFinder. Currently it only supports chars, words, sentences, regex and tokens but all of...

enhancement

Hey all, are there any plans yet to make use of fastembed in the Web UI? It would be so nice to have an out-of-the-box UI for querying your...

The Web-UI (`http://localhost:6333/dashboard#/collections//visualize`) is very useful for quick exploration of a collection. However, the UX might need some minor improvements. I guess that the most common use case is free...

### Model description [jinaai/jina-clip-v1](https://huggingface.co/jinaai/jina-clip-v1/tree/main/onnx) ### Prerequisites - [X] The model is supported in Transformers (i.e., listed [here](https://huggingface.co/docs/transformers/index#supported-models-and-frameworks)) - [X] The model can be exported to ONNX with Optimum (i.e., listed...

new model

As the title states, the aggregation function is entirely ignored and it does not make any difference whether you insert np.mean, np.min or np.max. Using `plotly==5.22.0` Example: 1. Generate data...

bug
sev-4

As the title states, see the web page here: https://do-me.github.io/kceo_glossary/term%201/ Did I mess up some settings maybe? https://github.com/do-me/kceo_glossary/blob/main/mkdocs.yml Screenshots: Logs seem fine: ``` INFO - git-committers plugin ENABLED WARNING -...

Some embedding models require prefixes for retrieval like: - `search_query: ` for https://huggingface.co/nomic-ai/nomic-embed-text-v1.5 Else, retrieval performance degrades. Should be easy to add just before inferencing. Doesn't need heavy modifications in...
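As a minimal sketch of what "adding the prefix just before inferencing" could look like, the helper below prepends a prefix to each text right before it is handed to the embedding model. The name `with_prefix` is hypothetical and not part of any library; only the `search_query: ` prefix comes from the issue text above.

```python
from typing import List

# Prefix quoted in the issue for nomic-ai/nomic-embed-text-v1.5 queries.
QUERY_PREFIX = "search_query: "


def with_prefix(texts: List[str], prefix: str) -> List[str]:
    """Prepend `prefix` to every text, skipping texts that already start with it."""
    return [t if t.startswith(prefix) else prefix + t for t in texts]


queries = with_prefix(["termination of a lease"], QUERY_PREFIX)
# `queries` would then be passed to the embedding model in place of the raw text.
```

Skipping texts that already carry the prefix keeps the helper safe to call twice on the same input.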

I am trying to chunk a huge document but it runs forever. Did I miss something in my code? [File here](https://drive.google.com/file/d/1Xnp5jJhjIWNA6R5u9w96L9WO_Hb61Jmh/view?usp=sharing) ```python import semchunk import pandas as pd df =...