Dominik Weckmüller issues

Results: 43 issues of Dominik Weckmüller

Heuristics for very large documents

### Problem I've been working on legal documents lately and indexing 300k documents. Everything is going perfectly fine with normal-sized docs (dozens of pages). However, when documents become very large...

Performance: use all available CPU cores

I am using the Python bindings and noticed that text-splitter is running on one core only. I think it would be great to allow for an option to use all...

enhancement

Release wasm version

Hey Ben, I would love to use a wasm version of text-splitter in the web application https://github.com/do-me/SemanticFinder. Currently it only supports chars, words, sentences, regex and tokens but all of...

enhancement

Fastembed integration

Hey all, are there any plans yet to make use of fastembed in the Web UI? It would be so nice to have an out-of-the-box UI for querying your...

[Web-UI] UX visualization improvements

The Web-UI (`http://localhost:6333/dashboard#/collections//visualize`) is very useful for quick exploration of a collection. However, the UX might need some minor improvements. I guess that the most common use case is free...

jinaai/jina-clip-v1: support for model names with prefixes

### Model description [jinaai/jina-clip-v1](https://huggingface.co/jinaai/jina-clip-v1/tree/main/onnx) ### Prerequisites - [X] The model is supported in Transformers (i.e., listed [here](https://huggingface.co/docs/transformers/index#supported-models-and-frameworks)) - [X] The model can be exported to ONNX with Optimum (i.e., listed...

new model

figure_factory create_hexbin_mapbox ignored agg_func

As the title states, the aggregation function is entirely ignored and it does not make any difference whether you insert np.mean, np.min or np.max. Using `plotly==5.22.0` Example: 1. Generate data...

bug

sev-4

Authors just displayed on one page

As the title states, see the web page here: https://do-me.github.io/kceo_glossary/term%201/ Did I mess up some settings maybe? https://github.com/do-me/kceo_glossary/blob/main/mkdocs.yml Screenshots: Logs seem fine: ``` INFO - git-committers plugin ENABLED WARNING -...

Allow option for search prefixes

Some embedding models require prefixes for retrieval, such as `search_query: ` for https://huggingface.co/nomic-ai/nomic-embed-text-v1.5 (otherwise, retrieval performance degrades). This should be easy to add just before inference. It doesn't need heavy modifications in...
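
To illustrate the request, here is a minimal sketch of prepending a model-specific prefix to each query just before it is embedded. The `QUERY_PREFIXES` table and the `add_search_prefix` helper are hypothetical names made up for this example and are not part of any library's API; only the `search_query: ` prefix for nomic-embed-text-v1.5 comes from the issue text.

```python
from typing import Dict, Iterable, List

# Hypothetical lookup table: model name -> prefix expected for search queries.
# Only the nomic entry is taken from the issue; extend it as needed.
QUERY_PREFIXES: Dict[str, str] = {
    "nomic-ai/nomic-embed-text-v1.5": "search_query: ",
}


def add_search_prefix(
    texts: Iterable[str],
    model_name: str,
    prefixes: Dict[str, str] = QUERY_PREFIXES,
) -> List[str]:
    """Return the texts with the model's query prefix prepended.

    Models without a registered prefix are passed through unchanged,
    and texts that already carry the prefix are not prefixed twice.
    """
    prefix = prefixes.get(model_name, "")
    return [t if t.startswith(prefix) else prefix + t for t in texts]


queries = add_search_prefix(
    ["how do I chunk large documents?"],
    "nomic-ai/nomic-embed-text-v1.5",
)
print(queries)
```

Keeping the prefix in a lookup table means the embedding call itself stays untouched: the strings are rewritten once, right before inference.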

Problem with very large files

I am trying to chunk a huge document but it runs forever. Did I miss something in my code? [File here](https://drive.google.com/file/d/1Xnp5jJhjIWNA6R5u9w96L9WO_Hb61Jmh/view?usp=sharing) ```python import semchunk import pandas as pd df =...