Wolf Garbe

61 comments by Wolf Garbe

@ricdtech I can't replicate the issue. I get 0% CPU usage with Docker Desktop on Windows 11. `docker stats`:

```
CONTAINER ID   NAME   CPU %   MEM USAGE / LIMIT   MEM...
```

You are right, there is much room for improvement. But unfortunately, the day has only 24 hours. We are currently working on the documentation, but this time on the REST...

> I didn't know how to use advanced queries, such as returning all matching queries. When I used Length=int.Max in the query, the server crashed, and 0 resulted in no...

English, German, and Russian are supported. Japanese, Korean, and Chinese are currently only supported if both documents and queries are pre-tokenized by a tokenizer like https://github.com/messense/jieba-rs in a pre-processing step....
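The pre-processing step described above can be sketched with a toy greedy longest-match segmenter. This is only an illustration of the idea: the dictionary, sample text, and `segment` function are hypothetical stand-ins for a real tokenizer such as jieba-rs.

```python
# Toy forward-maximum-matching word segmenter. A real pipeline would
# use a proper tokenizer (e.g. jieba-rs) instead of this dictionary.
DICT = {"我们", "喜欢", "全文", "搜索", "引擎"}

def segment(text: str, max_len: int = 4) -> list[str]:
    """Greedy longest-match segmentation against DICT.

    At each position, try the longest candidate first; fall back to a
    single character when nothing in the dictionary matches.
    """
    tokens, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i : i + length]
            if length == 1 or cand in DICT:
                tokens.append(cand)
                i += length
                break
    return tokens

# Pre-tokenize: join segments with spaces so documents and queries
# reach the index already word-separated.
pretokenized = " ".join(segment("我们喜欢全文搜索引擎"))
```

The same `segment` call would be applied to both documents and queries, so that both sides of the match see identical token boundaries.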

[SeekStorm v0.11.0](https://github.com/SeekStorm/SeekStorm/releases/tag/v0.10.0) has been released. The new tokenizer UnicodeAlphanumericZH implements Chinese word segmentation. ![image](https://github.com/user-attachments/assets/c697b4c4-e27b-472f-b30a-746a87dfed2a)

@inboxsphere What would be your use case? Prefix/substring search? Or something else? For word segmentation a specialized word segmenting algorithm is more efficient than n-gram tokenizing. I'm afraid the index...
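To illustrate the efficiency point: a character n-gram tokenizer emits a token at nearly every character position, while a word segmenter emits one token per word, so an n-gram index carries considerably more postings per document. A minimal sketch (function name and sample text are illustrative, not SeekStorm's implementation):

```python
def char_ngrams(text: str, n: int = 2) -> list[str]:
    """All overlapping character n-grams of a string."""
    return [text[i : i + n] for i in range(len(text) - n + 1)]

# A 6-character string yields 5 overlapping bigrams; a dictionary
# segmenter would emit roughly 3 words for the same text.
bigrams = char_ngrams("全文搜索引擎")
```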

@inboxsphere I see. The Chinese tokenizer (UnicodeAlphanumericZH) already handles mixed Chinese/Latin text. We could extend this so that when unknown (not in the Chinese dictionary) and non-Latin words (different Unicode...

I'm planning to do something similar with an S3 object storage compatible index (cloud-native split of storage and compute) https://github.com/SeekStorm/SeekStorm?tab=readme-ov-file#roadmap SeekStorm has both an index (inverted index that stores posting...
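The storage/compute split hinges on compute nodes fetching only the byte ranges they need from object storage instead of loading the whole index. A toy simulation of a ranged object GET against a local buffer (the function, offsets, and payload are illustrative assumptions, not SeekStorm's actual block layout):

```python
import io

def ranged_get(obj: io.BytesIO, start: int, end: int) -> bytes:
    """Simulate an S3-style ranged GET (bytes=start-end, inclusive)
    against a local buffer standing in for a remote object."""
    obj.seek(start)
    return obj.read(end - start + 1)

# Stand-in for one stored posting block; a compute node fetches only
# the range it needs rather than the whole object.
store = io.BytesIO(b"posting-list-bytes-for-one-term-block")
chunk = ranged_get(store, 0, 6)
```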

> Given SeekStorm's architecture of writing 50MB blocks and the constraint of 100KB max value size in FoundationDB, do you think SeekStorm could be adapted to split its index into...

To add support for distributed key-value stores as a backend for SeekStorm, we need to solve two tasks: 1. **Write/read** the 50MB data blocks (per index level, per index segment)...
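Task 1 under FoundationDB's 100KB value limit can be sketched as splitting each large block into ordered sub-key/value pairs and reassembling it with a range read. The key layout, names, and chunk size handling here are assumptions for illustration, not SeekStorm's actual format:

```python
CHUNK = 100_000  # FoundationDB caps values at 100KB

def block_to_kv(index_key: str, block: bytes) -> list[tuple[str, bytes]]:
    """Split one large block into <=100KB chunks under zero-padded
    sub-keys, so a lexicographic range scan returns them in order."""
    return [
        (f"{index_key}/{i:06d}", block[off : off + CHUNK])
        for i, off in enumerate(range(0, len(block), CHUNK))
    ]

def kv_to_block(pairs: list[tuple[str, bytes]]) -> bytes:
    """Reassemble a block from its sub-key/value pairs (a KV store
    would return these from a single range read)."""
    return b"".join(v for _, v in sorted(pairs))

block = bytes(250_000)  # stand-in for one (much larger) index block
pairs = block_to_kv("idx0/seg3/level1", block)  # -> 3 chunks
assert kv_to_block(pairs) == block
```

Zero-padding the chunk counter keeps the sub-keys sorted lexicographically, which is what makes a single range read sufficient for reassembly.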