Dominik Weckmüller

108 comments by Dominik Weckmüller

# Batch Processing For anyone who'd like to work on improving performance, batch processing is the way to go. Find my personal tests & demos here: https://github.com/do-me/SemanticFinder/issues/11#issuecomment-2343983733. I do not have...

Update: it ran all night without success. Seemed to be still running this morning, so I aborted.

It's not about the parquet file, I just exported it as parquet for you for convenience. Yes, also the other texts are working fine. Seems really to be about the...

I see, thanks for looking into this! So from what you describe, this could happen on other semantic levels as well. The logic to stream the already identified chunks would...

Awesome @benbrandt, thank you very much for your efforts! I'll need to run some bigger processing tasks in the next couple of weeks so I'll come back here by then...

Thanks for sharing your ideas. I see, that's what I expected with tokenization and that's totally understandable. I think it's a good decision. By the way, here is the (more...

Hi Ben! First of all a big thank you for this package. It is taking away the pain from poorly performing Python-based frameworks and exactly what I was looking for....

The URL leads me to an empty description somehow :D ![image](https://github.com/benbrandt/text-splitter/assets/47481567/13a68728-37f2-449c-ac06-3af30c3cb7f7) However, thanks for clarifying, that makes a lot of sense. So if I get you right, you would say...

Sure, sounds really exciting! :) Looking forward to your future developments!

Yes, absolutely! Token-based chunking is absolute overkill for my use case. However, if you'd still want to offer a way to include it for some reason, transformers.js offers a...