Michał Siedlaczek

137 comments by Michał Siedlaczek

A quick update: I implemented precomputed scores (only as `float`s right now, without quantization) and store them as 4 bytes each. It takes a lot of space, but the idea is...

Quick tests on Clueweb09B show essentially the same results for BM25 ranked OR as before, while the average drops from `392.932` to `267.576` with 1-byte precomputed quantized scores...
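To make the space trade-off concrete, here is a minimal sketch of how a 4-byte `float` score could be mapped to a 1-byte value. It assumes simple linear quantization over the range `[0, max_score]`; `LinearQuantizer` and `quantize_all` are hypothetical names used only for illustration, and the quantization scheme actually used in the tests above may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper (not taken from the project): linearly maps scores in
// [0, max_score] onto the 256 values representable in one byte.
struct LinearQuantizer {
    float max_score;  // largest score to represent; assumed to be > 0

    std::uint8_t quantize(float score) const {
        assert(max_score > 0.0F);
        // Clamp so out-of-range scores saturate instead of wrapping around.
        float clamped = std::clamp(score, 0.0F, max_score);
        return static_cast<std::uint8_t>(std::lround(clamped / max_score * 255.0F));
    }

    // Approximate inverse; the error is at most one quantization step.
    float dequantize(std::uint8_t q) const {
        return static_cast<float>(q) / 255.0F * max_score;
    }
};

// Quantizes a whole score list: 1 byte per score instead of 4.
std::vector<std::uint8_t> quantize_all(std::vector<float> const& scores) {
    std::vector<std::uint8_t> out;
    if (scores.empty()) {
        return out;
    }
    float max_score = *std::max_element(scores.begin(), scores.end());
    if (max_score <= 0.0F) {
        // Degenerate input: nothing positive to scale by, so store zeros.
        out.assign(scores.size(), 0);
        return out;
    }
    LinearQuantizer quantizer{max_score};
    out.reserve(scores.size());
    for (float score : scores) {
        out.push_back(quantizer.quantize(score));
    }
    return out;
}
```

With this layout a quantized score list takes a quarter of the space of the `float` version, at the cost of a bounded rounding error per score.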

Yes, thanks for pointing this out and linking the issue. I'm pretty sure there are some other problems as well, though. I believe it's mostly legacy low-level code. I think...

I think the most troubling thing is that the address sanitizer actually found problems when the cast was replaced with `memcpy`. Not sure why it's not catching it with the cast, but it...

@JMMackenzie I did a very quick test and added this assert:

```cpp
assert(pos / 8 < (m_bits.size() * 8) - 7);
```

And it fails. So if I understand this...

> So: `assert(pos / 8 < (m_bits.size() * 8) - 7);` is saying: "Find the byte corresponding to pos and make sure it is at least 7 bytes before the...

@mpetri right, but it seems like regardless of that, we have a situation in which we call a function that is supposed to return at least 56 bits, but there...
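The bounds condition under discussion can be sketched as follows. This is an illustration only: `BitBuffer`, `can_read_word56`, and this `get_word56` are hypothetical stand-ins, not the project's code. The sketch assumes the bits live in a vector of 64-bit words on a little-endian machine, and that a read of "at least 56 bits" at bit position `pos` is done as one 8-byte load starting at byte `pos / 8`, followed by a shift.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a bit vector backed by 64-bit words.
struct BitBuffer {
    std::vector<std::uint64_t> words;

    std::size_t size_in_bytes() const {
        return words.size() * sizeof(std::uint64_t);
    }

    // True if an 8-byte load starting at the byte that contains bit `pos`
    // stays entirely inside the buffer.
    bool can_read_word56(std::size_t pos) const {
        return pos / 8 + sizeof(std::uint64_t) <= size_in_bytes();
    }

    // Loads 8 bytes starting at byte pos / 8 and shifts out the pos % 8 bits
    // that precede `pos`. Assuming little-endian byte order, at least the low
    // 56 bits of the result are the bits that start at `pos`.
    std::uint64_t get_word56(std::size_t pos) const {
        assert(can_read_word56(pos));
        auto const* bytes = reinterpret_cast<unsigned char const*>(words.data());
        std::uint64_t word;
        // memcpy instead of a pointer cast: no misaligned or type-punned load,
        // and sanitizers can see the access for what it is.
        std::memcpy(&word, bytes + pos / 8, sizeof(word));
        return word >> (pos % 8);
    }
};
```

Under these assumptions, any `pos` within the last seven bytes of the buffer fails the check, so a caller that needs bits there either needs padding after the data or a slower byte-by-byte path.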

@ot Thanks a lot for your comments, the explanation really helps, we'll make sure to document it. Just to clarify: in the encoding algorithms where this is used, when the...

> We can probably also check out how `bit_cast` in C++20 is implemented.

Somewhat anticlimactically, it seems to be implemented with `__builtin_bit_cast` by both gcc and clang. Which means we...
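For compilers without C++20, a commonly used portable substitute is a `memcpy`-based helper. The sketch below is one such version, not the standard library's implementation; `bit_cast_sketch`, `float_bits`, and `check_round_trip` are hypothetical names. Unlike `std::bit_cast`, it is not `constexpr` and it additionally requires the destination type to be trivially default constructible.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical pre-C++20 substitute for std::bit_cast: copies the object
// representation of `from` into a fresh object of type To.
template <typename To, typename From>
To bit_cast_sketch(From const& from) noexcept {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable");
    static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable");
    static_assert(std::is_trivially_default_constructible<To>::value,
                  "To must be trivially default constructible");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}

// Example use: the raw bit pattern of a float (assumes a 32-bit float).
inline std::uint32_t float_bits(float value) {
    return bit_cast_sketch<std::uint32_t>(value);
}

// Round-trip check for non-NaN values: float -> bits -> float is lossless.
inline void check_round_trip(float value) {
    assert(bit_cast_sketch<float>(float_bits(value)) == value);
}
```

Optimizing compilers typically turn the fixed-size `memcpy` into a plain register move, so this form usually costs nothing compared with a pointer cast while avoiding the aliasing and alignment problems.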

I'll have a look sometime soon.