Michał Siedlaczek

Results 60 issues of Michał Siedlaczek

@amallia @JMMackenzie Let's discuss.

| Current Name | Proposal |
| --- | --- |
| `compute_intersection` | `intersect` |
| `create_freq_index` | `compress` |
| `create_wand_data` | |
|...

wip
discussion
refactoring

| Logic error | Dereference of null pointer | include/pisa/bit_vector.hpp | skip | 426 | 29 | View Report | Report Bug | Open File |
| -- | -- | -- |...

bug

**Describe the bug**
`clang-tidy` found a potential null pointer dereference in the `bit_vector` code.

**To Reproduce**
Steps to reproduce the behavior:
1. From the `build` directory, run `cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON ..`
2. Run...

bug

In logs that keep track of, say, the number of documents processed, it would be very helpful to print large numbers with thousands separators, e.g., `10,000,000` instead of `10000000`.
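One way this could look is sketched below. The helper name `with_separators` and its signature are hypothetical (nothing like it is confirmed to exist in PISA); it only illustrates grouping digits in threes before a value is written to a log.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, not part of PISA: formats an unsigned integer with a
// separator between groups of three digits, counted from the right.
std::string with_separators(std::uint64_t value, char sep = ',')
{
    std::string const digits = std::to_string(value);
    std::string out;
    out.reserve(digits.size() + digits.size() / 3);
    // Number of digits in the leading (possibly shorter) group.
    std::size_t const lead = digits.size() % 3;
    for (std::size_t i = 0; i < digits.size(); ++i) {
        // Insert a separator at every group boundary except the very start.
        if (i != 0 && i % 3 == lead) {
            out.push_back(sep);
        }
        out.push_back(digits[i]);
    }
    return out;
}
```

A log call could then pass `with_separators(count)` instead of `count`. Building the string by hand avoids depending on the process locale, which `std::locale`-based grouping would require.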

enhancement
good first issue
priority:low

The wording at https://pisa.readthedocs.io/en/latest/getting_started.html#building-the-code is unclear where we say that the code was *tested* on different compilers. We should state clearly that C++17 is required and list the minimum compiler versions that support it.

documentation

Along with the index format (#143) we need to clean up and stabilize our programmatic API. I vote to do it via function signatures and concepts for types. Although concepts...

wip
discussion
refactoring

```cpp
template <typename T, typename C>
concept bool IndexLike = CursorLike<C> && requires(T index, TermId termid) {
    { index.cursor(termid) } -> C;
    { index.num_documents() } -> std::size_t;
    { index.num_terms() } -> std::size_t;
};
```
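Since the concept above uses Concepts TS syntax, a rough C++17 approximation with the detection idiom may help show what it would check. This is only a sketch under assumptions: the `TermId` and `DocId` aliases, the trait names, and the toy types are all hypothetical, and the cursor check covers only `reset`, `next`, and `next_geq`.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <utility>

// Placeholder aliases; the actual TermId/DocId types are assumptions.
using TermId = std::uint32_t;
using DocId = std::uint32_t;

// By default, a type does not model the cursor interface.
template <typename T, typename = void>
struct is_cursor_like : std::false_type {};

// Chosen only if every listed expression is well-formed. Return types are not
// checked, since a `-> void` requirement accepts any well-formed expression.
template <typename T>
struct is_cursor_like<T, std::void_t<
    decltype(std::declval<T&>().reset()),
    decltype(std::declval<T&>().next()),
    decltype(std::declval<T&>().next_geq(std::declval<DocId>()))>>
    : std::true_type {};

// By default, a type does not model the index interface.
template <typename T, typename C, typename = void>
struct is_index_like : std::false_type {};

// Chosen only if C is cursor-like and each member call exists with a result
// implicitly convertible to the required type.
template <typename T, typename C>
struct is_index_like<T, C, std::enable_if_t<
    is_cursor_like<C>::value
    && std::is_convertible_v<
           decltype(std::declval<T&>().cursor(std::declval<TermId>())), C>
    && std::is_convertible_v<
           decltype(std::declval<T&>().num_documents()), std::size_t>
    && std::is_convertible_v<
           decltype(std::declval<T&>().num_terms()), std::size_t>>>
    : std::true_type {};

// Toy types used only to exercise the traits.
struct ToyCursor {
    void reset() {}
    void next() {}
    void next_geq(DocId) {}
};

struct ToyIndex {
    ToyCursor cursor(TermId) const { return {}; }
    std::size_t num_documents() const { return 0; }
    std::size_t num_terms() const { return 0; }
};

static_assert(is_index_like<ToyIndex, ToyCursor>::value,
              "ToyIndex should satisfy the index checks");
static_assert(!is_index_like<int, ToyCursor>::value,
              "int has none of the required members");
```

The trait gives a compile-time yes/no answer like the concept would, but error messages are less readable than with real concepts, so it is best treated as a stopgap.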

wip
discussion
refactoring

```cpp
template <typename T>
concept bool CursorLike = requires(T cursor, DocId docid, Position pos) {
    { cursor.reset() } -> void;
    { cursor.next() } -> void;
    { cursor.next_geq(docid) } -> void;
    { cursor.move(pos)...
```

wip
discussion
refactoring

In the spirit of #249, I want to start a discussion regarding our block (for now) codec interface. Here's what it looks like now if represented by C++20 concepts TS:...

discussion

Right now, I think everything resides in main memory for the entire run, and you need a lot of memory to compress big collections like ClueWeb. But it should be...

performance
refactoring