pisa
PISA: Performant Indexes and Search for Academia
Hi everyone! I'd like to discuss some possible improvements to the `block codecs`. **Interpolative.** Starting here (https://github.com/pisa-engine/pisa/blob/35dd03b73506d2089686a4a211e38898671b425c/include/pisa/codec/block_codecs.hpp#L151): why not just use `in` instead of...
Given recent feedback from HN, we should improve how we explain PISA and offer some benchmarks against common systems such as Lucene and Tantivy (perhaps). We should also document...
**Describe the bug** Ingesting plaintext records via stdin seems very slow (12 MiB/s) even though 60 worker threads are used. I suspect this has to do with https://en.cppreference.com/w/cpp/io/ios_base/sync_with_stdio. Before I disabled...
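A minimal sketch of the standard-stream setup described on the linked page, assuming records arrive one per line on stdin and that stdio synchronization really is the bottleneck (the report only suspects this). `configure_fast_stdin` and `count_records` are hypothetical names, not PISA functions:

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <sstream>  // std::istringstream can stand in for std::cin when exercising count_records
#include <string>

// Must run before any other I/O on the standard streams; calling it after
// I/O has already happened has implementation-defined behavior.
void configure_fast_stdin() {
    // Let the C++ standard streams buffer independently of C stdio,
    // which may be considerably faster for bulk input.
    std::ios_base::sync_with_stdio(false);
    // std::cin is tied to std::cout by default, so std::cout is flushed
    // before every read from std::cin; untie them.
    std::cin.tie(nullptr);
}

// Hypothetical reader: counts non-empty newline-delimited records in `in`
// and stores the number of payload bytes (newlines excluded) in `total_bytes`.
std::size_t count_records(std::istream& in, std::size_t& total_bytes) {
    std::string line;
    std::size_t records = 0;
    total_bytes = 0;
    while (std::getline(in, line)) {
        if (line.empty()) {
            continue;  // skip blank lines between records
        }
        ++records;
        total_bytes += line.size();
    }
    return records;
}
```

Call `configure_fast_stdin()` as the first statement of `main()` and then pass `std::cin` to the reader. After `sync_with_stdio(false)`, mixing `std::cin` with C functions such as `fgets` on `stdin` is no longer safe, so all input should go through one API.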
Somehow, we ended up with two quantization options for `queries`: `--quantized` and `--scorer quantized`. I think the simplest fix would be to remove `--quantized`...
Right now, the scorer is hard-coded to `bm25`.
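One way to lift the hard-coded choice is to select the scorer by name at runtime. This is a sketch under stated assumptions: it presumes a per-term scoring interface, and `Scorer`, `make_scorer`, the `tfidf` variant, and the parameters k1 = 1.2 and b = 0.75 (common textbook values) are illustrative, not PISA's API or defaults:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical interface for a term scorer chosen at runtime.
struct Scorer {
    virtual ~Scorer() = default;
    // tf: term frequency in the document, df: document frequency of the term,
    // doc_len / avg_doc_len: document and average lengths, num_docs: collection size.
    virtual double score(std::uint32_t tf, std::uint32_t df, double doc_len,
                         double avg_doc_len, std::uint32_t num_docs) const = 0;
};

struct BM25 final : Scorer {
    double k1;
    double b;
    BM25(double k1_, double b_) : k1(k1_), b(b_) {}

    double score(std::uint32_t tf, std::uint32_t df, double doc_len,
                 double avg_doc_len, std::uint32_t num_docs) const override {
        assert(df > 0 && avg_doc_len > 0.0);
        // Lucene-style idf, which stays positive even for very common terms.
        double idf = std::log(1.0 + (static_cast<double>(num_docs) - df + 0.5) / (df + 0.5));
        // Length normalization: longer-than-average documents are penalized.
        double norm = k1 * (1.0 - b + b * doc_len / avg_doc_len);
        return idf * (tf * (k1 + 1.0)) / (tf + norm);
    }
};

// Hypothetical second scorer, present only to show the choice is no longer fixed.
struct TfIdf final : Scorer {
    double score(std::uint32_t tf, std::uint32_t df, double, double,
                 std::uint32_t num_docs) const override {
        assert(df > 0);
        return static_cast<double>(tf) * std::log(static_cast<double>(num_docs) / df);
    }
};

// Returns nullptr for an unknown name so the caller can report a clear error.
std::unique_ptr<Scorer> make_scorer(const std::string& name) {
    if (name == "bm25") {
        return std::make_unique<BM25>(1.2, 0.75);
    }
    if (name == "tfidf") {
        return std::make_unique<TfIdf>();
    }
    return nullptr;
}
```

Returning `nullptr` for an unknown name lets a command-line front end reject it with a message instead of silently falling back to BM25.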
I propose limiting the Travis builds for PRs to a few essential checks. First, we can compile the headers, which is quite fast, to make sure all the includes are...
In many places we simply assume that a file exists, e.g., when we create a `mio::mmap_source`, which then fails with a bare `No such file exists` or something similar...
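A minimal sketch of checking a path up front so that the error names the file, assuming C++17 `std::filesystem` is available. `check_input_file` is a hypothetical helper, and calling it right before constructing a `mio::mmap_source` is an assumed usage, not existing PISA code:

```cpp
#include <cassert>
#include <filesystem>
#include <stdexcept>
#include <string>
#include <system_error>

// Hypothetical helper: throws std::runtime_error naming `path` if it cannot be
// used as an input file, instead of leaving a bare "No such file" error to a later open.
void check_input_file(const std::filesystem::path& path) {
    std::error_code ec;
    const std::filesystem::file_status status = std::filesystem::status(path, ec);
    // A missing path is reported as file_type::not_found (ec may also be set).
    if (status.type() == std::filesystem::file_type::not_found) {
        throw std::runtime_error("File not found: " + path.string());
    }
    // Any other failure to query the path, e.g. permission denied.
    if (ec) {
        throw std::runtime_error("Cannot access " + path.string() + ": " + ec.message());
    }
    // Directories, sockets, etc. cannot be memory-mapped as index files.
    if (!std::filesystem::is_regular_file(status)) {
        throw std::runtime_error("Not a regular file: " + path.string());
    }
}
```

The check is inherently racy (the file could disappear between the check and the open), so it only improves the error message in the common case; the subsequent open still needs its own error handling.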
When PISA is used as a library, the exported include path is wrong; you need to include headers like so: ```cpp #include ``` but it should be: ```cpp #include ```
**Describe the bug** Shouldn't we do proper tokenization in `parse_plaintext_content` too? https://github.com/pisa-engine/pisa/blob/4a739b2ec50d2faa1e3c57336337e4fe219e09ec/include/pisa/forward_index_builder.hpp#L60-L66
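For illustration only, a minimal sketch of punctuation-aware tokenization, assuming ASCII input. `tokenize_plaintext` is a hypothetical name, and this is deliberately simpler than whatever tokenizer PISA applies to other input formats:

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical sketch, not PISA's tokenizer: splits text on any character that is
// not an ASCII letter or digit and lowercases each resulting token.
std::vector<std::string> tokenize_plaintext(const std::string& content) {
    std::vector<std::string> tokens;
    std::string current;
    // Iterate as unsigned char so std::isalnum/std::tolower get valid arguments.
    for (unsigned char c : content) {
        if (std::isalnum(c)) {
            current.push_back(static_cast<char>(std::tolower(c)));
        } else if (!current.empty()) {
            tokens.push_back(current);  // a separator ends the current token
            current.clear();
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);  // flush a token that runs to the end of input
    }
    return tokens;
}
```

Because `std::isalnum` uses the default "C" locale here, non-ASCII bytes (such as UTF-8 letters) act as separators; a real fix would need to decide how to handle them.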