PISA: Performant Indexes and Search for Academia

Results 89 pisa issues

**Describe the bug** As the title suggests, I had some issues with segfaults upon mmapping an index. It turned out that the index was being built (but incorrectly) because there...

bug

The following function has issues: https://github.com/pisa-engine/pisa/blob/c6481af140224070aaa5d7ec109bbde396268b8c/include/pisa/bit_vector.hpp#L285

1. We cast an arbitrary byte pointer to int, making it UB.
2. We assume the bit vector has enough bytes to actually dereference....

bug
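
Both problems named in this report (reading an integer through a cast byte pointer, and reading without checking the remaining length) have a common general remedy in C++. The sketch below is not PISA's code: `load_u32` is a hypothetical helper, and the 32-bit width is an assumption made only for illustration. It copies the bytes with `std::memcpy` instead of casting the pointer, and refuses to read past the end of the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of PISA): reads a 32-bit value that starts at
// byte `offset` of `data`, in the machine's native byte order.
// Returns false, leaving `out` untouched, if fewer than 4 bytes remain.
inline bool load_u32(const std::uint8_t* data, std::size_t size,
                     std::size_t offset, std::uint32_t& out) {
    // Bounds check written so that it cannot overflow.
    if (offset > size || size - offset < sizeof(std::uint32_t)) {
        return false;
    }
    std::uint32_t value = 0;
    // memcpy is well defined for any alignment and does not violate
    // aliasing rules, unlike dereferencing a cast pointer.
    std::memcpy(&value, data + offset, sizeof(value));
    out = value;
    return true;
}
```

A caller would check the return value before using `out`; an unaligned offset such as 1 is fine because the bytes are copied rather than reinterpreted.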

Because the implementation of block-wise `decode` (and `encode`, though that one is not as crucial) is moved out of the header, this could affect performance....

performance
wip
refactoring
build system

Opening another PR because I don't know how easy/tricky this will turn out to be.

wip

Dear friends, first, thank you all for the great project! This search engine is the fanciest I've found on GitHub! In our case, we will have...

enhancement
help wanted
priority:medium

For some weird reason, reordering by URL does not work when using https://github.com/pisa-engine/pisa/blob/master/tools/reorder_docids.cpp. It does work if we use this external script instead: https://github.com/pisa-engine/pisa/blob/master/script/generate_sorted_docids_mapping.py

bug
followup needed
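
For readers unfamiliar with the operation, reordering by URL generally means assigning new document IDs in lexicographic URL order. The sketch below illustrates that idea only. It reproduces neither `reorder_docids.cpp` nor the Python script, and `url_sorted_mapping` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: returns a vector where mapping[old_id] == new_id,
// with new IDs assigned in lexicographic order of the documents' URLs.
inline std::vector<std::size_t> url_sorted_mapping(
    const std::vector<std::string>& urls) {
    // order[new_id] == old_id after sorting.
    std::vector<std::size_t> order(urls.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    // A stable sort keeps the original relative order of duplicate URLs.
    std::stable_sort(order.begin(), order.end(),
                     [&urls](std::size_t a, std::size_t b) {
                         return urls[a] < urls[b];
                     });
    // Invert the permutation to obtain old_id -> new_id.
    std::vector<std::size_t> mapping(urls.size());
    for (std::size_t new_id = 0; new_id < order.size(); ++new_id) {
        mapping[order[new_id]] = new_id;
    }
    return mapping;
}
```

For the URLs `b.com`, `a.com`, `c.com` (old IDs 0, 1, 2), the mapping is `{1, 0, 2}`: the document with old ID 1 sorts first and receives new ID 0.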

**Describe the solution you'd like** For CJK languages, such as Chinese, words are not separated by spaces. So there is usually a need to use a tokenizer to split...

enhancement
help wanted
question

Below is a rough draft of a schema/config/meta file (idk what name fits best here) to organize the files together. The primary goal is to have sane defaults such that...

enhancement
discussion

Currently, term weighting is handled within the `Cursors` classes. In particular, the `ScoredCursor` class stores the query term weight (the weight assigned to a term at query time, usually set...

question
refactoring

**Describe the bug** The docs at https://pisa.readthedocs.io/en/latest/compress_index.html#usage talk about using `create_freq_index` which is what the binary was called in `ds2i`. It looks like it was renamed to `compress_inverted_index` and additional...

bug
help wanted
documentation