pisa
pisa copied to clipboard
Listing options on cmdline
I also got rid of the index types macro replacing it with some template metaprogramming.
Codecov Report
Merging #439 (9969a8c) into master (a28100d) will increase coverage by
0.00%. The diff coverage is100.00%.
@@ Coverage Diff @@
## master #439 +/- ##
=======================================
Coverage 93.26% 93.26%
=======================================
Files 92 93 +1
Lines 4926 4941 +15
=======================================
+ Hits 4594 4608 +14
- Misses 332 333 +1
| Impacted Files | Coverage Δ | |
|---|---|---|
| include/pisa/mappable/mappable_vector.hpp | 100.00% <ø> (ø) |
|
| include/pisa/index_types.hpp | 100.00% <100.00%> (ø) |
|
| include/pisa/wand_data.hpp | 85.55% <100.00%> (ø) |
|
| include/pisa/wand_data_raw.hpp | 100.00% <100.00%> (ø) |
|
| include/pisa/bit_vector.hpp | 97.54% <0.00%> (-0.41%) |
:arrow_down: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update a28100d...9969a8c. Read the comment docs.
Overview
I refactor this to simplify the usage. There is a single public way of retrieving a "type" object for an index based on the encoding:
auto index_type = IndexType::resolve("pef"); // or IndexType::resolve_profiling for the indexes with profiling
This is an object that internally holds a variant of index type markers, meaning objects that have the type of the index encoded as a typedef. This is necessary, because there are some situations where we need to know the type before creating the index itself, in particular when compressing an index.
To run some code with the index type, we need to call:
index_type.execute([](auto&& index_type) {
using Index = typename decltype(index_type)::type;
});
However, as most commonly we want to load an index and do something with it, there's another method that loads the index and runs some code on it:
index_type.load_and_execute(index_path, [](auto&& index) {
// decltype(index) is something like `block_simdbp_index`
});
which is equivalent to:
index_type.execute([&index_path](auto&& index_type) {
using Index = typename decltype(index_type)::type;
auto index = Index(MemorySource::mapped_file(index_path));
});
Adding new index type
All available index types are defined in the pisa::detail::IndexType variant in index_types.hpp. This is what INDEX_TYPES macro used to be, basically. The rest of the machinery is in template code within pisa::detail namespace in the same file.
When adding a new index type (to avoid macros), one needs to edit pisa::detail::index_name function to define its name. Note, however, that because of the constexpr code, if an index is added to pisa::detail::IndexType and not to index_name, then it will fail to compile; so this won't lead to any bugs.
Note: tests are failing because I'm waiting for #457 to be merged.
I'll also work on documentation in code.
GCC 7 keeps crashing with an internal bug, even though the same version compiles it just fine locally. I couldn't yet figure out what's wrong. I'll probably look at it again in a few days. It might be some issue with libstdc++ rather than the compiler alone but I'm not sure.
RE. failing builds, I think this is a memory consumption issue. For reference, the peak memory usage for builds with a single thread on the evaluate_queries target (which I think is the heaviest target we have):
master= 3.6 GBthis PR= 5.4 GB
This is using gcc 7.3.0 on Ubuntu. So this PR does seem to be increasing the build memory substantially!
RE. failing builds, I think this is a memory consumption issue. For reference, the peak memory usage for builds with a single thread on the
evaluate_queriestarget (which I think is the heaviest target we have):
master= 3.6 GBthis PR= 5.4 GBThis is using gcc 7.3.0 on Ubuntu. So this PR does seem to be increasing the build memory substantially!
Thanks for looking into this. I'll look into it in a spare moment. If it can't be fixed, we'll need to look for other solutions, this is not an acceptable tradeoff.
@JMMackenzie I just ran the whole compilation from master on my laptop, and I get 5.4 GB peak memory. Can you try again and make sure you have current master and compile all tools and tests? This was Debug build.
For posterity, using time -v:
target evaluate_queries
MASTER
gcc = Maximum resident set size (kbytes): 3823100
clang = Maximum resident set size (kbytes): 4573132
LIST-OPTIONS
gcc = Maximum resident set size (kbytes): 5651552
clang = Maximum resident set size (kbytes): 7066608
Looks like tidy is running out of memory now based on the logs:
Error running '/usr/bin/clang-tidy': Child killed
make[2]: *** [tools/CMakeFiles/evaluate_queries.dir/build.make:63: tools/CMakeFiles/evaluate_queries.dir/evaluate_queries.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:5563: tools/CMakeFiles/evaluate_queries.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
Error: Process completed with exit code 137.
Right, the problem from this PR still persists, but fixing the other problem makes some tests pass.