pisa icon indicating copy to clipboard operation
pisa copied to clipboard

Listing options on cmdline

Open elshize opened this issue 5 years ago • 11 comments

I also got rid of the index types macro replacing it with some template metaprogramming.

elshize avatar Sep 28 '20 01:09 elshize

Codecov Report

Merging #439 (9969a8c) into master (a28100d) will increase coverage by 0.00%. The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #439   +/-   ##
=======================================
  Coverage   93.26%   93.26%           
=======================================
  Files          92       93    +1     
  Lines        4926     4941   +15     
=======================================
+ Hits         4594     4608   +14     
- Misses        332      333    +1     
Impacted Files Coverage Δ
include/pisa/mappable/mappable_vector.hpp 100.00% <ø> (ø)
include/pisa/index_types.hpp 100.00% <100.00%> (ø)
include/pisa/wand_data.hpp 85.55% <100.00%> (ø)
include/pisa/wand_data_raw.hpp 100.00% <100.00%> (ø)
include/pisa/bit_vector.hpp 97.54% <0.00%> (-0.41%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update a28100d...9969a8c. Read the comment docs.

codecov[bot] avatar Sep 28 '20 02:09 codecov[bot]

Overview

I refactor this to simplify the usage. There is a single public way of retrieving a "type" object for an index based on the encoding:

auto index_type = IndexType::resolve("pef"); // or IndexType::resolve_profiling for the indexes with profiling

This is an object that internally holds a variant of index type markers, meaning objects that have the type of the index encoded as a typedef. This is necessary, because there are some situations where we need to know the type before creating the index itself, in particular when compressing an index.

To run some code with the index type, we need to call:

index_type.execute([](auto&& index_type) {
    using Index = typename decltype(index_type)::type;
});

However, as most commonly we want to load an index and do something with it, there's another method that loads the index and runs some code on it:

index_type.load_and_execute(index_path, [](auto&& index) {
    // decltype(index) is something like `block_simdbp_index`
});

which is equivalent to:

index_type.execute([&index_path](auto&& index_type) {
    using Index = typename decltype(index_type)::type;
    auto index = Index(MemorySource::mapped_file(index_path));
});

Adding new index type

All available index types are defined in the pisa::detail::IndexType variant in index_types.hpp. This is what INDEX_TYPES macro used to be, basically. The rest of the machinery is in template code within pisa::detail namespace in the same file.

When adding a new index type (to avoid macros), one needs to edit pisa::detail::index_name function to define its name. Note, however, that because of the constexpr code, if an index is added to pisa::detail::IndexType and not to index_name, then it will fail to compile; so this won't lead to any bugs.

elshize avatar May 02 '21 21:05 elshize

Note: tests are failing because I'm waiting for #457 to be merged.

elshize avatar May 02 '21 21:05 elshize

I'll also work on documentation in code.

elshize avatar May 02 '21 21:05 elshize

GCC 7 keeps crashing with an internal bug, even though the same version compiles it just fine locally. I couldn't yet figure out what's wrong. I'll probably look at it again in a few days. It might be some issue with libstdc++ rather than the compiler alone but I'm not sure.

elshize avatar May 03 '21 20:05 elshize

RE. failing builds, I think this is a memory consumption issue. For reference, the peak memory usage for builds with a single thread on the evaluate_queries target (which I think is the heaviest target we have):

  • master = 3.6 GB
  • this PR = 5.4 GB

This is using gcc 7.3.0 on Ubuntu. So this PR does seem to be increasing the build memory substantially!

JMMackenzie avatar Jun 01 '21 01:06 JMMackenzie

RE. failing builds, I think this is a memory consumption issue. For reference, the peak memory usage for builds with a single thread on the evaluate_queries target (which I think is the heaviest target we have):

  • master = 3.6 GB
  • this PR = 5.4 GB

This is using gcc 7.3.0 on Ubuntu. So this PR does seem to be increasing the build memory substantially!

Thanks for looking into this. I'll look into it in a spare moment. If it can't be fixed, we'll need to look for other solutions, this is not an acceptable tradeoff.

elshize avatar Jun 01 '21 08:06 elshize

@JMMackenzie I just ran the whole compilation from master on my laptop, and I get 5.4 GB peak memory. Can you try again and make sure you have current master and compile all tools and tests? This was Debug build.

elshize avatar Jun 06 '21 13:06 elshize

For posterity, using time -v:

target evaluate_queries
MASTER
gcc   = Maximum resident set size (kbytes): 3823100
clang = Maximum resident set size (kbytes): 4573132
LIST-OPTIONS
gcc   = Maximum resident set size (kbytes): 5651552
clang = Maximum resident set size (kbytes): 7066608

JMMackenzie avatar Jun 07 '21 00:06 JMMackenzie

Looks like tidy is running out of memory now based on the logs:

Error running '/usr/bin/clang-tidy': Child killed
make[2]: *** [tools/CMakeFiles/evaluate_queries.dir/build.make:63: tools/CMakeFiles/evaluate_queries.dir/evaluate_queries.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:5563: tools/CMakeFiles/evaluate_queries.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
Error: Process completed with exit code 137.

JMMackenzie avatar Jun 11 '21 06:06 JMMackenzie

Right, the problem from this PR still persists, but fixing the other problem makes some tests pass.

elshize avatar Jun 12 '21 21:06 elshize