pisa
Validate input and output paths for programs
We should validate that a collection/index passed as input or output is valid, e.g., that the directory exists and the expected files are there. If not, the program should fail with a clear message.
The reason is that an invalid path can currently produce an error that doesn't say much, or no error at all, just undefined behavior.
Just taking an initial look at this. There's a C++17 facility we could maybe use: https://en.cppreference.com/w/cpp/filesystem/exists
How general should this be, though? I figure we want to call this function in each binary that requires a raw collection, before we try to use the collection. Would something like this be sufficient (say, in util/util.hpp)?
#include <experimental/filesystem>
#include <string>

// True only if all three files of the collection exist on disk.
inline bool collection_exists(const std::string& basename) {
    return std::experimental::filesystem::exists(basename + ".docs") &&
           std::experimental::filesystem::exists(basename + ".freqs") &&
           std::experimental::filesystem::exists(basename + ".sizes");
}
Yes, so we should have something of this sort. We should be able to do several things:
- Verify that a certain dir exists or not,
- Verify that a binary collection (or any other structure) exists (what you showed above),
- and so on.
We should not use std::experimental::filesystem though. The support across different compilers is spotty at best. Instead, we should use boost::filesystem; the std version is actually based on it, so the interfaces are similar (though there are some minor differences).
In fact, we would ideally do the following (example for binary collection):
- Have a function (maybe even constructor?) constructing a binary collection from disk.
- This function would return something like std::expected (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0323r7.html). Currently, this is only a proposal, so it is not in the standard (not even sure if it is going into C++20). But there are plenty of libraries implementing it; I have once used this one: https://github.com/martinmoene/expected-lite.
This could also be implemented as a simple variant (not as nice).
The idea is that you return a sum type that holds either a value of the expected type, such as binary_collection, or an error, e.g., a string or a custom error object.
But that's just a side note about design to discuss.