klee
klee copied to clipboard
Add KDAlloc, a deterministic allocator for dynamic symbolic execution
Summary:
Given the following program:
#include <stdlib.h>
#include <stdio.h>
#include <klee/klee.h>
int main(int argc, char** argv) {
int x = klee_int("x");
if(x) {
printf("%p\n", malloc(1));
} else {
printf("%p\n", malloc(1));
}
}
KLEE's current deterministic allocator yields something like the following:
$ klee --allocate-determ test.bc
...
0x7ff30001478
0x7ff30001460
KLEE: Deterministic memory allocation starting from 0x7ff30000000
...
This is because the two paths allocate (deterministically) from the same pool of addresses. The same happens with the default memory allocator. Therefore, memory is not allocated deterministically from the point of view of the program under test.
This PR implements the memory allocator described in our upcoming ECOOP 2022 publication "A Deterministic Memory Allocator for Dynamic Symbolic Execution". It mainly differs from the artifact for that publication in that it is cleaned up for integration in mainline KLEE. We also ensured that kernels with transparent huge page support set to "always" will work out of the box.
Running the same example with KDAlloc as the deterministic allocator:
$ ../build-host/bin/klee --allocate-determ test.bc
...
0x7e88152e7000
0x7e88152e7000
KLEE: Deterministic allocator: Using quarantine queue size 8
KLEE: Deterministic allocator: globals (start-address=0x7f62952e7000 size=10 GiB)
KLEE: Deterministic allocator: constants (start-address=0x7f60152e7000 size=10 GiB)
KLEE: Deterministic allocator: heap (start-address=0x7e60152e7000 size=1024 GiB)
KLEE: Deterministic allocator: stack (start-address=0x7e40152e7000 size=128 GiB)
...
Additionally, KDAlloc provides controlled address reuse ("quarantine queue size 8" in the example), and is much more likely to catch out of bounds accesses (allocations are distributed over the large virtual address spaces, e.g., 1024 GiB for the heap in the example).
Checklist:
- [x] The PR addresses a single issue. If it can be divided into multiple independent PRs, please do so.
- [x] The PR is divided into a logical sequence of commits OR a single commit is sufficient.
- [x] There are no unnecessary commits (e.g. commits fixing issues in a previous commit in the same PR).
- [x] Each commit has a meaningful message documenting what it does.
- [x] All messages added to the codebase, all comments, as well as commit messages are spellchecked.
- [x] The code is commented OR not applicable/necessary.
- [x] The patch is formatted via clang-format OR not applicable (if explicitly overridden leave unchecked and explain).
- [x] There are test cases for the code you added or modified OR no such test cases are required.
Codecov Report
Merging #1511 (5318b53) into master (6bc91e5) will increase coverage by
0.05%. The diff coverage is73.74%.
:exclamation: Current head 5318b53 differs from pull request most recent head 6bd17ca. Consider uploading reports for the commit 6bd17ca to get more accurate results
@@ Coverage Diff @@
## master #1511 +/- ##
==========================================
+ Coverage 71.39% 71.44% +0.05%
==========================================
Files 156 164 +8
Lines 18560 19524 +964
Branches 4115 4398 +283
==========================================
+ Hits 13250 13949 +699
- Misses 3488 3571 +83
- Partials 1822 2004 +182
| Impacted Files | Coverage Δ | |
|---|---|---|
| lib/Core/AddressSpace.h | 100.00% <ø> (ø) |
|
| lib/Core/ExecutionState.h | 80.76% <ø> (ø) |
|
| lib/Core/Executor.h | 77.27% <ø> (ø) |
|
| lib/Core/MemoryManager.h | 100.00% <ø> (ø) |
|
| include/klee/KDAlloc/suballocators/cow_ptr.h | 51.78% <51.78%> (ø) |
|
| include/klee/KDAlloc/mapping.h | 57.14% <57.14%> (ø) |
|
| include/klee/KDAlloc/suballocators/loh.h | 64.70% <64.70%> (ø) |
|
| include/klee/KDAlloc/suballocators/sized_regions.h | 72.17% <72.17%> (ø) |
|
| include/klee/ADT/Bits.h | 59.09% <75.00%> (-1.44%) |
:arrow_down: |
| ...nclude/klee/KDAlloc/suballocators/slot_allocator.h | 76.59% <76.59%> (ø) |
|
| ... and 9 more |
@danielschemmel what is the status of this PR?
@ccadar The holdup is due to us deriving from llvm::cl::parser<std::uint32_t> a.k.a. llvm::cl::parser<unsigned>. Before LLVM 9, this class was final (LLVM 8), but starting with LLVM 9 it is not.
@MartinNowack suggested that we wait for #1518 to drop instead of adding more workarounds for LLVM < 9.
The needed workaround should not be horrible, as we should be able to derive from basic_parser ourselves.
What would you prefer?
As the CI aborts all pending runs once any part of it fails, it might be interesting to know that I can reproduce the LLVM 8 failure locally, while LLVM 9 builds as expected.
@danielschemmel I agree we shouldn't support LLVM < 9, as we'll retire it soon. So let me first ask @251: what's the status of #1518 , do you plan to submit it soon? Otherwise, I would add KDAlloc with some #ifdefs so that it's only enabled for LLVM >= 9.
@ccadar Besides the path with LLVM 6 it's done from my side but @MartinNowack wanted to wait with it. Not sure what's supposed to go in/has to change before.
@danielschemmel I think I have been able to trace down why the two KDAlloc unit tests fail for MSan: The mapping has a size of 16 TiB and in the MSan version (I locally built the docker container that is also used in the CI), the following happens (output from strace):
write(1, "Using kdalloc\n", 14Using kdalloc
) = 14
mmap(NULL, 17592186044416, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = -1 ENOMEM (Cannot allocate memory)
My guess is that the additional mappings made by MSan for its shadow memory let our subsequent mmap fail. As we only call abort() in case mmap fails, there is no other output after "Using kdalloc". If I reduce the mapping size to 8 TiB, the tests succeed.
@jbuening nice find. Is this the mapping that KDAlloc creates, or MSan?
@ccadar yes, the mapping with ENOMEM is KDAlloc's, but I guess it is just a matter of order (i.e. the mappings by MSan are just created earlier)
Thanks, @jbuening that was exactly the hint that I was missing. All the CI failures are now gone, @ccadar!
@danielschemmel @jbuening great! As we discussed live, it would be useful to have a run where KDAlloc is enabled by default and understand any test failures.
@ccadar done. I checked the four failing tests from Linux (Z3 only) and all of them are handled correctly. For three cases, KDAlloc correctly prints a use-after-free error, while the test expects an out-of-bounds error and for the fourth one, KDAlloc correctly prints a double-free error, while the test expects an out-of-bounds error.
Thus, I conclude that all test failures are only because the tests did not account for KDAlloc being so great ;).
Thanks, @danielschemmel. We can remove that commit for now.
I see KDAlloc is implemented via a single giant commit. Is there an opportunity to split it a bit, to make reviewing easier?
The commit is gone again.
I think that splitting up kdalloc into multiple commits is not very useful, as it would just mirror the file structure in this commit, except without the possibility to see the big picture. I could imagine splitting this into "adding kdalloc" and "integrating kdalloc", if that helps?
@ccadar I have broken up KDAlloc into 6 commits
Thanks, @danielschemmel. @jbuening @251 @MartinNowack can you have a first look?
I changed some minor things when going over the code to silence my IDE:
diff --git a/include/kdalloc/mapping.h b/include/kdalloc/mapping.h
index 05afe203..eee95fd8 100644
--- a/include/kdalloc/mapping.h
+++ b/include/kdalloc/mapping.h
@@ -85,7 +85,7 @@ class Mapping {
public:
Mapping() = default;
- Mapping(std::size_t size) : Mapping(0, size) {}
+ explicit Mapping(std::size_t size) : Mapping(0, size) {}
Mapping(std::uintptr_t baseAddress, std::size_t size) : size(size) {
try_map(baseAddress);
@@ -98,11 +98,11 @@ public:
Mapping(Mapping const &) = delete;
Mapping &operator=(Mapping const &) = delete;
- Mapping(Mapping &&other) : baseAddress(other.baseAddress), size(other.size) {
+ Mapping(Mapping &&other) noexcept : baseAddress(other.baseAddress), size(other.size) {
other.baseAddress = MAP_FAILED;
other.size = 0;
}
- Mapping &operator=(Mapping &&other) {
+ Mapping &operator=(Mapping &&other) noexcept {
if (&other != this) {
using std::swap;
swap(other.baseAddress, baseAddress);
diff --git a/include/kdalloc/suballocators/loh.h b/include/kdalloc/suballocators/loh.h
index b61b381c..aad4218d 100644
--- a/include/kdalloc/suballocators/loh.h
+++ b/include/kdalloc/suballocators/loh.h
@@ -139,7 +139,7 @@ private:
assert(!control.unlimitedQuarantine);
if (control.quarantineSize == 0) {
- return std::pair<void *, std::size_t>(ptr, size);
+ return {ptr, size};
}
assert(data->referenceCount == 1 &&
@@ -153,11 +153,11 @@ private:
}
auto quarantinedPtr = std::exchange(
- reinterpret_cast<void *&>(data->quarantine[pos]), std::move(ptr));
+ reinterpret_cast<void *&>(data->quarantine[pos]), ptr);
auto quarantinedSize = std::exchange(
reinterpret_cast<std::size_t &>(data->quarantine[pos + 1]),
- std::move(size));
- return std::pair<void *, std::size_t>(quarantinedPtr, quarantinedSize);
+ size);
+ return {quarantinedPtr, quarantinedSize};
}
public:
diff --git a/include/kdalloc/suballocators/slot_allocator.h b/include/kdalloc/suballocators/slot_allocator.h
index 0a50afc4..47576fac 100644
--- a/include/kdalloc/suballocators/slot_allocator.h
+++ b/include/kdalloc/suballocators/slot_allocator.h
@@ -218,7 +218,7 @@ private:
data->quarantineAndBitmap[0] = pos + 1;
}
- return std::exchange(data->quarantineAndBitmap[pos], std::move(index));
+ return std::exchange(data->quarantineAndBitmap[pos], index);
}
inline static constexpr std::ptrdiff_t
diff --git a/include/kdalloc/util/cow_ptr.h b/include/kdalloc/util/cow_ptr.h
index a30eb96b..e4e0f332 100644
--- a/include/kdalloc/util/cow_ptr.h
+++ b/include/kdalloc/util/cow_ptr.h
@@ -47,7 +47,7 @@ public:
CoWPtr(CoWPtr &&other) noexcept : ptr(std::exchange(other.ptr, nullptr)) {}
- CoWPtr &operator=(CoWPtr &&other) {
+ CoWPtr &operator=(CoWPtr &&other) noexcept {
if (this != &other) {
release();
ptr = std::exchange(other.ptr, nullptr);
diff --git a/include/kdalloc/util/sized_regions.h b/include/kdalloc/util/sized_regions.h
index ed12289f..94a43732 100644
--- a/include/kdalloc/util/sized_regions.h
+++ b/include/kdalloc/util/sized_regions.h
@@ -357,7 +357,7 @@ public:
}
}
- CoWPtr<Node> extractRegion(char *const baseAddress) {
+ CoWPtr<Node> extractRegion(char const*const baseAddress) {
CoWPtr<Node> *target = &root;
for (;;) {
assert(*target && "cannot extract region that is not part of the treap");
diff --git a/include/kdalloc/util/tagged_logger.h b/include/kdalloc/util/tagged_logger.h
index 33d3be2a..656ed3ff 100644
--- a/include/kdalloc/util/tagged_logger.h
+++ b/include/kdalloc/util/tagged_logger.h
@@ -27,7 +27,6 @@ protected:
#if KDALLOC_TRACE >= 1
traceLineImpl(static_cast<C const *>(this)->logTag(std::cout),
std::forward<T>(args));
-#else
#endif
}
};
diff --git a/lib/Core/MemoryManager.cpp b/lib/Core/MemoryManager.cpp
index a0784e4d..7658eefd 100644
--- a/lib/Core/MemoryManager.cpp
+++ b/lib/Core/MemoryManager.cpp
@@ -107,7 +107,7 @@ llvm::cl::opt<std::uintptr_t> DeterministicAllocationStackStartAddress(
llvm::cl::cat(MemoryCat));
struct QuarantineSizeParser : public llvm::cl::parser<std::uint32_t> {
- QuarantineSizeParser(llvm::cl::Option &O)
+ explicit QuarantineSizeParser(llvm::cl::Option &O)
: llvm::cl::parser<std::uint32_t>(O) { }
bool parse(llvm::cl::Option &O, llvm::StringRef ArgName,
@@ -121,7 +121,7 @@ struct QuarantineSizeParser : public llvm::cl::parser<std::uint32_t> {
return false;
}
};
-static llvm::cl::opt<std::uint32_t, true, QuarantineSizeParser>
+llvm::cl::opt<std::uint32_t, true, QuarantineSizeParser>
DeterministicAllocationQuarantineSize(
"allocate-determ-quarantine",
llvm::cl::desc("Size of quarantine queues in allocator (default=8, disabled=0, unlimited=-1)"),
@@ -186,7 +186,7 @@ MemoryManager::MemoryManager(ArrayCache *_arrayCache)
// check invariants
#if LLVM_VERSION_CODE >= LLVM_VERSION(10, 0)
llvm::Align pageAlignment(pageSize);
- #endif
+#endif
for (auto &requestedSegment : requestedSegments) {
auto& segment1 = std::get<0>(requestedSegment);
auto& start1 = std::get<1>(requestedSegment);
@@ -199,13 +199,13 @@ MemoryManager::MemoryManager(ArrayCache *_arrayCache)
"is not page aligned (page size: %zu B)",
segment1.c_str(), pageAlignment.value());
}
- #else
+#else
if (start1 != 0 && llvm::OffsetToAlignment(start1, pageSize) != 0) {
klee_error("Deterministic allocator: Requested start address for %s "
"is not page aligned (page size: %zu B)",
segment1.c_str(), pageSize);
}
- #endif
+#endif
// check for overlap of segments
std::uintptr_t end1 = start1 + size1;
@@ -367,8 +367,8 @@ bool MemoryManager::markMappingsAsUnneeded() {
if (!DeterministicAllocation)
return false;
- if (!DeterministicAllocationMarkAsUnneeded)
- return false;
+ if (!DeterministicAllocationMarkAsUnneeded)
+ return false;
globalsFactory.getMapping().clear();
heapFactory.getMapping().clear();
Besides that:
- files have no license
- there are few typos (preceed, restort)
- Do we need
sized_regions.h:setBaseAddress()? - Do we really need
xoshiro.has rng? MemoryManager::getUsedDeterministicSize()is not implemented, I guess it's non-trivial?
Otherwise the code looks good to me but obviously I did not try to understand all details.
@251
- I have applied your diff
- the files now have license headers (this does not make them change licenses)
- "preceeding" became "preceding" and "restort" became "restore"
kdalloc::util::Node::setBaseAddressis not currently used by kdalloc. It just makes the type more complete. If it bothers anyone, I am happy to remove it; there are probably more functions that are not currently being executed.- The use of the xoshiro RNG is to facility performance benchmarking KDAlloc. We need a very fast RNG with a minimal memory footprint to ensure that KDAlloc does not get dominated by the RNG itself.
- I think the point of this function is for the
MallocUsagestatistic. Following the name of that statistic,0is the correct result - we do not have any additional malloc usage that is not already reported byGetTotalMallocUsage. We could do something along the lines of the number of currently paged pages multiplied by the page size? That number is - kind of - available, but does not actually match whatGetTotalMallocUsagereports ("because it does not include TCMalloc overhead or memory fragmentation"), so we would really have to change the name and computation of that metric for all cases.
Hi @danielschemmel, thanks for this excellent contribution! I have a few small comments, mostly in terms of improving the code documentation, but I think this is ready to be merged now!
I will wait until tomorrow in case others see any major issues.
@danielschemmel @MartinNowack let's try to resolve this soon.
BTW, I would rename the option(s) to --kdalloc. It matches the paper and it avoids confusion with the previous trivial version of deterministic allocation.
@ccadar On the other hand it adds confusion for users who don't read scientific papers.
Maybe, although the name is quite descriptive, and --help will provide more info. More importantly, I don't expect most users to use this flag: once we are happy with --kdalloc, we will switch it on by default.
@MartinNowack I believe @danielschemmel has addressed your feedback? If not, what's missing?
@danielschemmel do you have any preference between --kdalloc and --allocate-determ? As I said, I prefer --kdalloc as it's descriptive, eliminates confusion with the old option, and is in sync with the paper, but I won't push for it if the majority is against it.
@ccadar Well, no they have not been all addressed.
Here are my thoughts on it, this might be a more general thing on how to handle such cases - less specific to this PR.
The major "issue" I have with this PR is the unclear relation of it to the rest of KLEE. Is it part of KLEE or is it a separate project? Currently, it's neither. And this is expressed in the overall structure of this PR and its code size.
In its current state, KDalloc introduces 3.6k lines of code:
- While, it uses KLEE's testing infrastructure for its functional tests
- It adds an extra namespace next to KLEE and not being part of the
namespace klee - is replicating some features that exist in KLEE, i.e. tracing v.s.
klee_debug - adds yet another smart-pointer implementation (
CoWptrs) - adds yet another
/utildirectory with data structures that are used by KDAlloc but are not specific to this project
If it is meant to be a separate project, i.e. like metaSMT, sqlite, ... - then it might be better to maintain it as a separate project and KLEE is using it through its APIs and tests should be along this line. For example, it could also live under the https://github.com/klee umbrella. This would make this PR substantially smaller. But, I assume it should become part of KLEE instead 😉
In the later case, I would rather like to see it integrated it more tightly using existing features and directory structures. I'm aware that existing implementations (i.e. shared ptr implementation or KLEE's debug facility) do not necessarily offer all the features required but I would rather like to use and extend the existing ones instead of adding yet-another implementation of something similar. Equally, having one place, where to find things not multiple is useful.
This helps all developers and even maintainers along the lines and it would also allow us to use the new features in other parts of KLEE.
Of course, there is still the third option, to keep the PR like this with all its implications. In that case, I'm fine with the changes. 😄
Any thoughts on this?
(And just to repeat, I'm also fine with keeping this PR as it is. In that case, the PR is in a good shape.)
@MartinNowack I understand the concerns, but I would nevertheless merge it. It is a nice addition to KLEE, and it would be pity to lose it. Note that the PR is already 9 months old; I doubt we will ever merge it if we don't do it now.
Also, the PR is independent of the core of KLEE, although of course it interacts tightly with the core code in places. Nevertheless, it can be turned off (and it's turned off by default). I think it's ok to be a bit more lenient with such features. And of course, I think the code is overall in great shape, and some of the issues you raised could be addressed with some later refactorings.
So should we merge it?
@danielschemmel do you have any preference between
--kdallocand--allocate-determ? As I said, I prefer--kdallocas it's descriptive, eliminates confusion with the old option, and is in sync with the paper, but I won't push for it if the majority is against it.
If the allocate-determ flag were more cleanly named, I would have a clean favorite, as it is, I don't really care. I will have it renamed to kdalloc ASAP
* It adds an extra namespace next to KLEE and not being part of the `namespace klee`
I like different namespaces for things that can be separated. How do you feel about klee::kdalloc?
* is replicating some features that exist in KLEE, i.e. tracing v.s. `klee_debug`
Can klee_debug be used in a non-KLEE build (without a functional rewrite)? Debugging this is already a pain if you only have to deal with kdalloc...
* adds yet another smart-pointer implementation (`CoW` ptrs)
The CoW pointers behave quite different than normal smart pointers. You can see the difference in CoW::get (which does not have a non-const variant) and CoW::getOwned (which allows mutable access to the data but requires singular ownership).
* adds yet another `/util` directory with data structures that are used by KDAlloc but are not specific to this project
That directory name is a bit of a historical artifact... I can change the structure a bit to make it disappear ;)
I like different namespaces for things that can be separated. How do you feel about
klee::kdalloc?
On my side, I like this idea.
- All cli arguments are now
--kdalloc kdalloc/utilis gone- this caused some changes to
include/klee/ADT/Bits.h
- this caused some changes to
::kdallocbecame::klee::kdallocinclude/kdallocbecameinclude/klee/KDAlloc
- reformatted every change with clang-format
Nice changes, thanks, @danielschemmel . The CI failure is independent of your changes, we need to merge #1576 first.