registry_object with Regex is extremely slow
Description of Problem:
Collection of a registry item such as this:
<registry_object xmlns="http://oval.mitre.org/XMLSchema/oval-definitions-5#windows" id="oval:org.mitre.oval:obj:239" version="2">
<hive>HKEY_LOCAL_MACHINE</hive>
<key operation="pattern match">^SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\Mozilla (\([0-1]\.[0-7]\)|\([0-1]\.[0-7]\.[0-6]\))$</key>
<name>DisplayName</name>
</registry_object>
Is extremely slow - taking 1min-2min to complete
I am not 100% sure why, I am guessing that the regex is being used in a way that isn't optimized, for example the code states:
HKEY hive = get_hive_from_str(hive_str);
registry_recurse_down(ctx, ei, hive_str, NULL, 0, UNLIMITED_DEPTH, key_operation, windows_view);
This is quite heavy if we start from SOFTWARE and trickle down, as there are "static" parts in the middle, for example a recurse from SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\ downwards is possible to improve speed
The issue with this isn't a single <registry_object> consuming 1min, is that having 210 such objects being collected - would take at least 3 or more hours to complete
OpenSCAP Version:
OpenSCAP 1.3.3
Operating System & Version:
Windows 10
Steps to Reproduce:
- Use a minimal test case that collects one object:
<registry_object xmlns="http://oval.mitre.org/XMLSchema/oval-definitions-5#windows" id="oval:org.mitre.oval:obj:239" version="2">
<hive>HKEY_LOCAL_MACHINE</hive>
<key operation="pattern match">^SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\Mozilla (\([0-1]\.[0-7]\)|\([0-1]\.[0-7]\.[0-6]\))$</key>
<name>DisplayName</name>
</registry_object>
- Use
oscap oval collectcommand to collect it - Monitor the time it takes to collect it - 1min or more
Actual Results:
1min or more time per <registry_object> that utilizes pattern match is observed
Expected Results:
1s or less time per <registry_object> that does not utilizes pattern match is observed, pattern match should be as closest as possible to 'fixed' registry reading
Additional Information / Debugging Steps:
Would you be interested in a proposed code that I have built to resolve it?
I do not know how to integrate it into the probe though, so that part would have to be by you
Let me know if you want that effort and I will post it here
Yeah, sure, go ahead. Even patch ideas are welcome :)
Would you be interested in a proposed
codethat I have built to resolve it?I do not know how to integrate it into the probe though, so that part would have to be by you
Let me know if you want that effort and I will post it here
Hey @nrathaus can you please tell me how the probe code is reading the values from the windows registry and checking them with the STIG file and passing or failing the result. Like for example, HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\EveryoneIncludesAnonymous in this hive, the value of "everyoneincludesanonymous" is 0, which is correct for the rule to be passed but unfortunately, it's failing. So I just wanted to know where the probe code is reading this value and comparing it with the standard value. @evgenyz if you can help, please. Thank you.
Wrong Evgeni ;)
@evgeni Man, I'm sorry. I have no idea why this is happening :)
@me-shubh The registry traversal is performed in the registry_recurse_down function (src/OVAL/probes/windows/registry_probe.c).
@evgeni Man, I'm sorry. I have no idea why this is happening :)
Our usernames start identically, and we both contributed to this repo, so GH suggests of us in the completion. 🤷♀️
@me-shubh The registry traversal is performed in the
registry_recurse_downfunction (src/OVAL/probes/windows/registry_probe.c).
That is fine. But where are the values from the windows registry and the one from the stig or XML file are getting compared and hence printing pass or fail while evaluating an OVAL file. Like where exactly the comparison and decision part is happening.
That is fine. But where are the values from the windows registry and the one from the stig or XML file are getting compared and hence printing pass or fail while evaluating an OVAL file. Like where exactly the comparison and decision part is happening.
There is no exact place where is it happening, because the code path depends on what is in OVAL object and state. But, the most important thing, values are being compared outside of the probe code (probes are only responsible for collecting entities from the system).
If you want to know what entities were collected by a probe, then --verbose=DEVEL argument to the scanner could help you.
Hi,
My proposed method, where sKeyOperation is the operator="...":
if (0 == sKeyOperation.compare("pattern match") &&
std::string::npos != sKey.find("^")) {
I limit my optimization to the case where it has a ^ at the beginning, otherwise its impossible for my code to work properly
I then call a recursive function called crawlRegistryKeyRegex and return the paths as a vector back:
std::vector <std::string> sRelevantKeyPaths;
sRelevantKeyPaths = crawlRegistryKeyRegex(hKey, "", sKey);
The function is defined as follows:
std::vector<std::string> crawlRegistryKeyRegex(HKEY hRootKey, std::string sBaseKey, std::string sKey)
And does runs as such:
std::vector<std::string> crawlRegistryKeyRegex(HKEY hRootKey, std::string sBaseKey, std::string sKey)
{
std::vector<std::string> sPossibleKeys;
const char delim[] = "\\\\";
std::string sKeyLessBase = sKey.substr(sBaseKey.length());
if (0 == sKeyLessBase.find("\\\\")) {
sKeyLessBase = sKeyLessBase.substr(strlen("\\\\"));
}
std::vector <std::string> elements = split(sKeyLessBase, delim);
// We need to find the first element has non-static values, i.e. has * . or [
std::string staticKey = sBaseKey;
auto it = elements.begin(); // We want to keep track of the position were we ended
for (it = elements.begin();
it != elements.end();
it++) {
std::string element = (*it);
size_t posOfInterest = 0;
bool bStatic = true;
while (std::string::npos != (posOfInterest = element.find_first_of("*.[]$^()", posOfInterest))) {
/* Make sure that the previous char is not a \ */
bool bEscaped = false;
if (posOfInterest > 0) {
if (element[posOfInterest - 1] == '\\') {
bEscaped = true;
/* If it is a \\ make sure that the one before it is also not a \ */
if (posOfInterest > 1) {
if (element[posOfInterest - 2] == '\\') {
bEscaped = false;
}
}
}
}
if (bEscaped) {
// not relevant its static
posOfInterest++;
continue;
}
else {
// its non static
bStatic = false;
break;
}
}
// it is static
if (bStatic) {
if (!staticKey.empty())
staticKey.append("\\\\");
staticKey.append(element);
}
else {
break;
}
}
std::string reminderKey = "";
if (it != elements.end()) {
for (auto itreminder = it;
itreminder != elements.end();
itreminder++) {
std::string element = (*itreminder);
if (!reminderKey.empty())
reminderKey.append("\\\\");
reminderKey.append(element);
}
}
DWORD dwError = 0;
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
std::wstring wide_static_key = converter.from_bytes(staticKey);
// Lets open the base key location
HKEY hkResult = NULL;
if (ERROR_SUCCESS != (dwError = RegOpenKeyEx(hRootKey, wide_static_key.c_str(), 0, KEY_READ | KEY_WOW64_64KEY, &hkResult))) {
// std::cerr << "RegOpenKey for " << staticKey << " failed error code: " << dwError << " @ " << __LINE__ << std::endl;
return sPossibleKeys;
}
#define MAX_KEY_LENGTH 255
#define MAX_VALUE_NAME 16383
TCHAR achKey[MAX_KEY_LENGTH] = { 0 }; // buffer for subkey name
DWORD cbName = 0; // size of name string
TCHAR achClass[MAX_PATH] = TEXT(""); // buffer for class name
DWORD cchClassName = MAX_PATH; // size of class string
DWORD cSubKeys = 0; // number of subkeys
DWORD cbMaxSubKey = 0; // longest subkey size
DWORD cchMaxClass = 0; // longest class string
DWORD cValues = 0; // number of values for key
DWORD cchMaxValue = 0; // longest value name
DWORD cbMaxValueData = 0; // longest value data
DWORD cbSecurityDescriptor = 0; // size of security descriptor
FILETIME ftLastWriteTime; // last write time
LSTATUS retCode = RegQueryInfoKey(
hkResult, // key handle
achClass, // buffer for class name
&cchClassName, // size of class string
NULL, // reserved
&cSubKeys, // number of subkeys
&cbMaxSubKey, // longest subkey size
&cchMaxClass, // longest class string
&cValues, // number of values for this key
&cchMaxValue, // longest value name
&cbMaxValueData, // longest value data
&cbSecurityDescriptor, // security descriptor
&ftLastWriteTime); // last write time
if (cSubKeys)
{
//std::cerr << "Number of subkeys: " << cSubKeys << std::endl;
std::string pattern;
size_t doubleSlash = std::string::npos;
if (std::string::npos != (doubleSlash = reminderKey.find("\\\\"))) {
// We need to take just the part that we are interested in
pattern = reminderKey.substr(0, doubleSlash);
reminderKey = reminderKey.substr(doubleSlash + strlen("\\\\"));
}
else {
pattern = reminderKey;
if (pattern.length() >= strlen(".*$") && 0 == pattern.substr(pattern.length() - strlen(".*$")).compare(".*$")) { // If it ends with a .*$ we need to return all the keys after us until "eof keys"
reminderKey = ".*$";
}
else {
reminderKey = "";
}
}
boost::match_flag_type flags_version = boost::match_default;
boost::match_results<std::string::const_iterator> results_version;
boost::regex key_regex(pattern, boost::regex::icase);
for (DWORD i = 0; i < cSubKeys; i++) {
cbName = MAX_KEY_LENGTH;
retCode = RegEnumKeyEx(hkResult, i,
achKey, // hold the key name
&cbName,
NULL,
NULL,
NULL,
&ftLastWriteTime);
if (ERROR_SUCCESS == retCode) {
std::wstring swachKey = achKey;
std::string sachKey = converter.to_bytes(swachKey);
// std::cerr << "(" << i + 1 << ") " << sachKey.c_str() << std::endl;
std::string sSubKey = converter.to_bytes(achKey);
if (boost::regex_search(sachKey, results_version, key_regex, flags_version)) {
std::string sRelevantKey = staticKey;
if (!sRelevantKey.empty())
sRelevantKey.append("\\\\");
sRelevantKey.append(sachKey);
sPossibleKeys.insert(sPossibleKeys.end(), sRelevantKey);
}
}
}
}
RegCloseKey(hkResult);
if (0 == cSubKeys && 0 == reminderKey.compare(".*$")) { // We reached the end, no sub keys and our reminder is a catch all
return sPossibleKeys;
}
if (!reminderKey.empty()) {
// The reminder has something, we need to dive into it
std::vector<std::string> sSubPossibleKeys;
for (auto it = sPossibleKeys.begin();
it != sPossibleKeys.end();
it++) {
std::string sSubKey = (*it);
std::string sReminderKey = sSubKey;
/*
sReminderKey = ReplaceAll(sReminderKey, ".", "\\."); // Escape the regexs
sReminderKey = ReplaceAll(sReminderKey, "$", "\\$"); // Escape the regexs
sReminderKey = ReplaceAll(sReminderKey, "[", "\\["); // Escape the regexs
sReminderKey = ReplaceAll(sReminderKey, "]", "\\]"); // Escape the regexs
sReminderKey = ReplaceAll(sReminderKey, "?", "\\?"); // Escape the regexs
sReminderKey = ReplaceAll(sReminderKey, "{", "\\{"); // Escape the regexs
sReminderKey = ReplaceAll(sReminderKey, "}", "\\}"); // Escape the regexs
sReminderKey = ReplaceAll(sReminderKey, "(", "\\("); // Escape the regexs
sReminderKey = ReplaceAll(sReminderKey, ")", "\\)"); // Escape the regexs
sReminderKey = ReplaceAll(sReminderKey, "*", "\\*"); // Escape the regexs
*/
if (!sReminderKey.empty())
sReminderKey += "\\\\";
sReminderKey += reminderKey;
std::vector <std::string> sMatchesFound = crawlRegistryKeyRegex(hRootKey, sSubKey, sReminderKey);
sSubPossibleKeys.insert(sSubPossibleKeys.end(), sMatchesFound.begin(), sMatchesFound.end());
}
if (sSubPossibleKeys.empty() && 0 == reminderKey.compare(".*$")) {
// If no match was found and the reminder is .*$ it measn we reached the bottom, return the keys we previously had
return sPossibleKeys;
}
sPossibleKeys = sSubPossibleKeys;
}
return sPossibleKeys;
}
I have similar code that takes these paths found and then does a regex (if needed) on the name part
Let me know if you want to integrate it :) I can help review it if needed
NOTE: There is the danger here of (very deep) recursion as someone can put "^.*" as a regex
I didn't handle this because I am using trusted sources for the OVAL consumption
Or huge vector of results because of a catch all regex