openscap icon indicating copy to clipboard operation
openscap copied to clipboard

registry_object with Regex is extremely slow

Open nrathaus opened this issue 5 years ago • 11 comments

Description of Problem:

Collection of a registry item such as this:

<registry_object xmlns="http://oval.mitre.org/XMLSchema/oval-definitions-5#windows" id="oval:org.mitre.oval:obj:239" version="2">
  <hive>HKEY_LOCAL_MACHINE</hive>
  <key operation="pattern match">^SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\Mozilla (\([0-1]\.[0-7]\)|\([0-1]\.[0-7]\.[0-6]\))$</key>
  <name>DisplayName</name>
</registry_object>

Is extremely slow - taking 1min-2min to complete

I am not 100% sure why, I am guessing that the regex is being used in a way that isn't optimized, for example the code states:

HKEY hive = get_hive_from_str(hive_str);
registry_recurse_down(ctx, ei, hive_str, NULL, 0, UNLIMITED_DEPTH, key_operation, windows_view);

This is quite heavy if we start from SOFTWARE and trickle down, as there are "static" parts in the middle, for example a recurse from SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\ downwards is possible to improve speed

The issue with this isn't a single <registry_object> consuming 1min, is that having 210 such objects being collected - would take at least 3 or more hours to complete

OpenSCAP Version:

OpenSCAP 1.3.3

Operating System & Version:

Windows 10

Steps to Reproduce:

  1. Use a minimal test case that collects one object:
<registry_object xmlns="http://oval.mitre.org/XMLSchema/oval-definitions-5#windows" id="oval:org.mitre.oval:obj:239" version="2">
  <hive>HKEY_LOCAL_MACHINE</hive>
  <key operation="pattern match">^SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\Mozilla (\([0-1]\.[0-7]\)|\([0-1]\.[0-7]\.[0-6]\))$</key>
  <name>DisplayName</name>
</registry_object>
  1. Use oscap oval collect command to collect it
  2. Monitor the time it takes to collect it - 1min or more

Actual Results:

1min or more time per <registry_object> that utilizes pattern match is observed

Expected Results:

1s or less time per <registry_object> that does not utilizes pattern match is observed, pattern match should be as closest as possible to 'fixed' registry reading

Additional Information / Debugging Steps:

nrathaus avatar May 19 '20 10:05 nrathaus

Would you be interested in a proposed code that I have built to resolve it?

I do not know how to integrate it into the probe though, so that part would have to be by you

Let me know if you want that effort and I will post it here

nrathaus avatar May 24 '20 07:05 nrathaus

Yeah, sure, go ahead. Even patch ideas are welcome :)

evgenyz avatar May 25 '20 08:05 evgenyz

Would you be interested in a proposed code that I have built to resolve it?

I do not know how to integrate it into the probe though, so that part would have to be by you

Let me know if you want that effort and I will post it here

Hey @nrathaus can you please tell me how the probe code is reading the values from the windows registry and checking them with the STIG file and passing or failing the result. Like for example, HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\EveryoneIncludesAnonymous in this hive, the value of "everyoneincludesanonymous" is 0, which is correct for the rule to be passed but unfortunately, it's failing. So I just wanted to know where the probe code is reading this value and comparing it with the standard value. @evgenyz if you can help, please. Thank you.

me-shubh avatar May 28 '20 02:05 me-shubh

Wrong Evgeni ;)

evgeni avatar May 28 '20 03:05 evgeni

@evgeni Man, I'm sorry. I have no idea why this is happening :)

evgenyz avatar May 28 '20 09:05 evgenyz

@me-shubh The registry traversal is performed in the registry_recurse_down function (src/OVAL/probes/windows/registry_probe.c).

evgenyz avatar May 28 '20 09:05 evgenyz

@evgeni Man, I'm sorry. I have no idea why this is happening :)

Our usernames start identically, and we both contributed to this repo, so GH suggests of us in the completion. 🤷‍♀️

evgeni avatar May 28 '20 09:05 evgeni

@me-shubh The registry traversal is performed in the registry_recurse_down function (src/OVAL/probes/windows/registry_probe.c).

That is fine. But where are the values from the windows registry and the one from the stig or XML file are getting compared and hence printing pass or fail while evaluating an OVAL file. Like where exactly the comparison and decision part is happening.

me-shubh avatar May 28 '20 09:05 me-shubh

That is fine. But where are the values from the windows registry and the one from the stig or XML file are getting compared and hence printing pass or fail while evaluating an OVAL file. Like where exactly the comparison and decision part is happening.

There is no exact place where is it happening, because the code path depends on what is in OVAL object and state. But, the most important thing, values are being compared outside of the probe code (probes are only responsible for collecting entities from the system).

If you want to know what entities were collected by a probe, then --verbose=DEVEL argument to the scanner could help you.

evgenyz avatar May 28 '20 10:05 evgenyz

Hi,

My proposed method, where sKeyOperation is the operator="...":

if (0 == sKeyOperation.compare("pattern match") && 
        std::string::npos != sKey.find("^")) {

I limit my optimization to the case where it has a ^ at the beginning, otherwise its impossible for my code to work properly

I then call a recursive function called crawlRegistryKeyRegex and return the paths as a vector back:

std::vector <std::string> sRelevantKeyPaths;
sRelevantKeyPaths = crawlRegistryKeyRegex(hKey, "", sKey);

The function is defined as follows:

std::vector<std::string> crawlRegistryKeyRegex(HKEY hRootKey, std::string sBaseKey, std::string sKey)

And does runs as such:

std::vector<std::string> crawlRegistryKeyRegex(HKEY hRootKey, std::string sBaseKey, std::string sKey)
{
  std::vector<std::string> sPossibleKeys;

  const char delim[] = "\\\\";

  std::string sKeyLessBase = sKey.substr(sBaseKey.length());
  if (0 == sKeyLessBase.find("\\\\")) {
    sKeyLessBase = sKeyLessBase.substr(strlen("\\\\"));
  }

  std::vector <std::string> elements = split(sKeyLessBase, delim);

  // We need to find the first element has non-static values, i.e. has * . or [
  std::string staticKey = sBaseKey;
  auto it = elements.begin(); // We want to keep track of the position were we ended

  for (it = elements.begin();
    it != elements.end();
    it++) {
    std::string element = (*it);

    size_t posOfInterest = 0;
    bool bStatic = true;
    while (std::string::npos != (posOfInterest = element.find_first_of("*.[]$^()", posOfInterest))) {
      /* Make sure that the previous char is not a \ */
      bool bEscaped = false;
      if (posOfInterest > 0) {
        if (element[posOfInterest - 1] == '\\') {
          bEscaped = true;
          /* If it is a \\ make sure that the one before it is also not a \ */
          if (posOfInterest > 1) {
            if (element[posOfInterest - 2] == '\\') {
              bEscaped = false;
            }
          }
        }
      }

      if (bEscaped) {
        // not relevant its static
        posOfInterest++;
        continue;
      }
      else {
        // its non static
        bStatic = false;
        break;
      }
    }

    // it is static
    if (bStatic) {
      if (!staticKey.empty())
        staticKey.append("\\\\");

      staticKey.append(element);
    }
    else {
      break;
    }
  }

  std::string reminderKey = "";
  if (it != elements.end()) {
    for (auto itreminder = it;
      itreminder != elements.end();
      itreminder++) {
      std::string element = (*itreminder);

      if (!reminderKey.empty())
        reminderKey.append("\\\\");

      reminderKey.append(element);
    }
  }

  DWORD dwError = 0;
  std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
  std::wstring wide_static_key = converter.from_bytes(staticKey);

  // Lets open the base key location
  HKEY hkResult = NULL;
  if (ERROR_SUCCESS != (dwError = RegOpenKeyEx(hRootKey, wide_static_key.c_str(), 0, KEY_READ | KEY_WOW64_64KEY, &hkResult))) {
    // std::cerr << "RegOpenKey for " << staticKey << " failed error code: " << dwError << " @ " << __LINE__ << std::endl;
    return sPossibleKeys;
  }

#define MAX_KEY_LENGTH 255
#define MAX_VALUE_NAME 16383
  TCHAR    achKey[MAX_KEY_LENGTH] = { 0 }; // buffer for subkey name
  DWORD    cbName = 0;               // size of name string 
  TCHAR    achClass[MAX_PATH] = TEXT("");  // buffer for class name 
  DWORD    cchClassName = MAX_PATH;  // size of class string 
  DWORD    cSubKeys = 0;             // number of subkeys 
  DWORD    cbMaxSubKey = 0;          // longest subkey size 
  DWORD    cchMaxClass = 0;          // longest class string 
  DWORD    cValues = 0;              // number of values for key 
  DWORD    cchMaxValue = 0;          // longest value name 
  DWORD    cbMaxValueData = 0;       // longest value data 
  DWORD    cbSecurityDescriptor = 0; // size of security descriptor 
  FILETIME ftLastWriteTime;          // last write time 

  LSTATUS retCode = RegQueryInfoKey(
    hkResult,                // key handle 
    achClass,                // buffer for class name 
    &cchClassName,           // size of class string 
    NULL,                    // reserved 
    &cSubKeys,               // number of subkeys 
    &cbMaxSubKey,            // longest subkey size 
    &cchMaxClass,            // longest class string 
    &cValues,                // number of values for this key 
    &cchMaxValue,            // longest value name 
    &cbMaxValueData,         // longest value data 
    &cbSecurityDescriptor,   // security descriptor 
    &ftLastWriteTime);       // last write time 

  if (cSubKeys)
  {
    //std::cerr << "Number of subkeys: " << cSubKeys << std::endl;

    std::string pattern;

    size_t doubleSlash = std::string::npos;
    if (std::string::npos != (doubleSlash = reminderKey.find("\\\\"))) {
      // We need to take just the part that we are interested in
      pattern = reminderKey.substr(0, doubleSlash);
      reminderKey = reminderKey.substr(doubleSlash + strlen("\\\\"));
    }
    else {
      pattern = reminderKey;

      if (pattern.length() >= strlen(".*$") && 0 == pattern.substr(pattern.length() - strlen(".*$")).compare(".*$")) { // If it ends with a .*$ we need to return all the keys after us until "eof keys"
        reminderKey = ".*$";
      }
      else {
        reminderKey = "";
      }
    }
    boost::match_flag_type flags_version = boost::match_default;
    boost::match_results<std::string::const_iterator> results_version;

    boost::regex key_regex(pattern, boost::regex::icase);

    for (DWORD i = 0; i < cSubKeys; i++) {
      cbName = MAX_KEY_LENGTH;
      retCode = RegEnumKeyEx(hkResult, i,
        achKey, // hold the key name
        &cbName,
        NULL,
        NULL,
        NULL,
        &ftLastWriteTime);

      if (ERROR_SUCCESS == retCode) {
        std::wstring swachKey = achKey;
        std::string sachKey = converter.to_bytes(swachKey);
        // std::cerr << "(" << i + 1 << ") " << sachKey.c_str() << std::endl;

        std::string sSubKey = converter.to_bytes(achKey);

        if (boost::regex_search(sachKey, results_version, key_regex, flags_version)) {
          std::string sRelevantKey = staticKey;
          if (!sRelevantKey.empty())
            sRelevantKey.append("\\\\");
          sRelevantKey.append(sachKey);

          sPossibleKeys.insert(sPossibleKeys.end(), sRelevantKey);
        }
      }
    }
  }

  RegCloseKey(hkResult);

  if (0 == cSubKeys && 0 == reminderKey.compare(".*$")) { // We reached the end, no sub keys and our reminder is a catch all
    return sPossibleKeys;
  }

  if (!reminderKey.empty()) {
    // The reminder has something, we need to dive into it
    std::vector<std::string> sSubPossibleKeys;

    for (auto it = sPossibleKeys.begin();
      it != sPossibleKeys.end();
      it++) {
      std::string sSubKey = (*it);
      std::string sReminderKey = sSubKey;

      /*
      sReminderKey = ReplaceAll(sReminderKey, ".", "\\."); // Escape the regexs
      sReminderKey = ReplaceAll(sReminderKey, "$", "\\$"); // Escape the regexs
      sReminderKey = ReplaceAll(sReminderKey, "[", "\\["); // Escape the regexs
      sReminderKey = ReplaceAll(sReminderKey, "]", "\\]"); // Escape the regexs
      sReminderKey = ReplaceAll(sReminderKey, "?", "\\?"); // Escape the regexs
      sReminderKey = ReplaceAll(sReminderKey, "{", "\\{"); // Escape the regexs
      sReminderKey = ReplaceAll(sReminderKey, "}", "\\}"); // Escape the regexs
      sReminderKey = ReplaceAll(sReminderKey, "(", "\\("); // Escape the regexs
      sReminderKey = ReplaceAll(sReminderKey, ")", "\\)"); // Escape the regexs
      sReminderKey = ReplaceAll(sReminderKey, "*", "\\*"); // Escape the regexs
      */

      if (!sReminderKey.empty())
        sReminderKey += "\\\\";
      sReminderKey += reminderKey;

      std::vector <std::string> sMatchesFound = crawlRegistryKeyRegex(hRootKey, sSubKey, sReminderKey);
      sSubPossibleKeys.insert(sSubPossibleKeys.end(), sMatchesFound.begin(), sMatchesFound.end());
    }

    if (sSubPossibleKeys.empty() && 0 == reminderKey.compare(".*$")) {
      // If no match was found and the reminder is .*$ it measn we reached the bottom, return the keys we previously had
      return sPossibleKeys;
    }
    sPossibleKeys = sSubPossibleKeys;
  }

  return sPossibleKeys;
}

I have similar code that takes these paths found and then does a regex (if needed) on the name part

Let me know if you want to integrate it :) I can help review it if needed

nrathaus avatar May 28 '20 11:05 nrathaus

NOTE: There is the danger here of (very deep) recursion as someone can put "^.*" as a regex

I didn't handle this because I am using trusted sources for the OVAL consumption

Or huge vector of results because of a catch all regex

nrathaus avatar May 28 '20 11:05 nrathaus