sentry-native icon indicating copy to clipboard operation
sentry-native copied to clipboard

Invalid characters in breadcrumbs or event messages lead to silently discarded events

Open RedFox20 opened this issue 4 years ago • 6 comments

Description

If hand-crafted stack traces contain some specific invalid characters, sentry-native will not complain and events will disappear into the ether during CURL transfer.

Another similar error happened with breadcrumbs, where I accidentally wrote completely invalid characters into the string.

I suggest improving write_json_str function inside sentry_json.c to handle all potential invalid characters. For example, replace '&' with "&", and replace invalid utf8 characters with debug hex. This would help identify more serious data corruption bugs which are currently not visible.

The compiler also complains of a buffer overflow regarding snprintf(buf, sizeof(buf), "\\u%04x", *ptr); inside write_json_str function: note: 'snprintf' output 11 bytes into a destination of size 10. It should probably be snprintf(buf, sizeof(buf), "\\u%02x", (int)*ptr & 0xff );

When does the problem happen

  • [x] During build
  • [ X] During run-time
  • [ X] When capturing a hard crash

Environment

  • OS: CentOS Linux 7
  • Compiler: GCC 7.5.0, 64-bit
  • CMake version and config: 3.19.5, SENTRY_BACKEND=breakpad

Steps To Reproduce

Invalid character in breadcrumbs:

std::string invalid = "crumb";
invalid.back() = (char)0xff; // forcefully corrupt the string with an invalid char
sentry_value_t crumb = sentry_value_new_breadcrumb("info", invalid.c_str());
sentry_add_breadcrumb(crumb);

// this will never make it to sentry backend, it silently disappears because of the invalid char
sentry_capture_event(sentry_value_new_message_event(SENTRY_LEVEL_INFO, "custom", "event");

Invalid character in custom stacktrace:

void addStackTraceToEvent(sentry_value_t evt, void* callstack, int num_frames)
{
    sentry_value_t frames = sentry_value_new_list();
    for (int i = num_frames - 1; i >= 0; --i)
    {
        StackFrame sf = symbolize_frame(callstack[i]);
        char addr[64]; snprintf(addr, sizeof(addr), "0x%llx", (unsigned long long)sf.addr);
        
        // forcefully corrupt the stacktrace:
        sf.function[0] = '&'; // adding an ampersand is enough to make this event disappear

        sentry_value_t frame = sentry_value_new_object();
        sentry_value_set_by_key(frame, "function", sentry_value_new_string(sf.function));
        sentry_value_set_by_key(frame, "filename", sentry_value_new_string(sf.filename));
        sentry_value_set_by_key(frame, "lineno", sentry_value_new_int32(sf.lineno));
        sentry_value_set_by_key(frame, "instruction_addr", sentry_value_new_string(addr));
        sentry_value_append(frames, frame);
    }
    
    sentry_value_t stacktrace = sentry_value_new_object();
    sentry_value_set_by_key(stacktrace, "frames", frames);
    
    sentry_value_t thread = sentry_value_new_object();
    sentry_value_set_by_key(thread, "stacktrace", stacktrace);
    
    sentry_value_t values = sentry_value_new_list();
    sentry_value_append(values, thread);
    
    sentry_value_t threads = sentry_value_new_object();
    sentry_value_set_by_key(threads, "values", values);
    sentry_value_set_by_key(evt, "threads", threads);
}

sentry_value_t beforeSendEvent(sentry_value_t event, void* always_null, void* data)
{
    void* callstack[128];
    int num_frame = stack_trace(callstack, 128, 1);
    addStackTraceToEvent(event, stacktrace, num_frames);
    return event;
}

Log output

No error messages in the log. CURL POST is all OK. Event never appears in the server. Cannot share further details, since codebase is proprietary and running a self-hosted instance.

I have added my own workaround to sanitize all the strings passed into sentry, but this should definitely be fixed inside sentry to save other developers the hassle. It took me 2 weeks to figure out why the event reports were failing randomly.

RedFox20 avatar May 06 '21 22:05 RedFox20

This is my current workaround for reference:

    static size_t string_copy(char* buffer, size_t offset, size_t max, const char* source)
    {
        for (const char* src = source; *src && offset < max; ++src)
            buffer[offset++] = *src;
        return offset;
    }

    // replaces all invalid characters to prevent sentry events from disappearing
    static char* sanitized_string(const char* string, char* buffer, size_t max, bool allow_ampersand)
    {
        size_t out_len = 0;
        char ch = 0;
        for (const char* ptr = string; (ch = *ptr) != 0 && out_len < max; ++ptr)
        {
            if (!allow_ampersand && ch == '&' && strcmp(ptr, "&amp;") != 0)
            {
                out_len = string_copy(buffer, out_len, max, "&amp;");
            }
            else if ((unsigned char)ch > 127) // non-ascii character, disregard utf8 for now
            {
                char buf[12]; snprintf(buf, sizeof(buf), "\\u00%02x", ch & 0xff); // EDIT: using \\u00%02x instead
                out_len = string_copy(buffer, out_len, max, buf);
            }
            else
            {
                buffer[out_len++] = ch;
            }
        }
        buffer[out_len] = '\0';
        return buffer;
    }

    static sentry_value_t sanitized_value(const char* s)
    {
        char buffer[1024];
        return sentry_value_new_string(sanitized_string(s, buffer, sizeof(buffer), false));
    }

RedFox20 avatar May 06 '21 23:05 RedFox20

This is interesting, however I don’t really understand what might be failing here.

  1. JSON does not mandate that you html-escape ampersands.
  2. However, JSON mandates that unicode escapes have 4 digits, your method only uses 2 digits.

Related, you use an explicit & 0xff on an unsigned char. I wonder why that would be needed? In general, we assume strings to be valid JSON.

In any case, you mentioned that you are running a self-hosted sentry. Which version is that?

Can you maybe provide an example envelope that I can try to reproduce this?

Swatinem avatar May 07 '21 11:05 Swatinem

So I ran some tests with SaaS Sentry: Ampersands in values are supported just fine (tested with tags)

Bildschirmfoto 2021-05-07 um 13 33 27

And Sentry does accept 2-digit unicode escapes, which means it is more accepting than for example firefox:

Bildschirmfoto 2021-05-07 um 13 33 50

Swatinem avatar May 07 '21 11:05 Swatinem

Thanks for pointing that out, I will use "\\u00%02x" instead :)

We are running Sentry 21.3.1 7e5ca7e self-hosted

Can you test invalid character 0xff inside breadcrumbs and ampersand inside stacktrace function name please? Those were the two instances where I noticed the failures, perhaps it does not happen with tags.

I will post an example envelope as soon as possible.

RedFox20 avatar May 07 '21 11:05 RedFox20

Here is an example envelope with an invalid character in breadcrumb and also an ampersand in the stack trace. It does not appear in Sentry Issues. When I sanitize the strings, the issue appears correctly in Sentry Issues.

This is the code I used to generate this error (I also had to disable my internal sanitization):

        void reproduceInvalidReport(const std::string& ampersand_for_stacktrace)
        {
            using namespace mr::logging;
            reportBreadcrumb(Level::Info, "breadcrumb with invalid char\xff");
            reportEvent(Level::Error, "Error Event with Stack Trace");
        }

error.0.txt

RedFox20 avatar May 07 '21 12:05 RedFox20

This is my current workaround for reference:

    static size_t string_copy(char* buffer, size_t offset, size_t max, const char* source)
    {
        for (const char* src = source; *src && offset < max; ++src)
            buffer[offset++] = *src;
        return offset;
    }

    // replaces all invalid characters to prevent sentry events from disappearing
    static char* sanitized_string(const char* string, char* buffer, size_t max, bool allow_ampersand)
    {
        size_t out_len = 0;
        char ch = 0;
        for (const char* ptr = string; (ch = *ptr) != 0 && out_len < max; ++ptr)
        {
            if (!allow_ampersand && ch == '&' && strcmp(ptr, "&amp;") != 0)
            {
                out_len = string_copy(buffer, out_len, max, "&amp;");
            }
            else if ((unsigned char)ch > 127) // non-ascii character, disregard utf8 for now
            {
                char buf[12]; snprintf(buf, sizeof(buf), "\\u00%02x", ch & 0xff); // EDIT: using \\u00%02x instead
                out_len = string_copy(buffer, out_len, max, buf);
            }
            else
            {
                buffer[out_len++] = ch;
            }
        }
        buffer[out_len] = '\0';
        return buffer;
    }

    static sentry_value_t sanitized_value(const char* s)
    {
        char buffer[1024];
        return sentry_value_new_string(sanitized_string(s, buffer, sizeof(buffer), false));
    }

This fix worked for me in a codebase of mine recently. Thank you!

Very unfortunate that Sentry doesn't handle this itslef.

gdianaty avatar Jun 21 '22 07:06 gdianaty

This issue is stale. Closing. Please reopen if necessary.

jernejstrasner avatar Dec 06 '23 18:12 jernejstrasner