FreeRTOS-Kernel icon indicating copy to clipboard operation
FreeRTOS-Kernel copied to clipboard

Add check for empty list to listGET_OWNER_OF_NEXT_ENTRY

Open archigup opened this issue 2 years ago • 5 comments

Add check for empty list to listGET_OWNER_OF_NEXT_ENTRY

Description

listGET_OWNER_OF_NEXT_ENTRY uses uninitialized memory if the passed list is empty (only has the list end marker). While all uses of this macro are nested within a list empty check or are after an assert, this trips up GCC post GCC 11 on O3. Worse yet, with minilist enabled, this macro will dereference a struct offset past the end of the struct when passed an empty list.

Adding this check makes the behavior of the macro on empty list explicit when the only element is xListEnd, and prevents accessing xListEnd->pvOwner (Which is currently NULL where the List_t is zero-initialized; I have not checked if there are cases where it is not).

GCC gives a warning as it catches that the macro does not work with an empty list, but it does not figure that that code path is unreachable. Adding this check or adding more list non-empty checks/asserts directly outside of the macro fixes the issue. With asserts, the issue is only solved when asserts are enabled, so just adding the check is the better solution.

Test Steps

Compile tasks.h with GCC 12 (or 11) and with 03 as follows:

gcc -c tasks.c -I include -I ${KERNEL_CONFIG_PATH}/ -I portable/ThirdParty/GCC/Posix/ -Wall -Wextra -Werror -O3

Related Issue

https://github.com/FreeRTOS/FreeRTOS/issues/858

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

archigup avatar Oct 21 '22 00:10 archigup

Codecov Report

Base: 94.30% // Head: 94.30% // No change to project coverage :thumbsup:

Coverage data is based on head (eb4819d) compared to base (91927ab). Patch has no changes to coverable lines.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #580   +/-   ##
=======================================
  Coverage   94.30%   94.30%           
=======================================
  Files           6        6           
  Lines        2370     2370           
  Branches      579      579           
=======================================
  Hits         2235     2235           
  Misses         85       85           
  Partials       50       50           
Flag Coverage Δ
unittests 94.30% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

codecov[bot] avatar Oct 21 '22 00:10 codecov[bot]

The update adds an additional test into listGET_OWNER_OF_NEXT_ENTRY(), which is called from within taskSELECT_HIGHEST_PRIORITY_TASK(), so gets called at high frequency, including potentially in the tick interrupt. Anything that extends the runtime of high frequency functions needs additional scrutiny, hence I would like to understand this better before merging.

I see the code will assert if the subject list is empty – and I don’t think I’ve seen that assert get hit in cases other than where portUSE_PORT_OPTIMISED_TASK_SELECTION is 0 and the lists are already corrupt. So, if my understanding is correct, the update duplicates code. On first look, it seems the update is only required to work around a compiler bug (the hardest bugs to find, so somebody did a good job here!), in which case we need to report the bug if it is not already being tracked? Or is the compiler’s behaviour legitimate? The update impacts all compilers, so if it is a compiler bug work around, will we take it out when the compiler is fixed?

RichardBarry avatar Oct 21 '22 01:10 RichardBarry

Have we confirmed if this is actually a bug in the compiler? Because if it is not that changes things a lot.

cobusve avatar Oct 21 '22 04:10 cobusve

As archigup says, without knowing the execution path, the compiler warning is probably legitimate at as a static check (at any optimisation level). I would not expect there to be an actual error condition at run-time though.

RichardBarry avatar Oct 21 '22 17:10 RichardBarry

I am currently minimizing the testcase to evaluate whether it is a compiler bug.

archigup avatar Oct 21 '22 17:10 archigup

I've minimized the test case to this:

typedef struct xLIST_ITEM
{
    struct xLIST_ITEM * pxNext;
    struct xLIST * pvContainer;
} ListItem_t;

typedef struct xMINI_LIST_ITEM
{
    struct xLIST_ITEM * pxNext;
} MiniListItem_t;

typedef struct xLIST
{
    int uxNumberOfItems;
    ListItem_t * pxIndex;
    MiniListItem_t xListEnd;
} List_t;

static List_t xSuspendedTaskList;

static void prvSearchForNameWithinSingleList( List_t * pxList )
{
    if( ( ( pxList )->uxNumberOfItems ) > 0 )
    {
        pxList->pxIndex = pxList->pxIndex->pxNext;
        if( ( void * ) pxList->pxIndex == ( void * ) &( pxList->xListEnd ) )
        {
            pxList->pxIndex = pxList->pxIndex->pxNext;
        }
    }
}

void xTaskGetHandle( void )
{
    prvSearchForNameWithinSingleList( &xSuspendedTaskList );
}

This makes it clear that the fix in this PR is incidental; the warning can be reproduced without that last assignment in listGET_OWNER_OF_NEXT_ENTRY.

I am also able to trigger it on 02 in addition to 03 and 0s with this. The warning also goes away when configUSE_MINI_LIST_ITEM is 0.

archigup avatar Nov 30 '22 23:11 archigup

https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Warray-bounds

Looking at above gcc docs explains why it only happens at those optimization levels; the warning depends on -ftree-vrp which is not on all optimization levels. I am able to reproduce the warning at -O1 when -ftree-vrp is enabled, and not able to reproduce with -O3 -fno-tree-vrp.

Still odd since it seems it should fall into -Warray-bounds=2 but its happening with -Warray-bounds=1.

archigup avatar Nov 30 '22 23:11 archigup

Narrowing it down further, it seems the issue really is in:

if( ( void * ) pxList->pxIndex == ( void * ) &( pxList->xListEnd ) )
{
    pxList->pxIndex = pxList->pxIndex->pxNext;
}

In the right side of the assignment pxList->pxIndex is a ListItem_t *, but it is known that it points to pxList->xListEnd due to the surrounding check; accessing xListEnd through a ListItem_t * does extend past the memory region that the specific calls of prvSearchForNameWithinSingleList pass in as the pxList parameter. This explains why its only warning where the pxList param is actually a global var. This can be fixed by not accessing it through a type that would extend outside of the memory region. The following equivalent code works:

if( ( void * ) pxList->pxIndex == ( void * ) &( pxList->xListEnd ) )
{
    pxList->pxIndex = pxList->xListEnd.pxNext;
}

Due to the if condition, those two are the same value, but the latter code instead accesses it as a MiniListItem_t which is in bounds.

The following code also alternatively works:

if( ( void * ) pxList->pxIndex == ( void * ) &( pxList->xListEnd ) )
{
    pxList->pxIndex = ( ( MiniListItem_t * ) pxList->pxIndex )->pxNext;
}

archigup avatar Dec 01 '22 00:12 archigup