FUOTAv2 FragGetParityMatrixRow implementation incorrect?

Open brocaar opened this issue 9 months ago • 7 comments

I'm testing FUOTA v2 and on FragSessionStatusAns I'm getting MIC errors. What I'm testing:

  • Uplink of multiple fragments with redundancy
  • During the multicast downlinks I'm triggering a device uplink (pressing blue button of EVK board) such that it misses one or two multicast downlinks
  • The device reports that, after receiving the redundant packets, it reconstructed the payload
Current fragment index = 0,
Current fragment counter = 15,
Number of missed packets = 2,
 FILE RECONSTRUCTS SUCCESSFULLY !

However, on FragSessionStatusReq, it prints that the MIC is incorrect. After printing the reconstructed payload next to the payload that was actually sent, I notice that the reconstructed bytes are incorrect (only the parts that were reconstructed using the FEC scheme). This might be an issue with my implementation or an issue in the LBM firmware; I do not know which yet.

However, after looking at the FragGetParityMatrixRow function, I noticed there might be something missing:

static void FragGetParityMatrixRow( int32_t n, int32_t m, uint8_t* matrixRow )
{
    int32_t mTemp;
    int32_t x;
    int32_t nbCoeff = 0;
    int32_t r;

    if( IsPowerOfTwo( m ) != false )
    {
        mTemp = 1;
    }
    else
    {
        mTemp = 0;
    }

    x = 1 + ( 1001 * n );
    for( int32_t i = 0; i < ( ( m >> 3 ) + 1 ); i++ )
    {
        matrixRow[i] = 0;
    }
    while( nbCoeff < ( m >> 1 ) )
    {
        r = 1 << 16;
        while( r >= m )
        {
            x = FragPrbs23( x );
            r = x % ( m + mTemp );
        }
        if( GetParity( r, matrixRow ) == 0 )
        {
            SetParity( r, matrixRow, 1 );
            nbCoeff += 1;
        }
    }
}

The spec contains an if (N <= M) line, which does not seem to be implemented:

Image
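For illustration, here is a minimal sketch of what a row generator with the N <= M branch could look like. It mirrors the structure of FragGetParityMatrixRow quoted above, but the function names, the MSB-first bit packing, and the PRBS23 step are assumptions made for this example, not code taken from the LBM or the spec. The row index n is assumed to be 1-based over all fragments, as in the MATLAB code.

```c
#include <stdint.h>
#include <string.h>

/* PRBS23 step; assumed (not verified here) to match the stack's FragPrbs23(). */
static int32_t prbs23_sketch( int32_t value )
{
    int32_t b0 = value & 0x01;
    int32_t b1 = ( value & 0x20 ) >> 5;
    return ( value >> 1 ) + ( ( b0 ^ b1 ) << 22 );
}

static int is_power_of_two( int32_t x )
{
    return ( x > 0 ) && ( ( x & ( x - 1 ) ) == 0 );
}

/* Hypothetical bit helpers: bit `index` lives in byte index / 8, MSB first. */
static int get_bit( const uint8_t* row, int32_t index )
{
    return ( row[index >> 3] >> ( 7 - ( index & 7 ) ) ) & 0x01;
}

static void set_bit( uint8_t* row, int32_t index )
{
    row[index >> 3] |= ( uint8_t )( 1u << ( 7 - ( index & 7 ) ) );
}

/* n is 1-based over all coded fragments; row must hold ( m >> 3 ) + 1 bytes. */
static void parity_row_with_identity_sketch( int32_t n, int32_t m, uint8_t* row )
{
    memset( row, 0, ( size_t )( ( m >> 3 ) + 1 ) );

    if( n <= m )
    {
        /* Identity part: coded fragment n is just uncoded fragment n. */
        set_bit( row, n - 1 );
        return;
    }

    int32_t m_temp   = is_power_of_two( m ) ? 1 : 0;
    int32_t x        = 1 + ( 1001 * n );
    int32_t nb_coeff = 0;

    while( nb_coeff < ( m >> 1 ) )
    {
        int32_t r = 1 << 16;
        while( r >= m )
        {
            x = prbs23_sketch( x );
            r = x % ( m + m_temp );
        }
        if( get_bit( row, r ) == 0 )
        {
            set_bit( row, r );
            nb_coeff += 1;
        }
    }
}
```

With this shape, a caller would pass the absolute fragment counter for every fragment, and the seed for redundancy rows would be derived from that absolute counter rather than from an offset into the redundancy block.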

brocaar, Mar 14 '25 12:03

Another observation: in the v2 implementation, this function is called as:

        // fragCounter - FragDecoder.FragNb
        FragGetParityMatrixRow( fragCounter - FragDecoder.FragNb, FragDecoder.FragNb, matrixRow );

This is equal to the v1 implementation of the LBM.

However, the implementation seems to have changed between v1 and v2:

v1 implementation

Image

CODED_F is an array of length 2 * w, where w is the number of uncoded fragments (so the redundancy is equal to w). There are two for loops:

  1. Adding the UNCODED_F fragments to CODED_F
  2. Adding the coded fragments to CODED_F. Note that the variable y is in the range 1 .. w

v2 implementation

Image

CODED_F is an array of length 2 * w, where w is the number of uncoded fragments (so again the redundancy is equal to w). There is only one loop, in which the variable y is in the range 1 .. (2 * w).

Now looking again at:

        // fragCounter - FragDecoder.FragNb
        FragGetParityMatrixRow( fragCounter - FragDecoder.FragNb, FragDecoder.FragNb, matrixRow );

I think this is not correct. It should probably be just fragCounter because, if I understand the MATLAB code correctly, the expected range is 1 .. (number of fragments + redundant fragments).

brocaar, Mar 14 '25 12:03

The LBM implementation is a direct translation of the MATLAB code. Let me explain:

Handling of Fragments

In the LBM implementation, the uncoded fragments are handled separately, and only the coded fragments (where N > M) are managed by calling the function FragGetParityMatrixRow. That is why this function does not check the case N <= M, unlike the MATLAB code.

MATLAB Specification

In the specification, the MATLAB code for both V1 and V2 is essentially the same, just written in two different ways. The core idea remains: the output is the CODED_F fragments, where the first M fragments are equal to UNCODED_F, and the following M fragments are a combination of those uncoded fragments. Thus, the result is:

CODED_F = [UNCODED_F CODED_F]

In the V1 MATLAB Implementation:

  • CODED_F directly copies UNCODED_F for the first M fragments.
  • Then, lines 697 to 705 handle the second set of M fragments by combining the uncoded fragments.

In the V2 MATLAB Implementation:

  • There is no direct copy of UNCODED_F. Instead, it is done implicitly because the combination matrix is an identity matrix for the first M fragments, resulting in CODED_F = UNCODED_F for those fragments.
  • The following M fragments are also a combination of the uncoded fragments, since the matrix is no longer the identity in that portion.
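The point about identity rows can be sketched in a few lines of C. This assumes that a coded fragment is the bit-wise XOR of the uncoded fragments selected by its matrix row; the helper names and the MSB-first row packing are hypothetical and chosen only for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: bit `index` of a packed row, MSB first within each byte. */
static int row_bit( const uint8_t* row, size_t index )
{
    return ( row[index >> 3] >> ( 7 - ( index & 7 ) ) ) & 0x01;
}

/* XOR together every uncoded fragment selected by `row` into `out`.
 * `uncoded` holds m fragments of frag_size bytes, stored back to back. */
static void combine_fragments_sketch( const uint8_t* uncoded, size_t m, size_t frag_size,
                                      const uint8_t* row, uint8_t* out )
{
    memset( out, 0, frag_size );
    for( size_t k = 0; k < m; k++ )
    {
        if( row_bit( row, k ) )
        {
            for( size_t b = 0; b < frag_size; b++ )
            {
                out[b] ^= uncoded[k * frag_size + b];
            }
        }
    }
}
```

A row with a single bit set at position k reproduces uncoded fragment k unchanged, which is why an identity block in the matrix is equivalent to copying UNCODED_F directly.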

Summary

Both V1 and V2 behave identically; the only difference is in the way the combination matrix is constructed—V2 includes more ones in the matrix compared to V1.

I also believe that the LBM implementation exhibits the same behavior.

lbm-team, Mar 27 '25 13:03

In the specification, the MATLAB code for both V1 and V2 is essentially the same, just written in two different ways.

Are you sure? I think the x variable is calculated differently because the arguments to matrix_line are different in v1 and v2:

x= 1+1001*N; %initialize the seed differently for each line

In the v1 MATLAB example:

  • The MATLAB code iterates twice over the range 1:w (two for y=1:w loops)
  • For calculating the first redundant fragment, the matrix_line function is called with the following arguments matrix_line(1, 32)
  • This means N=1, M=32, and x is calculated as x=1+1001*1

In v2:

  • The MATLAB code iterates once over the range 1:w*2 (one for y=1:w*2 loop)
  • If N <= M, matrix_line returns early
  • If N > M, matrix_line generates the parity check vector
  • This means that for calculating the first redundancy fragment, the matrix_line function is called with the following arguments matrix_line(33, 32)
  • This means N=33, M=32, and x is calculated as x=1+1001*33
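The arithmetic in the two lists can be checked with a small sketch. The function names are hypothetical; only the seed formula x = 1 + 1001 * N and the two ways of choosing N come from the discussion above.

```c
#include <stdint.h>

/* Seed formula from the MATLAB line quoted above: x = 1 + 1001 * N. */
static int32_t parity_seed( int32_t n )
{
    return 1 + ( 1001 * n );
}

/* Row index as passed by the current LBM call: counts redundancy fragments only. */
static int32_t row_index_relative( int32_t frag_counter, int32_t frag_nb )
{
    return frag_counter - frag_nb;
}

/* Row index as the v2 MATLAB loop is read in this thread: counts all fragments. */
static int32_t row_index_absolute( int32_t frag_counter, int32_t frag_nb )
{
    ( void ) frag_nb; /* unused, kept so both helpers share a signature */
    return frag_counter;
}
```

For the first redundancy fragment with M = 32 (fragment counter 33), the relative index gives a seed of 1 + 1001 * 1 = 1002, while the absolute index gives 1 + 1001 * 33 = 33034. Different seeds start the PRBS at different states, so the two conventions would generally produce different parity rows.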

brocaar, Mar 27 '25 13:03

I think in the LBM stack, this line:

FragGetParityMatrixRow( fragCounter - FragDecoder.FragNb, FragDecoder.FragNb, matrixRow );

Should therefore be something like this for v2:

FragGetParityMatrixRow( fragCounter, FragDecoder.FragNb, matrixRow );

brocaar, Mar 27 '25 14:03

We just encountered the same issue.

Our devices use LBM and we use the FEC in two places:

  • FUOTA as part of LBM
  • fragmented uplink (our custom protocol)

For both we use the LBM code. Customers can use our fragmentation service as part of SolidRed, or can implement their own based on documentation. For customers' own implementations we refer to the TS004 documentation, and a customer just discovered the discrepancy that @brocaar mentioned. In SolidRed we have the same implementation as LBM and therefore we don't encounter the problem there.

We now have a practical problem that we already produced a few thousand devices with the LBM implementation. A fix in the library would cause a breaking change between old and newly produced devices (or after FUOTA).

The question is now: which should be leading? In theory TS004 should be leading of course, and LBM is "just" an implementation that should follow the spec. But on the other hand, in practice LBM (as successor of the LoRaMac-node) is "the" reference code for many devices. Is there any objection to updating the spec to match the LBM implementation? How many implementations are there already that follow TS004 correctly?

We are ok with either, as long as it is well thought out. We just wanted to share the impact of the fix for us as a device maker.

martinichka, May 08 '25 10:05

Hi @lbm-team, is there any update?

brocaar, Aug 11 '25 13:08

Just to give a quick update: it is recognized in the corresponding LA work group and they are debating on the best resolution. No idea what stops them from saying something along those lines here.

StevenCellist, Sep 16 '25 10:09