FUOTAv2 FragGetParityMatrixRow implementation incorrect?
I'm testing FUOTA v2 and on FragSessionStatusAns I'm getting MIC errors. What I'm testing:
- Upload of multiple fragments with redundancy
- During the multicast downlinks, I'm triggering a device uplink (pressing the blue button on the EVK board) so that it misses one or two multicast downlinks
The device reports that, after receiving the redundant packets, it reconstructed the payload:
```
Current fragment index = 0,
Current fragment counter = 15,
Number of missed packets = 2,
FILE RECONSTRUCTS SUCCESSFULLY !
```
However, on FragSessionStatusReq, it prints that the MIC is incorrect. After comparing the reconstructed payload with the payload that was actually sent, I noticed that some reconstructed bytes are wrong, but only in the parts that were recovered using the FEC scheme. I don't know yet whether this is an issue in my implementation or in the LBM firmware.
After looking at the FragGetParityMatrixRow function, I noticed that something might be missing:
```c
static void FragGetParityMatrixRow( int32_t n, int32_t m, uint8_t* matrixRow )
{
    int32_t mTemp;
    int32_t x;
    int32_t nbCoeff = 0;
    int32_t r;

    if( IsPowerOfTwo( m ) != false )
    {
        mTemp = 1;
    }
    else
    {
        mTemp = 0;
    }

    // The PRBS seed depends on the row argument n
    x = 1 + ( 1001 * n );

    // Clear the row (m bits, rounded up to whole bytes)
    for( int32_t i = 0; i < ( ( m >> 3 ) + 1 ); i++ )
    {
        matrixRow[i] = 0;
    }

    // Set m/2 distinct columns chosen pseudo-randomly
    while( nbCoeff < ( m >> 1 ) )
    {
        r = 1 << 16;
        while( r >= m )
        {
            x = FragPrbs23( x );
            r = x % ( m + mTemp );
        }
        if( GetParity( r, matrixRow ) == 0 )
        {
            SetParity( r, matrixRow, 1 );
            nbCoeff += 1;
        }
    }
}
```
The spec contains an if (N <= M) branch, which does not seem to be implemented.
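For comparison, the C sketch below shows a row generator that does include the N <= M branch. It follows the TS004 v2 MATLAB behavior as described in this thread, not the spec text itself. Rows with N <= M are identity rows. Otherwise the row gets floor(M/2) distinct pseudo-random columns, seeded with x = 1 + 1001 * N. Several things here are assumptions:

- `MatrixLineV2`, `Prbs23`, `GetBit`, `SetBit` and `CountBits` are hypothetical names.
- The PRBS step is assumed to mirror the spec's prbs23.
- MSB-first bit order within each byte is assumed.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* One step of the 23-bit PRBS (assumed to mirror the spec's prbs23). */
static int32_t Prbs23(int32_t x)
{
    int32_t b0 = x & 0x01;
    int32_t b1 = (x & 0x20) >> 5;
    return (x >> 1) + ((b0 ^ b1) << 22);
}

static bool IsPow2(int32_t m)
{
    return m > 0 && (m & (m - 1)) == 0;
}

/* Bit i of a row; MSB-first order within each byte is an assumption. */
static int GetBit(const uint8_t *row, int32_t i)
{
    return (row[i >> 3] >> (7 - (i & 7))) & 1;
}

static void SetBit(uint8_t *row, int32_t i)
{
    row[i >> 3] |= (uint8_t)(1u << (7 - (i & 7)));
}

/* Number of ones among the first m columns of a row. */
static int CountBits(const uint8_t *row, int32_t m)
{
    int ones = 0;
    for (int32_t i = 0; i < m; i++)
        ones += GetBit(row, i);
    return ones;
}

/* n: 1-based index over all fragments (uncoded first, then redundant).
 * m: number of uncoded fragments. row: at least (m >> 3) + 1 bytes. */
static void MatrixLineV2(int32_t n, int32_t m, uint8_t *row)
{
    memset(row, 0, (size_t)((m >> 3) + 1));

    if (n <= m) {
        /* Identity part: coded fragment n is uncoded fragment n. */
        SetBit(row, n - 1);
        return;
    }

    int32_t mTemp = IsPow2(m) ? 1 : 0;
    int32_t x = 1 + 1001 * n; /* seed depends on the absolute index n */
    int32_t nbCoeff = 0;

    while (nbCoeff < (m >> 1)) {
        int32_t r = 1 << 16;
        while (r >= m) { /* rejection loop: column index in [0, m-1] */
            x = Prbs23(x);
            r = x % (m + mTemp);
        }
        if (!GetBit(row, r)) { /* count only columns not already set */
            SetBit(row, r);
            nbCoeff++;
        }
    }
}
```

With M = 32, `MatrixLineV2(5, 32, row)` sets only column 4 (the identity part). `MatrixLineV2(33, 32, row)` sets 16 distinct columns.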
Another observation: in the v2 implementation, this function is called as:
```c
// fragCounter - FragDecoder.FragNb
FragGetParityMatrixRow( fragCounter - FragDecoder.FragNb, FragDecoder.FragNb, matrixRow );
```
This is identical to the LBM v1 implementation.
However, the spec's MATLAB implementation seems to have changed between v1 and v2:
v1 implementation
CODED_F is an array of 2 * w fragments, where w is the number of uncoded fragments (so the redundancy equals w).
There are two for loops:
- Adding the UNCODED_F fragments to CODED_F
- Adding the coded fragments to CODED_F (note that the variable y is in the range 1 .. w)
v2 implementation
CODED_F is an array of 2 * w fragments, where w is the number of uncoded fragments (so, again, the redundancy equals w).
There is only one loop, in which the variable y ranges over 1 .. (2 * w).
Now looking again at:
```c
// fragCounter - FragDecoder.FragNb
FragGetParityMatrixRow( fragCounter - FragDecoder.FragNb, FragDecoder.FragNb, matrixRow );
```
I think this is not correct. It should probably be just fragCounter because, if I understand the MATLAB code correctly, the expected range is 1 .. (number of uncoded fragments + number of redundant fragments).
The LBM implementation is a direct translation of the MATLAB code. Let me explain:
Handling of Fragments
In the LBM implementation, the uncoded fragments are handled separately, and only the coded fragments (where N > M) are managed by calling the function FragGetParityMatrixRow. That is why this function does not check the case N <= M, unlike the MATLAB code.
MATLAB Specification
In the specification, the MATLAB code for both V1 and V2 is essentially the same, just written in two different ways. The core idea remains: the output is the CODED_F fragments, where the first M fragments are equal to UNCODED_F, and the following M fragments are a combination of those uncoded fragments. Thus, the result is:
CODED_F = [UNCODED_F CODED_F]
In the V1 MATLAB Implementation:
- CODED_F directly copies UNCODED_F for the first M fragments.
- Then, lines 697 to 705 handle the second set of M fragments by combining the uncoded fragments.
In the V2 MATLAB Implementation:
- There is no direct copy of UNCODED_F. Instead, the copy happens implicitly because the combination matrix is an identity matrix for the first M fragments, which yields CODED_F = UNCODED_F for those fragments.
- The following M fragments are also combinations of the uncoded fragments, since the matrix is no longer the identity in that portion.
Summary
Both V1 and V2 behave identically; the only difference is in the way the combination matrix is constructed—V2 includes more ones in the matrix compared to V1.
I also believe that the LBM implementation exhibits the same behavior.
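One possible reading of "V2 includes more ones" concerns the redundant rows. Assume the v1 generator draws floor(M/2) columns without checking for repeats, while v2 keeps drawing until floor(M/2) distinct columns are set, as the GetParity check in the LBM code above does. Then a v1 row can end up with fewer than M/2 ones, while a v2 row always has exactly M/2. The C sketch below compares both generators for the same row argument, under these assumptions:

- The v1 behavior is assumed, not quoted from the spec.
- All function names are hypothetical.
- The v2 identity branch is left out because only redundant rows are compared.
- The PRBS step and MSB-first bit order are the same assumptions as before.

```c
#include <stdint.h>
#include <string.h>

/* One step of the 23-bit PRBS (assumed to mirror the spec's prbs23). */
static int32_t Prbs23(int32_t x)
{
    int32_t b0 = x & 0x01;
    int32_t b1 = (x & 0x20) >> 5;
    return (x >> 1) + ((b0 ^ b1) << 22);
}

/* MSB-first bit order within each byte is an assumption. */
static int GetBit(const uint8_t *row, int32_t i) { return (row[i >> 3] >> (7 - (i & 7))) & 1; }
static void SetBit(uint8_t *row, int32_t i) { row[i >> 3] |= (uint8_t)(1u << (7 - (i & 7))); }

/* Draws one column index in [0, m-1] from the PRBS, advancing *x. */
static int32_t NextColumn(int32_t *x, int32_t m)
{
    int32_t mTemp = (m > 0 && (m & (m - 1)) == 0) ? 1 : 0;
    int32_t r = 1 << 16;
    while (r >= m) {
        *x = Prbs23(*x);
        r = *x % (m + mTemp);
    }
    return r;
}

/* Assumed v1 behavior: m/2 draws, repeats allowed, so at most m/2 ones. */
static void RowV1(int32_t n, int32_t m, uint8_t *row)
{
    int32_t x = 1 + 1001 * n;
    memset(row, 0, (size_t)((m >> 3) + 1));
    for (int32_t k = 0; k < (m >> 1); k++)
        SetBit(row, NextColumn(&x, m));
}

/* v2 behavior (as in the LBM code above): draw until m/2 distinct columns are set. */
static void RowV2(int32_t n, int32_t m, uint8_t *row)
{
    int32_t x = 1 + 1001 * n;
    int32_t nbCoeff = 0;
    memset(row, 0, (size_t)((m >> 3) + 1));
    while (nbCoeff < (m >> 1)) {
        int32_t r = NextColumn(&x, m);
        if (!GetBit(row, r)) {
            SetBit(row, r);
            nbCoeff++;
        }
    }
}

/* Number of ones among the first m columns of a row. */
static int CountBits(const uint8_t *row, int32_t m)
{
    int ones = 0;
    for (int32_t i = 0; i < m; i++)
        ones += GetBit(row, i);
    return ones;
}
```

For M = 32, a RowV1 row has between 1 and 16 ones, while a RowV2 row always has exactly 16.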
> In the specification, the MATLAB code for both V1 and V2 is essentially the same, just written in two different ways.
Are you sure? I think the x variable is calculated differently, because the arguments to matrix_line are different in v1 and v2:
```matlab
x = 1+1001*N; % initialize the seed differently for each line
```
In the v1 MATLAB example:
- The MATLAB code iterates twice over the range 1:w (two for y=1:w loops)
- To calculate the first redundant fragment, the matrix_line function is called with the arguments matrix_line(1, 32)
- This means N=1, M=32, and x is calculated as x=1+1001*1
In v2:
- The MATLAB code iterates once over the range 1:w*2 (one for y=1:w*2 loop)
- If N <= M, matrix_line returns early
- If N > M, matrix_line generates the parity check vector
- So, to calculate the first redundant fragment, the matrix_line function is called with the arguments matrix_line(33, 32)
- This means N=33, M=32, and x is calculated as x=1+1001*33
I think that in the LBM stack, this line:
```c
FragGetParityMatrixRow( fragCounter - FragDecoder.FragNb, FragDecoder.FragNb, matrixRow );
```
should therefore be something like this for v2:
```c
FragGetParityMatrixRow( fragCounter, FragDecoder.FragNb, matrixRow );
```
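The two call conventions can be captured in a small helper. This C sketch rests on a few assumptions:

- fragCounter is 1-based over all fragments, and fragments 1..FragNb are the uncoded ones.
- `ParityRowArg` and `RowSeed` are hypothetical names, not LBM functions.

For the first redundant fragment with FragNb = 32, the current LBM call passes N = 1 (seed 1002). The v2 reading above passes N = 33 (seed 33034). The sender and receiver will therefore generally build different parity rows unless both use the same convention.

```c
#include <stdbool.h>
#include <stdint.h>

/* Row argument N for a redundant fragment (fragCounter > fragNb).
 * v1 / current LBM call: N counts redundant fragments only, starting at 1.
 * TS004 v2 as read in this thread: N is the absolute fragment index. */
static int32_t ParityRowArg(uint16_t fragCounter, uint16_t fragNb, bool tsV2)
{
    return tsV2 ? (int32_t)fragCounter
                : (int32_t)fragCounter - (int32_t)fragNb;
}

/* PRBS seed that matrix_line derives from N (x = 1 + 1001 * N). */
static int32_t RowSeed(int32_t n)
{
    return 1 + 1001 * n;
}
```

Making the convention an explicit parameter, rather than hard-coding the subtraction, keeps both behaviors available. That may matter if devices already in the field stay on the v1-style indexing.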
We just encountered the same issue.
Our devices use LBM, and we use the FEC in two places:
- FUOTA as part of LBM
- fragmented uplink (our custom protocol)
For both, we use the LBM code. Customers can use our fragmentation service as part of SolidRed, or implement their own based on the documentation. For customers' custom implementations we refer to the TS004 documentation, and a customer just discovered the discrepancy that @brocaar mentioned. In SolidRed we have the same implementation as LBM, so we don't encounter the problem there.
We now have a practical problem: we have already produced a few thousand devices with the LBM implementation. A fix in the library would cause a breaking change between old and newly produced devices (or between devices that have and have not received the fix via FUOTA).
The question now is: which one should take precedence? In theory, TS004 should take precedence, of course, and LBM is "just" an implementation that should follow the spec. On the other hand, in practice LBM (as the successor of LoRaMac-node) is "the" reference code for many devices. Is there any objection to updating the spec to match the LBM implementation? How many implementations already follow TS004 as written?
We are OK with either option, as long as it is well thought out. We just wanted to share the impact such a fix would have on us as a device maker.
Hi @lbm-team, is there any update?
Just to give a quick update: it is recognized in the corresponding LA work group and they are debating on the best resolution. No idea what stops them from saying something along those lines here.