Buffer overflow (?) in esp_modem (IDFGH-14874)
Answers checklist.
- [x] I have read the documentation for esp-protocols components and the issue is not addressed there.
- [x] I have updated my esp-protocols branch (master or release) to the latest version and checked that the issue is present there.
- [x] I have searched the issue tracker for a similar issue and not found a similar issue.
General issue report
I have used esp_modem in several projects and found it to generally work well with various modem modules. I am now bringing up new hardware which consists of an ESP32-P4 and a Quectel FCMA62N dual band Wi-Fi modem (running custom firmware to provide the PPP interface).
This modem functions as expected in all respects except one: when I send it the AT+QGETIP command, the response is truncated.
This is the normal response when using Realterm to send the command:
AT+QGETIP="station"\r\n
AT+QGETIP="station"\r\r\n
+QGETIP: "FE80::7EE7:12FF:FE53:8080","2403:xxxx:xxxx:4:7EE7:12FF:FE53:8080"\r\n\r\n
+QGETIP: "192.168.60.153","192.168.60.1","255.255.255.0","192.168.60.1"\r\n\r\n
OK\r\n
and this is the response when using esp_modem:
D (22890) WIFIDRVR: TxModem(): attempt 3 of 3 sending entry 2: [AT+QGETIP="station"] to modem...
D (22890) uart-tx: 0x4ff3aad8 41 54 2b 51 47 45 54 49 50 3d 22 73 74 61 74 69 |AT+QGETIP="stati|
D (22896) uart-tx: 0x4ff3aae8 6f 6e 22 0d 0a |on"..|
D (22914) uart-rx: 0x4ff3f5b4 41 54 2b 51 47 45 54 49 50 3d 22 73 74 61 74 69 |AT+QGETIP="stati|
D (22914) uart-rx: 0x4ff3f5c4 6f 6e 22 0d |on".|
D (22971) uart-rx: 0x4ff3f5c8 0d 0a 2b 51 47 45 54 49 50 3a 20 22 46 45 38 30 |..+QGETIP: "FE80|
D (22972) uart-rx: 0x4ff3f5d8 3a 3a 37 45 45 37 3a 31 32 46 46 3a 46 45 35 33 |::7EE7:12FF:FE53|
D (22978) uart-rx: 0x4ff3f5e8 3a 38 30 38 30 22 2c 22 32 34 30 33 3a xx xx xx |:8080","2403:xxx|
D (22987) uart-rx: 0x4ff3f5f8 xx 3a xx xx xx xx 3a 34 3a 37 45 45 37 3a 31 32 |x:xxxx:4:7EE7:12|
D (22997) WIFIDRVR: ProcessCommand(): dce-command() returned OK, calling rx handler
W (23003) WIFIDRVR: IPHandler(): no ok
It is not simply a matter of not waiting long enough, the modem has sent the full response at this point.
There are about 105 (in this case) bytes missing from the end of the returned response, but a maximum length response would be 229 bytes:
AT+QGETIP="station"\r\n
AT+QGETIP="station"\r\r\n
+QGETIP: "FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF","FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF"\r\n\r\n
+QGETIP: "255.255.255.255","255.255.255.255","255.255.255.255","255.255.255.255"\r\n\r\n
OK\r\n
I have tried increasing the DTE buffer sizes:
dte_config.dte_buffer_size
dte_config.uart_config.rx_buffer_size
to no effect.
I have also tried setting the sdkconfig option for the inflatable DCE buffer, and also increased the C-API response buffer size, neither helped.
I believe there is a ring buffer buried in the innards of esp_modem which is filling, causing this issue. I do not see any buffer overflow events, so that's a little odd.
While the AT+QGETIP command is not absolutely essential for our device to achieve its design purpose, it is highly desirable from an operations point of view, as it will allow us to log the global addresses being used by the module.
Can you please either make it possible for me to increase the size of this buffer to 230 bytes, or tell me where the size is set so I can do it, as I have been unable to find it.
Thank you.
Hi @ntremble-sst
I don't think this is an issue with buffer size. It looks more like that the modem receives a chunk of data and assumes that it's the entire response (probably containing the \r\n from the command echo?).
How do you send this command AT+QGETIP?, which API you used, could you share the implementation of this command?
Hi @david-cermak
I have tables containing command prototypes, the one containing this command is:
static const stModemTable_t FCMA62N[] =
{
//command response timeout attempts txfn ... },
{ "AT+QSTAAPINFO=\"%s\",\"%s\"%s", "", 1000, 3, PFN(MkApInfo), ... },
{ "AT+QSTAST", "+QSTAST: \"STATION_UP", 1000, 60, PFN(TxModem), ... },
{ "AT+QGETIP=\"station\"", "+QGETIP:", 1000, 10, PFN(TxModem), ... },
{ "", "", 0, 0, nullptr, ... },
};
To send a command the prototype is processed by the transmit function, which in turn calls SendCommand(), which internally calls dce->command(). As you can see from the table above, AT+QGETIP is sent via TxModem(), which does no alteration except adding the trailing CRLF. The buffer modemrxbuffer is statically declared at 4096 bytes.
//***************************************************************************
// Send a command and/or data to the modem.
/// This is the standard transmit function, to be uses as the TxFunction in
/// any command entry that does not require special treatment.
//***************************************************************************
static enReturnCode_t TxModem( const _stModemTable_t * ptable, uint16_t * pentry )
{
enReturnCode_t rc = enReturnCode_Failure;
std::string cmd;
// ensure ptable has something to send
if ( ( strlen( ptable[*pentry].ucCommand ) > 0 ) )
{
// convert table entry to std::string, append CRLF
cmd.assign( ptable[*pentry].ucCommand );
cmd.append( "\r\n" );
// send the command to the modem
if( ( rc = SendCommand( cmd, ptable[*pentry].uiTimeout ) ) != enReturnCode_Success )
{
TRACE_ERROR( "failed sending command to modem" );
rc = enReturnCode_Failure;
}
}
else
{
TRACE_ERROR( "command table entry is incomplete" );
}
return rc;
}
//***************************************************************************
// Helper function: send a command to the modem.
//***************************************************************************
static enReturnCode_t SendCommand( std::string &cmd, uint32_t timeout )
{
enReturnCode_t rc = enReturnCode_Failure;
// ensure we have something to send
if ( cmd.empty() == false )
{
// send it to the modem
if( dce->command( (const std::string)cmd, my_line_cb, timeout ) == command_result::OK )
{
rc = enReturnCode_Success;
}
else
{
rc = enReturnCode_Failure;
}
}
return rc;
}
//***************************************************************************
// Callback function used by SendCommand() to capture responses.
//***************************************************************************
static esp_modem::command_result my_line_cb( unsigned char * puc, unsigned int len )
{
esp_modem::command_result result = command_result::FAIL;
if( ( puc != nullptr ) && ( len > 0 ) )
{
memcpy( modemrxbuffer, puc, len );
result = command_result::OK;
}
return result;
}
This scheme allows us to support multiple modems with differing command sets in the one set of firmware i.e. we can run a single firmware version over multiple board revisions where the modem may have changed.
Originally, I was configuring the UART myself, and using my own RX/TX functions up to the point where the modem enters data mode, when I was destroying my own driver and intialising the esp_modem one, but I figured that was silly, so now I initialise the esp_modem driver early, then use it to send all the commands up to and including data mode. This works well for everything except this one command, where the response is really long.
So although I have added some specific commands for the FCMA62N to esp_modem, this is not one of them; I am simply using the DCE to send a generic command, and capturing the response via the callback.
For completeness, the commands I did add are:
command_result set_echo_fcma62(CommandableIf *t, bool echo)
{
ESP_LOGV(TAG, "%s", __func__ );
if( echo )
{
return generic_command_common( t, "AT+QECHO=1\r\n" );
}
return generic_command_common( t, "AT+QECHO=0\r\n" );
}
command_result set_pdp_context_fcma62(CommandableIf *t, esp_modem::PdpContext &pdp)
{
return command_result::OK; // Wi-Fi does not require an APN
}
command_result set_data_mode_fcma62(CommandableIf *t)
{
ESP_LOGV(TAG, "%s", __func__ );
return generic_command(t, "AT+QPASSTHROUGH\r\n", "CONNECT", "ERROR", 1000);
}
command_result power_down_fcma62(CommandableIf *t)
{
ESP_LOGV(TAG, "%s", __func__ );
return generic_command(t, "AT+QPOWD=1\r\n", "POWER DOWN", "ERROR", 1000);
}
but these (with the possible exception of QPOWD, which Quectel are still to implement) all function correctly, so I don't believe my issue is due to this.
PS I am on leave, so may not respond immediately.
So, if I read the code correctly, you send the command and get the response in one go, returning either command_result::OK or command_result::FAIL from the callback (which works fine for commands that return the entire response in one go).
You can check the implementation of the generic command:
https://github.com/espressif/esp-protocols/blob/3fc26a5e5ce283cd7c94ddf25e79b8fa381cb341/components/esp_modem/src/esp_modem_command_library.cpp#L25-L39
This lambda here returns command_result::TIMEOUT unless an expected response appears (typically OK or ERROR or timeout hits), so if the response comes in multiple chunks it'd just "try again".
Have you tried to use the generic-command to implement it? For example:
command_result get_station_ip(CommandableIf *t)
{
return generic_command(t, "AT+QGETIP=\"station\"", "OK", "ERROR", 1000); // <- use "OK" instead of "+QGETIP:" (in the command echo)
}
No, I have not. I will try that and let you know how it goes. It will be a couple of weeks before I can try as I will be away.
But: the modem provides partial responses (no OK) until all data is available. At that point it returns the full response with the OK.
What I see is the response being truncated midway through an address, which the modem does not do. It returns complete addresses or nothing. So I might expect to see:
AT+QGETIP="station"\r\n
AT+QGETIP="station"\r\r\n
+QGETIP: "FE80::7EE7:12FF:FE53:8080","2403:xxxx:xxxx:4:7EE7:12FF:FE53:8080"\r\n\r\n
Then, if I send the command again:
AT+QGETIP="station"\r\n
AT+QGETIP="station"\r\r\n
+QGETIP: "FE80::7EE7:12FF:FE53:8080","2403:xxxx:xxxx:4:7EE7:12FF:FE53:8080"\r\n\r\n
+QGETIP: "192.168.60.153","192.168.60.1","255.255.255.0","192.168.60.1"\r\n\r\n
OK\r\n
But never a partial address, and never a partial line i.e. there should always be a CRLF at the end of the data.
This is why I thought the component might be truncating the response.
I will test your suggestion to use the generic command.
It could be just a small delay between characters for the UART interrupt to fire and give us the partial response. It's been a design decision that the esp-modem layers would have to expect the response to come in chunks, as the underlying terminal (UART or USB) might pass it fragmented due to timing or buffer constraints.
Hi David,
It appears to be much simpler than buffer overflow, or even timeouts.
Calls of the form:
if( dce->command( (const std::string)cmd, my_line_cb, timeout ) == command_result::OK )
return the complete response, while calls to:
result = esp_modem::dce_commands::generic_get_string( dte.get(), cmd, out, timeout );
deliberately truncate the response by chopping off the trailing OK, as well as intermediate line breaks. That may make sense within esp-modem, but it broke my response handler: since my code expects to see all the line breaks and the OK at the end of the response, this causes validation to fail. Please note the particular response is a multi-line one, so it does contain embedded line breaks. It is not absolutely necessary that those breaks remain, as the lines may be deduced by other means, but it's just a special case that doesn't really need to be there.
I updated to esp-modem 1.4.0 (was on 1.3.0) to no real effect. I then did some messing around in generic_get_string() and proved it to be the culprit. My code now works with the unmodified generic_get_string(), with the proviso that I need to treat that one command as a special case by not expecting OK. That's easy to do with my command tables, but it does mean the handler must blindly trust that generic_get_string() did indeed find the OK and remove it, as indicated by a command_result::OK return value.
If you have a suggestion as to how I can avoid this special handling by e.g. using a different API then that would be great, but as it stands generic_get_string() appears to be the only way I can send this particular command and retrieve the response.
The 'generic_command()' API does not do it for me because it does not return the modem response via the callback.
Thanks for all your help, and for creating and maintaining this library :-)