arduino-esp32 icon indicating copy to clipboard operation
arduino-esp32 copied to clipboard

UART Performance issues with 2.0.2 version

Open handmade0octopus opened this issue 2 years ago • 44 comments

Board

Not board specific

Device Description

Not board specific

Hardware Configuration

Not board specific

Version

v2.0.2

IDE Name

PlatformIO 4.2.0

Operating System

Windows 10

Flash frequency

80MHz

PSRAM enabled

no

Upload speed

921600

Description

I have been using Platformio 3.5.0 without any issues for very long time.

I tried to moving on to version 4.2.0 based off 2.0.2 but it seems it has lots of underlying problems or it requires me to rewrite most of my code.

I can't share all of it as its quite advanced, instead I will provide you with few insights and code that had dramatic performance issues. I run all code on mostly tasks with priority 4-8 and less important ones being on 3 and 2.

Top is version 3.5.0 of Platformio (I don't know exactly which Arduino version it's based off - sorry). Bottom is version 4.2.0 Arduino Platformio 3.5.0 (top) to 4.2.0 (bottom) comparision

Note - I left device to run for 5 minutes to make sure test is fair.

Displays are as follows (starting from left top corner):

  • Log speed - is .CSV logging appending the file with ~30 parameters, speed is in rows/second - as you can see on 3.5.0 it was stable and flat and it's jittery on new version (but it does keep up.

  • Max heap - I wouldn't worry about this one, but we can see change in max heap and it's much smaller than on previous version.

  • RPS - it's "responses per second" - basically I use single wire serial connection (LIN type) to gather data from old typ ECU, here my performance is most important and as you can see it doesn't keep up with it. Maybe my code here is wrong as it's very complicated to provide great stability, but I have been using this one for few years and never has issues before.

It's worth mentioning I am using GPS at 5Hz data rate constantly sending data to my second Serial input.

  • Free heap - here difference is huge and I can't believe it's only due to change from 3.5.0 to 4.2.0 - but I can't find any other culprit!

But the worst offender of all is the part of sketch below that I use to transfer data from/to SD card. Below is snippet of function I use to get those performance stats:

Before I managed to get constantly around 71 ms:

[E][Gauge.S.cpp:1536] handleCopyFile(): Time to read /displayConfig.json -     73 ms

But now it's not even funny:

[E][Gauge.S.cpp:1536] handleCopyFile(): Time to read /displayConfig.json -   1466 ms

I think SD card handling may have changed so some sort of guide of how to move to new type of handling files would be appreciated.

Sketch

#define SD_CS 15
#define SD_SCK 14
#define SD_MISO 27
#define SD_MOSI 13
SPIClass SDSPI(HSPI);

// In setup()
void setup() {
	SDSPI.begin(SD_SCK, SD_MISO, SD_MOSI, SD_CS);
	SD.begin(SD_CS, SDSPI, 80000000);
}

// Yes I could have written it better
// It basically transfers files from SD card to flash and vice versa with usage of temp.dat file in case of power failure.

#define COPY_BUFFER 512
#define TEMP_NAME "/temp.dat"
boolean handleCopyFile(const char* nameSource, const char* nameTarget, boolean overwrite = true, boolean toSDCard = false, boolean removeSource = false) {
	#ifdef DEBUG
		uint32_t fileWriteStamp = millis();
	#endif
	uint8_t bufFile[COPY_BUFFER];
	boolean sourceExist;
	boolean targetExist;
	if(toSDCard) {
		sourceExist = SPIFFS.exists(nameSource);
		targetExist = SD_USE.exists(nameTarget);
	} else {
		sourceExist = SD_USE.exists(nameSource);
		targetExist = SPIFFS.exists(nameTarget);
	}
	if(!overwrite) overwrite = !targetExist;
	if(sdReady && sourceExist && overwrite) {
		File sourceFile;
		if(toSDCard) sourceFile = SPIFFS.open(nameSource, "r");
		else sourceFile = SD_USE.open(nameSource, "r");
		if(!sourceFile) return false;
		boolean renamed = false;
		if(targetExist) {
			if(toSDCard) {
				// SD_USE.remove(nameTarget);
			} else {
				renamed = SPIFFS.rename(nameTarget, TEMP_NAME);
			}
		}
		
		File targetFile;
		if(toSDCard) targetFile = SD_USE.open(nameTarget, "w");
		else targetFile = SPIFFS.open(nameTarget, "w");
		if(!targetFile) {
			sourceFile.close();
			if(renamed) SPIFFS.rename(TEMP_NAME, nameTarget);
			return false;
		}
		
		size_t bytesRead = COPY_BUFFER;
		while(bytesRead = sourceFile.read(bufFile, COPY_BUFFER)) {
			targetFile.write(bufFile, bytesRead);
			#ifdef DEBUG
		//	Serial.write(bufFile, bytesRead);
			#endif
		}
		targetFile.flush();
		sourceFile.close();
		targetFile.close();

		if(renamed) SPIFFS.remove(TEMP_NAME);

		#ifdef DEBUG
		log_e("Time to read %s - %6d ms", nameSource, millis() - fileWriteStamp);
		#endif

		if(removeSource) {
			if(toSDCard) {
				SPIFFS.remove(nameSource);
			} else {
				SD_USE.remove(nameSource);
			}
		}
		delay(20);
		return true;
	} else if(targetExist) return true;
	return false;
}

Debug Message

Before on 3.5.0:

[E][Gauge.S.cpp:1536] handleCopyFile(): Time to read /displayConfig.json -     73 ms

Now on 4.2.0:

[E][Gauge.S.cpp:1536] handleCopyFile(): Time to read /displayConfig.json -   1466 ms


### Other Steps to Reproduce

_No response_

### I have checked existing issues, online documentation and the Troubleshooting Guide

- [X] I confirm I have checked existing issues, online documentation and Troubleshooting guide.

handmade0octopus avatar May 04 '22 15:05 handmade0octopus

Hi, there were issues with SD card in 2.0.2.

Please retest this with 2.0.3 which was just released. SD performance issues are fixed there.

Thanks

In general, there are not any performance issue in other parts you mentioned above (as far as we know) and 2.0.2 is out for quite some time.

VojtechBartoska avatar May 04 '22 15:05 VojtechBartoska

In general, there are not any performance issue in other parts you mentioned above (as far as we know) and 2.0.2 is out for quite some time. Will do once they release on PlatformIO.

You haven't changed how Serial is handled for 2.0? Any other things I can look for for performance bottlenecks?

handmade0octopus avatar May 04 '22 15:05 handmade0octopus

Main difference from Arduino Core 1.0.6 (used in PIO 3.x) to Core 2.0.2 (used in PIO 4.x) is that most Peripherals were refactored to use IDF instead of reading/writing registers directly.

Serial is based on IDF now, but I don't think it is a problem for performance. Everything is IRQ driven and there may be a overhead of maybe 500 CPU cycles on it, if compared to 1.0.6.

SuGlider avatar May 04 '22 17:05 SuGlider

PIO is still pointing to arduino-esp32 v1.0.6 (very old). Jumping to 2.0.2 is a huge leap, and yes there were issues with SD card that were solved in 2.0.3. Serial did change but no known issues. Try 2.0.3.

mrengineer7777 avatar May 04 '22 17:05 mrengineer7777

@handmade0octopus I guess you use Serial to read a GPS, is it right? We provide a new API HardwareSerial::onReceive(callbackFunction) that may help your application to get better performance. It will callback a fuction as soon as data arrives to UART, like an ISR...

SuGlider avatar May 04 '22 17:05 SuGlider

Thanks @SuGlider, any idea about increased heap usage? Do you know what sort of things changed in this regards? Does Arduino 2.0 uses any extra xTaskCreate for its internal use that I may stumble upon?

handmade0octopus avatar May 04 '22 17:05 handmade0octopus

Regarding Heap, we have changed a few configurations for WiFi in sdkconfig file. There was an issue with Heap left with WiFi when Arduino Core 2.0.0 was compared to 1.0.6. Not sure if this is your case... Also not sure if PIO has those changes as well...

Look at https://github.com/espressif/arduino-esp32/issues/5474#issuecomment-946952489 to see a comparation: PR #5791

Arduino v1.0.6 (v3.3.5-1-g85c43024c) ==> Free memory: 213,900 bytes || Binary Size 692,190 bytes Arduino v2.0.0 (v4.4-dev-2313-gc69f0ec32 / sha.44c1198) ==> Free memory: 142,228 bytes || Binary Size 734,576 bytes Arduino v2.0.1 (v4.4-dev-3401-gb86fe0c66c / sha.f459b10) ==>Free memory: 216,140 bytes || Binary Size 645,381 bytes

Comparing V2.0.0 with V2.0.1, this sketch has about 74K extra heap and takes about 90K less flash with the compiled binary. The numbers are now likely better than it was before with 1.0.6.

SuGlider avatar May 04 '22 18:05 SuGlider

Regarding UART onReceive() check this #6134 and #6364

SuGlider avatar May 04 '22 18:05 SuGlider

@SuGlider I really try to port my code but its getting harder every second I look.

SD.remove() API has been changed and it requires new things now, I cant use my wifi portal (this may be due to SD card issues).

The heap problem was with the Wifi off (not initiated at all) with wifi on usage is similair fortunately but it doesn't work.

Is there a full list of functions that has been affected so I can go through and fit them? Even some things doesn't display right (one of my lcd.println(string) is affected - but only single one..).

Sorry for my frustration but it seems I am stuck with very old version as trying to find all bugs that update caused it's not feasible.

I understand you had to make some changes to make it easy to port to new IDF versions but it's so strange all the performance issues are worse (can't use 2.0.3 for now so I can't check).

Also I found the issue I had with large flash size - it was BluetoothSerial.h - once I commented it out flash size was exactly the same.

handmade0octopus avatar May 04 '22 18:05 handmade0octopus

@handmade0octopus Just go ahead with 2.0.3. And yes, all the changes are listed in particular releases.

Edit: What is the reason you cannot used 2.0.3? Or at least 2.0.2 with manually check out SD fixes?

VojtechBartoska avatar May 04 '22 18:05 VojtechBartoska

@VojtechBartoska PlatformioIO doesn't support it yet.

Please don't make out of me chooser beggar, I understand how it sounds. I am more than thankful for all hard work you guys do. I am just trying to use it and it saddens me seeing I have to rewrite code I worked for over 2 years on.

SD card isn't the only problem I have, there are many, many more I listed above. I don't know if changing approach on Serial will cause my program to work better, I think those simple read() commands should work as fast as any other - especially that I guard them with Serial.available() everywhere - this just doesn't make any sense to me sorry.

handmade0octopus avatar May 04 '22 18:05 handmade0octopus

I understand you had to make some changes to make it easy to port to new IDF versions but it's so strange all the performance issues are worse (can't use 2.0.3 for now so I can't check).

I'm not expert on PIO, but it seems that there may be a way to setup 2.0.3 in PIO. It's necessary to change some PIO setup files (I have not tried it...).

Check these links:

https://github.com/espressif/arduino-esp32/issues/6647#issuecomment-1111792625 https://github.com/espressif/arduino-esp32/issues/6632#issuecomment-1111016155

SuGlider avatar May 04 '22 19:05 SuGlider

@handmade0octopus Welcome to changing major rev numbers (1.X -> 2.X). Before you get too far, I recommend starting a completely separate project. That way you can start with a working base and slowly bring over functionality. Oh and use a VCS system to checkpoint your project. Github Desktop has been handy for me.

As a developer I've had to do that several times with various projects. Expect 1 to 2 weeks of work porting it. Or you can stick with 1.0.6. Your choice.

mrengineer7777 avatar May 04 '22 19:05 mrengineer7777

@handmade0octopus You can use the release 2.0.3 with Platformio. Use this setup

platform = espressif32
platform_package = framework-arduinoespressif32 @ https://github.com/espressif/arduino-esp32.git#142fceb8563cd1795d619829e0a103770a344e1a

Jason2866 avatar May 04 '22 19:05 Jason2866

You can use the release 2.0.3 with Platformio.

This doesn't work unfortunately. Still loads 2.0.2.

Expect 1 to 2 weeks of work porting it. Or you can stick with 1.0.6. Your choice.

Of course I am prepared to do that, but first I need to understand why my current Serial.available() in a loop and normal writing and reading is not performing as it should. Weather it is something to do with how it is structured or some internal timings.

I am just afraid that I will spend those 2 weeks porting it and issue is still persistent.

handmade0octopus avatar May 04 '22 22:05 handmade0octopus

@handmade0octopus

Can you give us more information about what you said that available/read/write are not performing as expected?

SuGlider avatar May 05 '22 00:05 SuGlider

@handmade0octopus You can use the release 2.0.3 with Platformio. Use this setup

platform = espressif32
platform_package = framework-arduinoespressif32 @ https://github.com/espressif/arduino-esp32.git#142fceb8563cd1795d619829e0a103770a344e1a

You are missing an 's' in platform_packages

The correct way would be:

platform = espressif32
platform_packages = framework-arduinoespressif32 @ https://github.com/espressif/arduino-esp32.git#142fceb8563cd1795d619829e0a103770a344e1a

gonzabrusco avatar May 05 '22 02:05 gonzabrusco

@gonzabrusco thanks it worked.

But found issue:

error: call of overloaded 'String(float, uint8_t)' is ambiguous

Had to change:

explicit String(float, unsigned int decimalPlaces = 2);
explicit String(double, unsigned int decimalPlaces = 2);

To:

explicit String(float, unsigned char decimalPlaces = 2);
explicit String(double, unsigned char decimalPlaces = 2);

In WString.h and .cpp or cast second parameter as (uint32_t)..

Potential wrapper using old method just calling new ones? Something to consider!

I like speed of SD card now, it's faster than before, WiFi connects faster.

Still VERY HIGH usage of heap and serial is performing badly.

I disabled GPS and it's not issue with it, it has to do something with how I handle LIN connection.

I would also reconsider how SD "remove" is handled as "SD.remove(file.name())" doesn't work anymore but leaves people without clue, I would add "change .name() to .path()" in debug messages as people can go around in circles by trying to solve it.

Regarding serial I believe it has something to do with .available() method (or how it blocks thread), Ill run some tests and let you know.

handmade0octopus avatar May 05 '22 12:05 handmade0octopus

In project Tasmota we migrated since 2.0 alpha was out. Migrating the code from 1.0.x core to 2.0.x gave us better performance and MORE free heap and less flash usage. Migrating was sometimes a pain, using a alpha version... Imho you have librarys or/and in your code stuff which needs to be adopted to work (nice) with 2.0.3. There is NO smooth change from 1.0.x to 2.0.x. It is to expect by a Major release chnage.

Jason2866 avatar May 05 '22 13:05 Jason2866

Ok I investigated UART thing a little.

I will try to explain it to you in minimal simplified code - basically I run a task inside a loop:

// serial == Serial2.begin(9600, SERIAL_8E2);
xTaskCreatePinnedToCore(
		serialTask,
		"serialTask",
		8192,
		(void*) 0,
		8, // I set different priority - no change
		NULL,
		0); // I set different cores - no change

uint8_t hexAddress[255];
uint8_t response[255];

void serialTask(void* param) {
	while(1){
		ds2.sendCommand(hexAddress);
		uint32_t readResponse = ds2.receiveData(response);
		if(readResponse) {
			// do stuff
		}
		delay(1);
	}
}

uint8_t DS2::sendCommand(uint8_t command[], uint8_t respLen) {
	if(messageSend) {
		return 0;
	} else {
		if(respLen != 0) responseLength = respLen;
		timeStamp = millis();
		messageSend = true;
		respLen= serial.write(data, respLen);
		serial.flush(); // Commenting flush off actually improves performance!
		return respLen;
	}
}

uint8_t DS2::receiveData(uint8_t data[]) {
	log_e("Time: %d", millis() - timeStamp); // Here I log what I got below <===
	if(messageSend) {
		if(readData(data)) {
		        messageSend = false;
			return true;
		}
	}
	return false;
}

boolean DS2::readData(uint8_t data[]) {
	uint32_t startTime = millis();
	uint8_t availible = serial.available();
	if(availible > 2) {
		// do stuff
		return true;
	}
	return false;
}

I made this simple test setup.

Basically on version 1.0.5 I test I have very consistent:

[E][DS2.cpp:55] receiveData(): Time: 7
[E][DS2.cpp:55] receiveData(): Time: 7
[E][DS2.cpp:55] receiveData(): Time: 7
[E][DS2.cpp:55] receiveData(): Time: 7
[E][DS2.cpp:55] receiveData(): Time: 7
[E][DS2.cpp:55] receiveData(): Time: 7
[E][DS2.cpp:55] receiveData(): Time: 7
[E][DS2.cpp:55] receiveData(): Time: 7
[E][DS2.cpp:55] receiveData(): Time: 7

And on 2.0.3 I have:

[877344][E][DS2.cpp:55] receiveData(): Time: 7
[877345][E][DS2.cpp:55] receiveData(): Time: 8
[877346][E][DS2.cpp:55] receiveData(): Time: 9
[877347][E][DS2.cpp:55] receiveData(): Time: 10
[877350][E][DS2.cpp:55] receiveData(): Time: 13
[877355][E][DS2.cpp:55] receiveData(): Time: 18

[877464][E][DS2.cpp:55] receiveData(): Time: 7
[877465][E][DS2.cpp:55] receiveData(): Time: 8
[877466][E][DS2.cpp:55] receiveData(): Time: 9
[877467][E][DS2.cpp:55] receiveData(): Time: 10
[877470][E][DS2.cpp:55] receiveData(): Time: 13
[877475][E][DS2.cpp:55] receiveData(): Time: 18

And if I remove flush() after write it gets even funnier!

On 1.0.5 consistent:

[E][DS2.cpp:55] receiveData(): Time: 1
[E][DS2.cpp:55] receiveData(): Time: 2
[E][DS2.cpp:55] receiveData(): Time: 3
[E][DS2.cpp:55] receiveData(): Time: 4

[E][DS2.cpp:55] receiveData(): Time: 1
[E][DS2.cpp:55] receiveData(): Time: 2
[E][DS2.cpp:55] receiveData(): Time: 3
[E][DS2.cpp:55] receiveData(): Time: 4

[E][DS2.cpp:55] receiveData(): Time: 1
[E][DS2.cpp:55] receiveData(): Time: 2
[E][DS2.cpp:55] receiveData(): Time: 3
[E][DS2.cpp:55] receiveData(): Time: 4

On 2.0.3 though:

[ 11468][E][DS2.cpp:55] receiveData(): Time: 1
[ 11469][E][DS2.cpp:55] receiveData(): Time: 2
[ 11470][E][DS2.cpp:55] receiveData(): Time: 3
[ 11471][E][DS2.cpp:55] receiveData(): Time: 4
[ 11474][E][DS2.cpp:55] receiveData(): Time: 7
[ 11478][E][DS2.cpp:55] receiveData(): Time: 11
[ 11483][E][DS2.cpp:55] receiveData(): Time: 16

[ 11589][E][DS2.cpp:55] receiveData(): Time: 1
[ 11590][E][DS2.cpp:55] receiveData(): Time: 2
[ 11591][E][DS2.cpp:55] receiveData(): Time: 3
[ 11592][E][DS2.cpp:55] receiveData(): Time: 4
[ 11595][E][DS2.cpp:55] receiveData(): Time: 7
[ 11599][E][DS2.cpp:55] receiveData(): Time: 11
[ 11604][E][DS2.cpp:55] receiveData(): Time: 16

Hope that helps you to try to tell me to what test next..

To be honest I am lost completely. Looks like new version sends (or receives) data much slower and can't cope with changing serial that fast.

handmade0octopus avatar May 05 '22 15:05 handmade0octopus

serial.flush(); // Commenting flush off actually improves performance!

I think you already know it, but flush() waits (blocking) until the UART TX FIFO is fully empty.

About the performance, I'll test it soon. I can already tell you that 2.0.0+ version has a HardwareSerial API that uses extra buffers and queues within IDF UART Driver.

Arduino Corte Versions 1.0.6 or lower goes directly to the SoC Registers and ISRs. Thus, it may have a faster response.

But we have a special API for faster reading/response UART: HardwareSerial::onReceive(callbackFunc)

It will require a change on the logic of your application, but I think it will get the same performance as before.

SuGlider avatar May 05 '22 21:05 SuGlider

Can you roughly explain how is onReceive triggered?

Is it triggered when available is bigger than 1? Does it happen on separate thread or its blocking?

I will try to write code but it would be useful how should I handle it for my application. Will test and let you know.

handmade0octopus avatar May 06 '22 01:05 handmade0octopus

Can you roughly explain how is onReceive triggered?

When data arrives to UART, an ISR adds it to the internal RX buffer. If there are no UART incoming data for more than a timeout equivalent, by default, to 10 UART symbols equivalent in time (depends on the baudrate) it will trigger onReceive.

The timeout, in Symbols, can be set by HardwareSerial::setRxTimeout()

Alternatively, if UART receives a stream of data (no pause, no timeout), it will also trigger it for every 120 bytes.

Is it triggered when available is bigger than 1?

No. It is not triggered based on available(). But you can read available() in the callback function. All Serial interfaces can be used inside the callback function.

Does it happen on separate thread or its blocking?

non-blocking - asynchronous to Arduino main task execution. It's a callback running from a separated task, used only to treat UART events, among them, the data received in the conditions described above.

List of PRs with information and examples: #6134 adds onReceive() #6364 adds std::function support to onReceive() #6397 About setRxTimeout()

I will try to write code but it would be useful how should I handle it for my application. Will test and let you know.

OK. Thanks.

SuGlider avatar May 06 '22 03:05 SuGlider

@SuGlider rewriting code soon. Last question, can I use delays and/or semaphores in onReceive()?

handmade0octopus avatar May 08 '22 02:05 handmade0octopus

@SuGlider rewriting code soon. Last question, can I use delays and/or semaphores in onReceive()?

Yes, it's not a real ISR, just a callback function from an event created by the UART ISR. Thus, you can run whatever you want on it.

SuGlider avatar May 08 '22 04:05 SuGlider

@handmade0octopus - When coding using onReceive() just have in mind that it runs on separated Highest priority Task, thus, if it doesn't returns or goes into Blocked, and Suspended Task state, Arduino main sketch Task won't run also...

delay() makes the running Task to go to Blocked state. The same for a Semaphore waiting FreeRTOS function.

Thus those two will not be a problem. But an infitite no ending loop will be a problem, because it will run for ever and Arduino Setup/Loop task won't run at all (lower priority).

SuGlider avatar May 09 '22 11:05 SuGlider

@SuGlider sorry not had chance to play with it yet.

But an infitite no ending loop will be a problem, because it will run for ever and Arduino Setup/Loop task won't run at all (lower priority).

Well, it won't work as it will definitely trigger WDT.

By the way I understand this onReceive() triggered task is using core 1 - is there a way to change it if I need to or do I need to modify source code?

handmade0octopus avatar May 10 '22 01:05 handmade0octopus

By the way I understand this onReceive() triggered task is using core 1 - is there a way to change it if I need to or do I need to modify source code?

Check this PR https://github.com/espressif/arduino-esp32/pull/6685 It will allow the sketch to change the Running Core, Task Priority and Task Stack Size for the onReceive() event task. This PR shall be available for Arduino Core v2.0.4

SuGlider avatar May 10 '22 18:05 SuGlider

Adding this to 2.0.4 Milestone, when it's released this issue can be retested and probably closed.

VojtechBartoska avatar May 30 '22 13:05 VojtechBartoska

@SuGlider @VojtechBartoska good news. Finally been able to test it (been on holiday).

Long story short - onReceive() gave 0 performance increase.

But I found that using Serial2.setRxTimeout(1); brought back old performance back.

So you can still use polling with this setup very well.

My advice would be to set RX timeout to 1 by default so users can port their code without need of debugging what is going wrong?

Heap usage is still big, 50kb is lost somewhere.

Also, is there any #define where I can check for arduino version?

handmade0octopus avatar May 31 '22 23:05 handmade0octopus