
ESP32S3 eMMC driver won't do 52M (IDFGH-11863)

Open marchingband opened this issue 1 year ago • 8 comments

Answers checklist.

  • [X] I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • [X] I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • [X] I have searched the issue tracker for a similar issue and not found a similar issue.

General issue report

I initialize the driver like this:

  sdmmc_host_t host = SDMMC_HOST_DEFAULT();
  sdmmc_slot_config_t slot_config = SDMMC_SLOT_CONFIG_DEFAULT();
  slot_config.width = 4;
  slot_config.clk = EMMC_CLK;
  slot_config.cmd = EMMC_CMD;
  slot_config.d0 = EMMC_DATA0;
  slot_config.d1 = EMMC_DATA1;
  slot_config.d2 = EMMC_DATA2;
  slot_config.d3 = EMMC_DATA3;

  host.max_freq_khz = SDMMC_FREQ_52M;
  host.flags &= ~SDMMC_HOST_FLAG_DDR;

  ESP_ERROR_CHECK(sdmmc_host_init());

  ret = sdmmc_host_init_slot(SDMMC_HOST_SLOT_1, &slot_config);
  if(ret != ESP_OK){
    ESP_LOGI(TAG, "sdmmc_host_init_slot : %s", esp_err_to_name(ret));
  }
  
  ret = sdmmc_card_init(&host, &card);
  if(ret != ESP_OK){
    ESP_LOGI(TAG, "sdmmc_card_init : %s", esp_err_to_name(ret));
  }
  
  sdmmc_card_print_info(stdout, &card);

And it produces this log:

Name: 4FTE4R␁
Type: MMC
Speed: 40.00 MHz (limit: 52.00 MHz)
Size: 3728MB
CSD: ver=3, sector_size=512, capacity=7634944 read_bl_len=9
EXT CSD: bus_width=4

Am I missing something? If 52M is not possible today, is it likely to be possible in a future release? Is there any other configuration that would produce a bit more read speed? thanks!

marchingband avatar Jan 09 '24 21:01 marchingband

Looks like you are enabling 4-line mode. In this case, you can try keeping DDR enabled; this should result in higher read speeds.

Supporting exactly 52 MHz is not possible due to the limitations of the clock divider; the closest possible frequency is 53.3 MHz. However, we haven't tested this, and I'm not sure if we can get adequate phase adjustments at this frequency.
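The divider limitation can be illustrated with a small sketch. It assumes a 160 MHz source clock (the figure given later in this thread) and a plain integer divider from 1 to 16; the real driver's divider structure may differ, and `nearest_divider` is a hypothetical helper, not an ESP-IDF function.

```c
#include <stdint.h>

/* Assumption: 160 MHz source clock, divided by a plain integer. */
#define SRC_CLK_KHZ 160000u
#define MAX_DIV     16u

/* Hypothetical helper: returns the integer divider whose output is
 * closest to target_khz and stores that output frequency in *out_khz. */
static uint32_t nearest_divider(uint32_t target_khz, uint32_t *out_khz)
{
    uint32_t best_div = 1;
    uint32_t best_err = UINT32_MAX;

    for (uint32_t div = 1; div <= MAX_DIV; div++) {
        uint32_t f = SRC_CLK_KHZ / div;
        uint32_t err = (f > target_khz) ? (f - target_khz) : (target_khz - f);
        if (err < best_err) {
            best_err = err;
            best_div = div;
        }
    }
    *out_khz = SRC_CLK_KHZ / best_div;
    return best_div;
}
```

Under these assumptions a 52000 kHz request lands on divider 3 (53333 kHz), since the neighbouring dividers give 80000 kHz and 40000 kHz.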

Is there some specific throughput value you need to reach?

igrr avatar Jan 15 '24 14:01 igrr

Thanks @igrr. DDR mode fails for me. I am porting code from ESP32 to ESP32-S3 for this device (https://www.sparkfun.com/products/21307). It runs at 52M on ESP32 and depends on that throughput for its functionality. At 40M on the S3 it reaches about 80% of the ESP32's performance: the device plays back up to 18 stereo audio files from eMMC, but on the S3 at 40M I only get 15. 53.3M would be great. Is there a way that I can test that?

marchingband avatar Jan 15 '24 18:01 marchingband

@igrr Hi, I added the DDR flag as follows:

    host.flags = SDMMC_HOST_FLAG_4BIT;
    host.flags |= SDMMC_HOST_FLAG_DDR;

but it fails on an ESP32-DevKitC with an ESP32-WROOM-32E module. Here is the log:

 (726) sdmmc_cmd: sending cmd slot=1 op=8 arg=0 flags=1c50 data=0x3ffb5440 blklen=512 datalen=512 timeout=1000
 (746) sdmmc_cmd: cmd response 00000900 00000000 00000000 00000000 err=0x109 state=4
 (746) sdmmc_mmc: sdmmc_init_mmc_check_ext_csd: send_ext_csd_data error 0x109
 (756) vfs_fat_sdmmc: sdmmc_card_init failed (0x109).
 (766) eMMC: Failed to initialize the eMMC (ESP_ERR_INVALID_CRC). Make sure eMMC lines have pull-up resistors in place.

The eMMC chip is KLM8G1GETF-B041. Without the DDR flag it works well; here is the printed card info:

Name: 8GTF4R
Type: MMC
Speed: 40.00 MHz (limit: 52.00 MHz)
Size: 7456MB
CSD: ver=3, sector_size=512, capacity=15269888 read_bl_len=9
EXT CSD: bus_width=4

Any idea about this problem?

hibfh avatar Jan 30 '24 08:01 hibfh

ESP_ERR_INVALID_CRC is a CRC error. This is typically related to the hardware design. Probably there is some signal integrity issue with your eMMC connection. If you have designed a PCB and it doesn't work, you can send the design to Espressif for the hardware design review, via the website. General troubleshooting suggestions include matching the length of the lines to the card, making the lines short, ensuring you have sufficiently strong pull-up resistors.

igrr avatar Jan 30 '24 08:01 igrr

@igrr I am still unable to get 40 MHz DDR, 4-line mode working. I have a custom PCB and lots of experience with eMMC. I am using a Samsung eMMC, part number KLM4G1FETE-B041. There is some confusion in #8257, where both you and another user get this mode to work, but it is not clear what part you are using. Do you remember which part worked? Thanks.

marchingband avatar Apr 05 '24 19:04 marchingband

@igrr any tips? I'd love to move this forward. Can you recall the part you used? Thank you.

marchingband avatar May 05 '24 01:05 marchingband

Hi @marchingband, sorry for missing your previous message. The chip we used was IS21ES08G-JCLI.

One additional thing you can try is adjusting sampling delay on the ESP32-S3 side. There is a parameter you can change in sdmmc_host_t structure: https://github.com/espressif/esp-idf/blob/d4cd437ede613fffacc06ac6d6c93a083829022f/components/esp_driver_sdmmc/include/driver/sdmmc_default_configs.h#L45

igrr avatar May 05 '24 06:05 igrr

@igrr I didn't notice that input_delay_phase had been added to the driver. Thank you! With

    host.flags = SDMMC_HOST_FLAG_4BIT | SDMMC_HOST_FLAG_DDR;
    host.input_delay_phase = SDMMC_DELAY_PHASE_1;            /*!< Delay phase 1 */

I have success using the KLM4G1FETE. Amazing! Thank you so much! With that change I get 16 simultaneous wav files played back, compared to 18 on ESP32 at 52M and 15 on the S3 without DDR, so that is progress :)

Now I have set host.max_freq_khz = 53300, but the driver still runs at 40 MHz. Is there a way I can test the higher speed you mentioned earlier?

marchingband avatar May 06 '24 22:05 marchingband

@marchingband I tried to make some simple changes to get the 52 MHz (53.3 actually) mode to work.

The good news is, without DDR enabled it works after a very simple change in the driver.

Performance at 40M, no DDR:
  sector  | count | align | alloc  | size(kB)  | wr_time(ms) | wr_speed(MB/s)  |  rd_time(ms)  | rd_speed(MB/s)
        0 |    1  |   4   |  sram  |     0.5   |    55.68    |       0.01      |      0.56     |      0.87
        0 |    4  |   4   |  sram  |     2.0   |     1.00    |       1.95      |      0.63     |      3.12
        0 |    8  |   4   |  sram  |     4.0   |     0.99    |       3.95      |      0.69     |      5.64
        0 |   16  |   4   |  sram  |     8.0   |     1.28    |       6.12      |      1.04     |      7.54
        0 |   32  |   4   |  sram  |    16.0   |     2.08    |       7.52      |      1.54     |     10.17
        0 |   64  |   4   |  sram  |    32.0   |     2.75    |      11.38      |      2.42     |     12.93
        0 |  128  |   4   |  sram  |    64.0   |     4.48    |      13.96      |      4.14     |     15.11
        0 |    1  |   1   |  sram  |     0.5   |     0.88    |       0.56      |      0.57     |      0.86
        0 |    8  |   1   |  sram  |     4.0   |     5.46    |       0.72      |      2.29     |      1.71
        0 |  128  |   1   |  sram  |    64.0   |    83.46    |       0.75      |     32.14     |      1.94
Performance at 52M, no DDR:
  sector  | count | align | alloc  | size(kB)  | wr_time(ms) | wr_speed(MB/s)  |  rd_time(ms)  | rd_speed(MB/s)
        0 |    1  |   4   |  sram  |     0.5   |     0.96    |       0.51      |      0.55     |      0.88
        0 |    4  |   4   |  sram  |     2.0   |     0.92    |       2.13      |      0.60     |      3.23
        0 |    8  |   4   |  sram  |     4.0   |     0.99    |       3.94      |      0.66     |      5.96
        0 |   16  |   4   |  sram  |     8.0   |     1.21    |       6.46      |      0.93     |      8.36
        0 |   32  |   4   |  sram  |    16.0   |     2.48    |       6.31      |      1.27     |     12.29
        0 |   64  |   4   |  sram  |    32.0   |     2.74    |      11.42      |      1.91     |     16.33
        0 |  128  |   4   |  sram  |    64.0   |     4.31    |      14.50      |      3.21     |     19.49
        0 |    1  |   1   |  sram  |     0.5   |     0.90    |       0.54      |      0.56     |      0.87
        0 |    8  |   1   |  sram  |     4.0   |     5.76    |       0.68      |      2.23     |      1.75
        0 |  128  |   1   |  sram  |    64.0   |    90.63    |       0.69      |     31.05     |      2.01

(all numbers are for 4-line mode)

So the max. read performance goes up from 15.11 MB/sec to 19.49 MB/sec, which is 29% higher and roughly consistent with the 33% frequency increase (40 MHz to 53.3 MHz).

However in DDR mode, data integrity check fails at 53.3 MHz. I think this is due to the fact that the waveform has 33.3% (or 66.6%) duty cycle, as we are getting this frequency by dividing 160 MHz clock by 3. In DDR mode, eMMC requires clock duty cycle to be between 45% and 55%. I think it might not be possible to get 52 MHz DDR mode to work on ESP32-S3 due to the clock divider limitation.
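The duty-cycle argument can be sketched as follows. This assumes a simple counter-based divider that holds the output high for div/2 source cycles (rounded down) out of every div, which is one way to get the 33.3% figure mentioned above; the actual hardware may build the waveform differently. The 45% to 55% window is the DDR requirement quoted in this comment, and both function names are made up for illustration.

```c
#include <stdbool.h>

/* Duty cycle in percent for an integer divider that keeps the output
 * high for div/2 (rounded down) source cycles out of every div cycles.
 * Assumption: simple counter-based divider; div <= 1 passes the source
 * clock through and is treated as 50%. */
static double divider_duty_percent(unsigned div)
{
    if (div <= 1) {
        return 50.0;
    }
    return 100.0 * (double)(div / 2) / (double)div;
}

/* DDR duty-cycle window as quoted in the thread: 45% to 55%. */
static bool duty_ok_for_ddr(double duty_percent)
{
    return duty_percent >= 45.0 && duty_percent <= 55.0;
}
```

With these assumptions, even dividers (2, 4) give 50% and pass the check, while divider 3 gives about 33.3% (or 66.7% if the output is inverted) and fails it.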

igrr avatar May 07 '24 14:05 igrr

I also tried another thing, which I totally didn't expect to work.

I set the clock divider to produce 80 MHz clock (so, divide the 160 MHz source clock by 2) and it actually worked, without DDR mode, producing the following performance numbers:

Performance at 80M, no DDR:
  sector  | count | align | alloc  | size(kB)  | wr_time(ms) | wr_speed(MB/s)  |  rd_time(ms)  | rd_speed(MB/s)
        0 |    1  |   4   |  sram  |     0.5   |     0.95    |       0.51      |      0.55     |      0.88
        0 |    4  |   4   |  sram  |     2.0   |     1.72    |       1.14      |      0.56     |      3.47
        0 |    8  |   4   |  sram  |     4.0   |     0.85    |       4.57      |      0.62     |      6.30
        0 |   16  |   4   |  sram  |     8.0   |     1.08    |       7.21      |      0.85     |      9.23
        0 |   32  |   4   |  sram  |    16.0   |     1.47    |      10.67      |      1.05     |     14.91
        0 |   64  |   4   |  sram  |    32.0   |     2.47    |      12.67      |      1.54     |     20.35
        0 |  128  |   4   |  sram  |    64.0   |     3.50    |      17.85      |      2.42     |     25.84
        0 |    1  |   1   |  sram  |     0.5   |     0.89    |       0.55      |      0.55     |      0.88
        0 |    8  |   1   |  sram  |     4.0   |     5.50    |       0.71      |      2.17     |      1.80
        0 |  128  |   1   |  sram  |    64.0   |    82.83    |       0.75      |     30.28     |      2.06

which is roughly the same as the performance at 40 MHz with DDR:

Performance at 40M, with DDR:
  sector  | count | align | alloc  | size(kB)  | wr_time(ms) | wr_speed(MB/s)  |  rd_time(ms)  | rd_speed(MB/s)
        0 |    1  |   4   |  sram  |     0.5   |     0.92    |       0.53      |      0.55     |      0.89
        0 |    4  |   4   |  sram  |     2.0   |     0.89    |       2.20      |      0.58     |      3.40
        0 |    8  |   4   |  sram  |     4.0   |     0.94    |       4.16      |      0.63     |      6.23
        0 |   16  |   4   |  sram  |     8.0   |     1.13    |       6.93      |      0.85     |      9.18
        0 |   32  |   4   |  sram  |    16.0   |     1.69    |       9.23      |      1.13     |     13.86
        0 |   64  |   4   |  sram  |    32.0   |     1.96    |      15.93      |      1.60     |     19.54
        0 |  128  |   4   |  sram  |    64.0   |     2.90    |      21.58      |      2.49     |     25.12
        0 |    1  |   1   |  sram  |     0.5   |     0.94    |       0.52      |      0.55     |      0.89
        0 |    8  |   1   |  sram  |     4.0   |     5.61    |       0.70      |      2.17     |      1.80
        0 |  128  |   1   |  sram  |    64.0   |    88.65    |       0.70      |     30.32     |      2.06

80 MHz + DDR still doesn't work, but the timing for that mode is even more difficult to meet.
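To put the four tables side by side, the measured read times can be compared with the raw data-line rate of the bus. The sketch below is only arithmetic: it ignores command, CRC and inter-block overhead, and the function names are illustrative. Note that the tables' kB appears to be 1024 bytes (128 sectors of 512 bytes are shown as 64.0), so 65536 bytes is used for the largest transfer.

```c
/* Raw data-line rate in bytes per second for a given bus clock, bus
 * width and SDR/DDR mode. Protocol overhead is deliberately ignored. */
static double bus_bytes_per_sec(double clk_hz, int width_bits, int ddr)
{
    return clk_hz * width_bits / 8.0 * (ddr ? 2.0 : 1.0);
}

/* Fraction of the raw bus rate achieved by a transfer of `bytes`
 * that took `time_ms` milliseconds. */
static double bus_efficiency(double bytes, double time_ms,
                             double clk_hz, int width_bits, int ddr)
{
    double measured = bytes / (time_ms / 1000.0);
    return measured / bus_bytes_per_sec(clk_hz, width_bits, ddr);
}
```

For the 128-sector aligned reads, `bus_efficiency(65536, 4.14, 40e6, 4, 0)` comes out at roughly 0.79, while the 80 MHz row (2.42 ms) and the 40 MHz DDR row (2.49 ms) both land at roughly two thirds of a 40 MB/s raw rate, which fits the observation that the two modes perform about the same.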

igrr avatar May 07 '24 14:05 igrr

> With that change I get 16 simultaneous wav files played back, compared to 18 on esp32 at 52M

I would also be interested in a more quantitative benchmark from your application. What is the total throughput (in megabytes per second) which you can achieve? It is possible that the bottleneck is not only in the low-level SDMMC driver.

It's also interesting that you get higher performance on the ESP32 compared to ESP32-S3.

ESP32 sdmmc driver is exactly the same as ESP32-S3 (apart from GPIO-related configuration logic), and 52 MHz frequency set in max_freq_khz also results in only 40 MHz bus clock. Perhaps there is some other reason for lower performance you are observing on the S3. If you manage to convert your "number of wav files" metric into megabytes per second, it will be easier to understand this discrepancy.

igrr avatar May 07 '24 14:05 igrr

@igrr

Thank you for the research!

My old application was in Arduino, under arduino-esp version 2.0.1. When I print the card info there, I get:

Name: 8GTF4R␆
Type: MMC
Speed: 52 MHz
Size: 7456MB
CSD: ver=3, sector_size=512, capacity=15269888 read_bl_len=9

I suppose this is not the real speed. Indeed when I use SDMMC_FREQ_HIGHSPEED there is no change in performance.

Is your tool to benchmark the eMMC something I could access? I do not have a particularly scientific way to do this currently. The wav files are 44.1 kHz, 16-bit, stereo, so for 18 files that would be 3.175 MB/sec. I read in 6-block chunks, so that appears to be in line with your benchmarks.
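The 3.175 MB/sec figure follows directly from the WAV parameters. A minimal sketch of the arithmetic, counting raw PCM payload only (no WAV headers or filesystem overhead); the function names are illustrative:

```c
#include <stdint.h>

/* Bytes per second of raw PCM for one stream. */
static uint32_t pcm_bytes_per_sec(uint32_t sample_rate_hz,
                                  uint32_t bits_per_sample,
                                  uint32_t channels)
{
    return sample_rate_hz * (bits_per_sample / 8u) * channels;
}

/* Bytes per second needed to feed `voices` identical streams at once. */
static uint32_t voices_bytes_per_sec(uint32_t voices,
                                     uint32_t sample_rate_hz,
                                     uint32_t bits_per_sample,
                                     uint32_t channels)
{
    return voices * pcm_bytes_per_sec(sample_rate_hz, bits_per_sample, channels);
}
```

One 44.1 kHz, 16-bit, stereo stream is 176400 bytes per second, so 18 streams need 3175200 bytes per second, 16 need 2822400 and 15 need 2646000.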

Looking elsewhere for the bottleneck, I have not yet switched to the newer i2s driver. Is there any documentation online that discusses the difference with the new driver? I rely on i2s_write being a blocking function.

marchingband avatar May 07 '24 16:05 marchingband

> I suppose this is not the real speed.

That's right. Until https://github.com/espressif/esp-idf/commit/56f20013174a352edebf5c9d983002237734c1a2 (added in v5.1 release) we did not print the real bus frequency, only the max_freq_khz setting. I guess you are using a newer version of IDF now, so the real bus frequency is printed in addition to the maximum frequency supported by the card.

> Is your tool to benchmark the eMMC something I could access?

This is a test app inside IDF. In the master branch it is in https://github.com/espressif/esp-idf/tree/master/tools/test_apps/storage/sdmmc_console. In release/v5.1 these tests were accessible via the unit-test-app, compiling it with tests for sdmmc component.

> I read in 6 block chunks, so that appears to be in line with your benchmarks.

In that case, increasing the size of the buffer might be the easiest way to increase throughput? Do you have enough internal RAM to increase the chunk size to, say, 64 kB?
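A rough sketch of what a larger chunk size changes: the same range of sectors is fetched with far fewer read calls. The reader is passed in as a callback so the loop can be tried on a PC; on ESP-IDF the callback could wrap `sdmmc_read_sectors(card, dst, start_sector, sector_count)`. `read_in_chunks`, `read_sectors_fn` and `fake_read` are hypothetical names, not part of the driver.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u

/* Hypothetical reader callback; returns 0 on success. */
typedef int (*read_sectors_fn)(void *ctx, void *dst,
                               size_t start_sector, size_t sector_count);

/* Reads total_sectors starting at start_sector into dst, issuing reads
 * of at most chunk_sectors each. Returns the number of read calls
 * issued, or -1 on bad arguments or a failed read. */
static int read_in_chunks(read_sectors_fn read_fn, void *ctx, uint8_t *dst,
                          size_t start_sector, size_t total_sectors,
                          size_t chunk_sectors)
{
    if (read_fn == NULL || dst == NULL || chunk_sectors == 0) {
        return -1;
    }
    int calls = 0;
    size_t done = 0;
    while (done < total_sectors) {
        size_t n = total_sectors - done;
        if (n > chunk_sectors) {
            n = chunk_sectors;
        }
        if (read_fn(ctx, dst + done * SECTOR_SIZE, start_sector + done, n) != 0) {
            return -1;
        }
        done += n;
        calls++;
    }
    return calls;
}

/* Stand-in reader for host-side experiments: fills every sector with
 * the low byte of its sector number. */
static int fake_read(void *ctx, void *dst,
                     size_t start_sector, size_t sector_count)
{
    (void)ctx;
    uint8_t *p = dst;
    for (size_t s = 0; s < sector_count; s++) {
        memset(p + s * SECTOR_SIZE, (int)((start_sector + s) & 0xFF), SECTOR_SIZE);
    }
    return 0;
}
```

Fetching 128 sectors (64 kB) takes 22 calls with 6-sector chunks and a single call with a 128-sector chunk. In the benchmark tables above, read throughput rises steadily with the sector count per request, which is why fewer, larger reads are expected to help.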

> Looking elsewhere for the bottleneck, I have not yet switched to the newer i2s driver. Is there any documentation online that discusses the difference with the new driver?

There is the migration guide: https://docs.espressif.com/projects/esp-idf/en/latest/esp32s3/migration-guides/release-5.x/5.0/peripherals.html#i2s-driver. If you run into some problem with I2S, please open a new issue for that. I am planning to close this one when 52 MHz (no DDR) mode support gets merged, soon.

igrr avatar May 07 '24 16:05 igrr

@igrr thanks again for your work here, I deeply appreciate your attention and expertise.

You were right: the remaining bottleneck was not the eMMC read but the time the processor takes to execute the mixing algorithm, which is highly optimized and uses fixed-point math. It comes down to just a few instructions: if I simplify away a few lines of the algorithm, I get good performance, so it is very, very close. I wonder if IRAM_ATTR is respected the same way on the S3, whether there are other similar changes I may not know of, or whether there are any good guides you can recommend on performance tuning for the S3.

marchingband avatar May 08 '24 21:05 marchingband

here is the algorithm in case you are curious. https://github.com/marchingband/wvr/blob/main/src/wav_player.c

marchingband avatar May 08 '24 21:05 marchingband