
MT29 filesystem mount problems (IEC-169)

Open mttcarbone opened this issue 1 year ago • 22 comments

Answers checklist.

  • [X] I have read the documentation of the component in question and the issue is not addressed there.
  • [X] I have searched the issue tracker for a similar issue and not found a similar issue.

General issue report

I am using this library in a project with an MT29F2G01ABAGD and am having problems when the ESP32-S3 starts up: the log shows the warning vfs_fat_nand: f_mount failed (13), the flash is formatted, and execution then proceeds to write and read the hello.txt file.

What can I do to avoid formatting the flash every boot?

In other tests, if I unmount and then remount the flash, the hello.txt file is read back incorrectly: its contents come back truncated as "HELLO TXT". What can I do to avoid this?

Screenshot_20240926_180456

mttcarbone avatar Sep 26 '24 16:09 mttcarbone

Could you please try to verify that basic read and write operations to flash work correctly? FatFS error 13 usually indicates that the filesystem is corrupted. The most common case is that the underlying Flash driver isn't transferring some data correctly in one of the directions. You can try calling spi_nand_flash_write_sector and spi_nand_flash_read_sector to write some test data and then verify that the read back result is the same.
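A write/readback check of the kind suggested above can be sketched as follows. This is only a minimal, off-target illustration: the sector_io_t interface and the in-memory "flash" are hypothetical names invented for the sketch. On real hardware the two callbacks would wrap spi_nand_flash_write_sector and spi_nand_flash_read_sector, whose exact signatures should be taken from the component's header.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sector I/O interface, invented for this sketch. */
typedef struct {
    int (*write)(void *ctx, const uint8_t *data, uint32_t sector);
    int (*read)(void *ctx, uint8_t *data, uint32_t sector);
    void *ctx;
    uint32_t sector_size;
} sector_io_t;

/* Write a sector-dependent pattern, read it back, return 0 if identical. */
static int check_sector_roundtrip(const sector_io_t *io, uint32_t sector)
{
    assert(io != NULL && io->sector_size > 0);
    int result = -1;
    uint8_t *wr = malloc(io->sector_size);
    uint8_t *rd = malloc(io->sector_size);
    if (wr != NULL && rd != NULL) {
        for (uint32_t i = 0; i < io->sector_size; i++) {
            wr[i] = (uint8_t)(i * 31u + sector);
        }
        memset(rd, 0, io->sector_size);
        if (io->write(io->ctx, wr, sector) == 0 &&
            io->read(io->ctx, rd, sector) == 0 &&
            memcmp(wr, rd, io->sector_size) == 0) {
            result = 0;
        }
    }
    free(wr);
    free(rd);
    return result;
}

/* In-memory stand-in for the flash so the sketch runs off-target. */
#define RAM_SECTOR_SIZE 2048u
#define RAM_SECTOR_COUNT 8u
static uint8_t ram_disk[RAM_SECTOR_COUNT][RAM_SECTOR_SIZE];

static int ram_write(void *ctx, const uint8_t *data, uint32_t sector)
{
    (void)ctx;
    if (sector >= RAM_SECTOR_COUNT) {
        return -1;
    }
    memcpy(ram_disk[sector], data, RAM_SECTOR_SIZE);
    return 0;
}

static int ram_read(void *ctx, uint8_t *data, uint32_t sector)
{
    (void)ctx;
    if (sector >= RAM_SECTOR_COUNT) {
        return -1;
    }
    memcpy(data, ram_disk[sector], RAM_SECTOR_SIZE);
    return 0;
}
```

Calling check_sector_roundtrip on a handful of sectors spread across the device (start, middle, end) is usually enough to show whether data survives a round trip; a non-zero result means either an I/O call failed or the data read back differs from what was written.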

By the way, since this flash chip is not on the list of supported ones, it might make sense to check its datasheet and see if there is any difference related to timing, compared to other supported chips.

igrr avatar Sep 26 '24 16:09 igrr

@igrr Which Micron chips do you support? None are listed here, but provision has been made for Micron chips in this header file. There's also an implementation here that caters for multiple vendors, including Micron.

Teesmo avatar Sep 26 '24 17:09 Teesmo

AFAIK we haven't tested this component on any of the Micron chips. Micron support was contributed by @UnTraDe in https://github.com/espressif/idf-extra-components/pull/327. I guess @UnTraDe used it successfully, maybe there is something slightly different in your hardware setup. As I mentioned above, I would recommend doing a basic write/readback sanity check to make sure that the data is being written correctly.

By the way, do you folks work on the same project or it just so happened that you both need Micron NAND flash support? If it's the latter, we can try to get some samples of that chip and test them...

igrr avatar Sep 27 '24 14:09 igrr

@igrr Thank you for the support! We are on different teams. I chose Micron for cost reasons: compared with other NAND flash on the market, it is well priced. It would be great to have your help in extending support to Micron parts like the MT29F2G01ABAGDWB I have, so that it can be shared with the community and help other teams!

mttcarbone avatar Sep 27 '24 15:09 mttcarbone

Okay, we'll order some boards with MT29 chips, might take some time (worst case, 2 weeks) to get them.

In the meantime perhaps you could do the test I have suggested above and share the result you get.

igrr avatar Sep 27 '24 15:09 igrr

Great! Thank you! I'm following your directions, hope to share some valuable considerations and update on my evidence as early as the next few days.

mttcarbone avatar Sep 27 '24 16:09 mttcarbone

Hi, we are currently using this driver with MT29F4G01ABAFDWB successfully, and it should work on the rest of the MT29F series. As far as I understand, the only differences between the chips in this series are the size and organization of the memory itself (page size, block size, etc.), so all that is needed to add support for them is the model ID and the corresponding sizes. Take a look at the changes here: https://github.com/espressif/idf-extra-components/pull/327/files

UnTraDe avatar Sep 27 '24 18:09 UnTraDe

I report here the part of code that I added to the library:

  • flash device id,
  • extended the switch cases log2_page_size and num_blocks, as seen in the code below.

I have several doubts:

  • about the 2-plane structure of the flash and the "Block RA6 controls the plane selection" handling described in the chip's datasheet;
  • about the delay_us delays.

static esp_err_t spi_nand_micron_init(spi_nand_flash_device_t *dev)
{
    uint8_t device_id;
    spi_nand_transaction_t t = {
        .command = CMD_READ_ID,
        .dummy_bits = 16,
        .miso_len = 1,
        .miso_data = &device_id
    };
    spi_nand_execute_transaction(dev->config.device_handle, &t);
    dev->read_page_delay_us = 115;
    dev->erase_block_delay_us = 2000;
    dev->program_page_delay_us = 240;
    switch (device_id) {
    case MICRON_DI_34:
        dev->dhara_nand.num_blocks = 2048;
        dev->dhara_nand.log2_ppb = 6;        // 64 pages per block
        dev->dhara_nand.log2_page_size = 12; // 4096 bytes per page
        break;
    case MICRON_DI_24:
        dev->dhara_nand.num_blocks = 1024;
        dev->dhara_nand.log2_ppb = 6;        
        dev->dhara_nand.log2_page_size = 11; 
        break;
    default:
        return ESP_ERR_INVALID_RESPONSE;
    }
    return ESP_OK;
}

Looking at the flash structure, is there anything wrong in the configuration? @UnTraDe Thanks for your input and for helping me!
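One way to sanity-check a geometry like the one above is to compute the capacity it implies and compare it with the part's nominal size. The sketch below does only that arithmetic; nand_capacity_bytes is a hypothetical helper, not part of the library, and it assumes the "2G" and "4G" in the part numbers mean 2 Gbit and 4 Gbit of main-array storage, which should be confirmed against the datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Total main-array capacity in bytes implied by the three geometry fields
 * (hypothetical helper, not part of the spi_nand_flash component). */
static uint64_t nand_capacity_bytes(uint32_t num_blocks, uint8_t log2_ppb,
                                    uint8_t log2_page_size)
{
    assert(log2_ppb + log2_page_size < 32);
    return (uint64_t)num_blocks << (log2_ppb + log2_page_size);
}
```

With 1024 blocks, 64 pages per block and 2048-byte pages this gives 128 MiB, i.e. 1 Gbit, while 2048 blocks gives 256 MiB, i.e. 2 Gbit. Under the assumption stated above, the 1024-block setting would describe only half of a 2 Gbit device.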

Screenshot_20240927_211225 Screenshot_20240927_211220

mttcarbone avatar Sep 27 '24 19:09 mttcarbone

Hi @igrr , I did the tests you mentioned, I used the spi_nand_flash_read_sector function to see if it wrote correctly to the sectors. The thing does not return any errors.

After increasing .allocation_unit_size from 16 * 1024 to 32 * 1024, the file reads back correctly when I re-read it. The remaining problem is that at each reboot of the ESP32-S3 the memory is formatted because mounting the file system fails with an error.

Analyzing SPI, I saw that QSPI WP and HD pins are not working, is this normal? My spi configuration is this:

#define HOST_ID SPI3_HOST
#define PIN_MOSI (16) 
#define PIN_MISO (14) 
#define PIN_CLK (15) 
#define PIN_CS (13) 
#define PIN_WP (40) 
#define PIN_HD (41) 
#define SPI_DMA_CHAN SPI_DMA_CH_AUTO 

I chose SPI3_HOST, because in the future SPI2_HOST will be used for a display

Screenshot_20241001_160842

mttcarbone avatar Oct 01 '24 14:10 mttcarbone

I did the tests you mentioned, I used the spi_nand_flash_read_sector function to see if it wrote correctly to the sectors. The thing does not return any errors.

Could you please clarify how you tested this? spi_nand_flash_read_sector doesn't write sectors, you need to call spi_nand_flash_write_sector in order to do that. To check if the data has been written correctly, compare the data you read back to the one you originally wrote.

You can check this code from the test app: https://github.com/espressif/idf-extra-components/blob/0603d10e0ee06cdd63ad78d3960238891c49db70/spi_nand_flash/test_app/main/test_spi_nand_flash.c#L129-L153

Analyzing SPI, I saw that QSPI WP and HD pins are not working, is this normal?

Yes, that's normal, currently the library doesn't make use of DIO or QIO related commands, there is a discussion about that in another issue: https://github.com/espressif/idf-extra-components/issues/375#issuecomment-2379644911.

igrr avatar Oct 01 '24 15:10 igrr

Hi @igrr , I ran the following code in the main:

    uint32_t sector_num, sector_size;
    spi_nand_flash_device_t *nand_flash_device_handle;
    spi_device_handle_t spi;
    setup_nand_flash(&nand_flash_device_handle, &spi);

    TEST_ESP_OK(spi_nand_flash_get_capacity(nand_flash_device_handle, &sector_num));
    TEST_ESP_OK(spi_nand_flash_get_sector_size(nand_flash_device_handle, &sector_size));
    printf("Number of sectors: %" PRIu32 ", Sector size: %" PRIu32 "\n", sector_num, sector_size);

    do_single_write_test(nand_flash_device_handle, 1, 16);
    do_single_write_test(nand_flash_device_handle, 16, 32);
    do_single_write_test(nand_flash_device_handle, 32, 64);
    do_single_write_test(nand_flash_device_handle, 64, 128);
    do_single_write_test(nand_flash_device_handle, sector_num / 2, 32);
    do_single_write_test(nand_flash_device_handle, sector_num / 2, 256);
    do_single_write_test(nand_flash_device_handle, sector_num - 20, 16);

    deinit_nand_flash(nand_flash_device_handle, spi);

With the following flash configuration:

static esp_err_t spi_nand_micron_init(spi_nand_flash_device_t *dev)
{
    uint8_t device_id;
    spi_nand_transaction_t t = {
        .command = CMD_READ_ID,
        .dummy_bits = 16,
        .miso_len = 1,
        .miso_data = &device_id
    };
    spi_nand_execute_transaction(dev->config.device_handle, &t);
    switch (device_id) {
    case MICRON_DI_34:
        dev->read_page_delay_us = 115;
        dev->erase_block_delay_us = 2000;
        dev->program_page_delay_us = 240;
        dev->dhara_nand.num_blocks = 2048;
        dev->dhara_nand.log2_ppb = 6;        // 64 pages per block
        dev->dhara_nand.log2_page_size = 12; // 4096 bytes per page
        break;
    case MICRON_DI_24:
        dev->read_page_delay_us = 55;
        dev->erase_block_delay_us = 2000;
        dev->program_page_delay_us = 220;
        dev->dhara_nand.num_blocks = 2048;
        dev->dhara_nand.log2_ppb = 6;        
        dev->dhara_nand.log2_page_size = 11; 
        break;
    default:
        return ESP_ERR_INVALID_RESPONSE;
    }
    return ESP_OK;
}

I get this error back: Screenshot_20241002_175209

Reading the datasheet, I saw this part here: Screenshot_20241002_160314 Screenshot_20241002_161541

As you can see, bit 12 selects the plane of the flash, unlike on the 4Gb model, where it is not needed.
Looking at the library, I can't figure out where to pass the plane bit. Could you point me in the right direction?

mttcarbone avatar Oct 02 '24 15:10 mttcarbone

I don't think any special handling is needed for the plane index. The driver is sending a 16-bit block address to the flash chip:

  • Three MSBs (15-13) are ignored by the flash chip since they are above the 2 Gb capacity
  • Bit 12 of the column address sets the plane index
  • The rest of the bits set the block index within the plane

The only thing which looks odd is that according to figure 7, the plane index should be the LSB of the address, not MSB. However this only changes the mapping of addresses to the physical blocks in Flash, aside from some performance difference it should still work either way.

The exception you got seems unrelated to this problem, but I can't tell what specifically is wrong since the exception output is cut off in your screenshot. Please check https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-guides/fatal-errors.html for instructions on how to interpret exception output. If you post the logs, please do post them in text format rather than as images. It might also help to decrease the log level from "verbose" to "info" or "debug" since there seems to be a lot of unrelated messages in the log.

igrr avatar Oct 02 '24 17:10 igrr

Sorry @igrr , I'll fix this right away, I'll send you a file with monitor output, with debug functions turned on. I hope it can give you enough information.

debug_output.txt

mttcarbone avatar Oct 02 '24 18:10 mttcarbone

Hi @mttcarbone, thank you for the logs you provided. I was able to reproduce the issue with MT29F2G chip.

As you have suspected, the dual plane memory organization does require additional handling. For "read cache" and "program load" commands, plane index must be added as the MSB of the column address. For example, if p is the page number, column address ca should be modified to ca + (((p /64) % 2) << 12), where 64 is the number of pages per block, 2 is the number of planes, and 12 is the address bit which sets the plane index.
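The address adjustment described above can be sketched in isolation as follows. The constants come from the comment (64 pages per block, 2 planes, plane select on bit 12); the function name column_with_plane is hypothetical and is not taken from the driver or from the linked patch.

```c
#include <assert.h>
#include <stdint.h>

#define PAGES_PER_BLOCK 64u /* pages per block, as stated above */
#define NUM_PLANES      2u  /* dual-plane organization */
#define PLANE_BIT       12u /* column address bit that selects the plane */

/* Return the column address with the plane index of `page` placed in bit 12
 * (hypothetical helper illustrating ca + (((p / 64) % 2) << 12)). */
static uint16_t column_with_plane(uint32_t page, uint16_t column)
{
    assert(column < (1u << PLANE_BIT)); /* bit 12 must be free for the plane */
    uint32_t plane = (page / PAGES_PER_BLOCK) % NUM_PLANES;
    return (uint16_t)(column | (plane << PLANE_BIT));
}
```

Pages 0 to 63 belong to block 0 and leave the column untouched, pages 64 to 127 belong to block 1 and get 0x1000 added, and the pattern then alternates block by block.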

After modifying dhara_nand_is_bad, dhara_nand_prog, dhara_nand_is_free, and dhara_nand_read this way, the tests are passing.

I will check if other SPI NAND flash chips implement similar way of handling interleaved addressing, and will try to generalize my patch into something that won't be specific just for MT29F2G.

igrr avatar Oct 07 '24 18:10 igrr

Hi @igrr , Yes, in the last few days I had tried to modify in dhara_nand_prog this code:

ESP_GOTO_ON_ERROR(spi_nand_program_load(dev->config.device_handle, (uint8_t *)&used_marker,
                                            (dev->page_size | ((p/64)%2 << 12)) + 2, 2),
                                            //dev->page_size + 2, 2),
                     fail, TAG, "");

and also in dhara_nand_is_free this code:

ESP_GOTO_ON_ERROR(spi_nand_read(dev->config.device_handle, (uint8_t *)&used_marker,
                                (dev->page_size | ((p/64)%2 << 12) ) + 2, 2),
                                //dev->page_size + 2, 2),
                  fail, TAG, "");

but it kept giving me an error. I will do some tests with the dhara_nand_is_bad function modified as well.

Would you share with me your tests on the functions in the file dhara_glue.h so I can help you test?

mttcarbone avatar Oct 07 '24 19:10 mttcarbone

Yeah, you need to modify all 4 functions (also dhara_nand_is_bad and dhara_nand_read). You can check my draft changes over here: https://github.com/espressif/idf-extra-components/pull/397/.

igrr avatar Oct 07 '24 21:10 igrr

Hi @igrr , I tried your code on my flash, and it works! Great job.

I'll share you test output, so you can compare it with yours. debug_output.txt

In my tests, I made a small modification which I will report here:

static esp_err_t spi_nand_micron_init(spi_nand_flash_device_t *dev)
{
    uint8_t device_id;
    spi_nand_transaction_t t = {
        .command = CMD_READ_ID,
        .dummy_bits = 16,
        .miso_len = 1,
        .miso_data = &device_id
    };
    spi_nand_execute_transaction(dev->config.device_handle, &t);
    dev->erase_block_delay_us = 2000;
    switch (device_id) {
    case MICRON_DI_34:
        dev->read_page_delay_us = 115;
        dev->program_page_delay_us = 240;
        dev->dhara_nand.num_blocks = 2048;
        dev->dhara_nand.log2_ppb = 6;        // 64 pages per block
        dev->dhara_nand.log2_page_size = 12; // 4096 bytes per page
        break;
    case MICRON_DI_24:
        dev->read_page_delay_us = 55;
        dev->program_page_delay_us = 220;
        dev->dhara_nand.num_blocks = 2048;
        dev->dhara_nand.log2_ppb = 6;        // 64 pages per block
        dev->dhara_nand.log2_page_size = 11; // 2048 bytes per page
        break;
    default:
        return ESP_ERR_INVALID_RESPONSE;
    }
    return ESP_OK;
}

It concerns the dev->read_page_delay_us time; the datasheet of the 2G version gives this specification:

Screenshot_20241008_081727

I ran some tests with both your timing configuration and mine, and the results appear identical. But seeing how carefully @UnTraDe followed the 4G datasheet, I wanted to do the same for the 2G part.

mttcarbone avatar Oct 08 '24 06:10 mttcarbone

@igrr One more thing: while running various tests with the code in the spi_nand_flash/examples/nand_flash example, I found that with .format_if_mount_failed = false it works on the first boot and keeps working for a while, but from the 3rd or 4th reboot onward it starts reporting this error:

W (4616) vfs_fat_nand: f_mount failed (13)
E (4616) example: Failed to mount filesystem. If you want the flash memory to be formatted, set the CONFIG_EXAMPLE_FORMAT_IF_MOUNT_FAILED menuconfig option.
I (4636) main_task: Returned from app_main()

I made an output file of what the monitor shows me. output_monitor-2.txt

Mind you, this problem occurred by creating a new project without the test functions and the unity library for debugging. This is why this seems like a very unusual problem.

mttcarbone avatar Oct 08 '24 08:10 mttcarbone

@mttcarbone After looking at the code again, I have realized there are still at least two issues remaining:

  • dhara_nand_mark_bad also has to be modified to adjust the write address
  • dhara_nand_copy cannot simply read and program the page if source and destination pages are on different planes. We have to read out the page over SPI to the host, then write it back to the other plane, and only then program it.
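The decision described in the second point can be sketched as a small predicate. This assumes, as in the earlier comments, two planes selected by the least significant bit of the block number; page_plane and copy_needs_host_buffer are hypothetical names and do not come from the PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Plane index of a page, assuming two planes selected by the block LSB. */
static uint32_t page_plane(uint32_t page, uint8_t log2_ppb)
{
    assert(log2_ppb < 32);
    return (page >> log2_ppb) & 1u;
}

/* True when source and destination pages sit on different planes, in which
 * case the page data has to pass through a host-side buffer. */
static bool copy_needs_host_buffer(uint32_t src, uint32_t dst, uint8_t log2_ppb)
{
    return page_plane(src, log2_ppb) != page_plane(dst, log2_ppb);
}
```

With 64 pages per block (log2_ppb = 6), copying page 0 to page 63 stays within block 0 and one plane, while copying page 0 to page 64 crosses from block 0 to block 1 and therefore between planes.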

I have pushed the latest version to the same PR. (Totally unverified! I didn't have this flash chip at hand today.)

Seeing that some of MXIC flash chips also use this dual-plane architecture, we will probably have to support this, however I might suggest picking a different flash chip as a possibly simpler solution, if your project allows for this.

igrr avatar Oct 09 '24 20:10 igrr

Hi @igrr , Thank you for your feedback. For the cost and capacity it offers, the 2G version is very advantageous. If it is okay with you, I will continue development on the 2G. I certainly won't be as quick as you at finding a solution, but I would like to take it forward. Let me know when you get the chip; I hope to arrive at working code as soon as possible.

I will keep you updated!

mttcarbone avatar Oct 10 '24 16:10 mttcarbone

Hi @igrr, hope you are doing well! I initialized the MT29F2G successfully using #397, and it reads and writes data.

I have mounted FATFS to read and write files. The problem is that when I write large files, the application crashes with an assert:

assert failed: dhara_nand_read dhara_glue.c:219 (p < n->num_blocks * (1 << n->log2_ppb))

Maybe this is due to the 2-plane architecture, and it cannot write to even- and odd-numbered blocks simultaneously. Looking forward to your help in this regard. Thanks!
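For reference, the condition in that assert is a plain range check on the page number. The sketch below reproduces only the arithmetic, using a hypothetical helper name; it does not diagnose why an out-of-range page number reached dhara_nand_read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirror of the asserted condition: p < num_blocks * (1 << log2_ppb).
 * page_in_range is a hypothetical name used only for this illustration. */
static bool page_in_range(uint32_t p, uint32_t num_blocks, uint8_t log2_ppb)
{
    assert(log2_ppb < 32);
    uint64_t total_pages = (uint64_t)num_blocks << log2_ppb;
    return p < total_pages;
}
```

For 2048 blocks of 64 pages the device has 131072 pages, so valid page numbers run from 0 to 131071 and anything at or above 131072 trips the assert.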

mansoorzamankhan avatar Jan 30 '25 14:01 mansoorzamankhan

Hi @mttcarbone @mansoorzamankhan , Were you able to successfully use the MT29F2G? You can check the changes in this PR: https://github.com/espressif/idf-extra-components/pull/496. This is an extension of the PR https://github.com/espressif/idf-extra-components/pull/397 and includes a fix for an issue with column address calculation when adding plane index. Feel free to try out these changes and see if they work for you.

RathiSonika avatar May 05 '25 09:05 RathiSonika