nuttx icon indicating copy to clipboard operation
nuttx copied to clipboard

CONFIG_BUILD_KERNEL: Allocating user stack fails if the initial heap size is too small

Open pussuw opened this issue 3 years ago • 4 comments

This happens with CONFIG_BUILD_KERNEL=y if the initial heap size is not large enough to hold the initial stack.

The issue is with the tcb/group initialization order when loading a task with binfmt_execmodule / exec_module.

There was a patch to fix this, but the fix only works if the initial heap is large enough to contain the task's stack + tls structure: https://github.com/apache/incubator-nuttx/commit/a44a0a08cd9d899a2ee601c80d9c36fa3a51123a

If the initial heap is not large enough, sbrk is called and the system crashes.

Details on what happens in order:

  1. An address environment was created when the binary was loaded. That is instantiated on line 156: ret = up_addrenv_select(&binp->addrenv, &oldenv); this is fine
  2. The initial heap is initialized by line 165: umm_initialize((FAR void *)CONFIG_ARCH_HEAP_VBASE, binp->addrenv.heapsize);
  3. nxtask_init() allocates memory for the group structure.
  4. nxtask_init() allocates the task stack on line 124 ret = up_create_stack(&tcb->cmn, up_tls_size() + stack_size, ttype);
  5. The allocation is done from user heap via kumm_malloc
  6. If up_tls_size() + stack_size > binp->addrenv.heapsize this will fail, as kumm_malloc will eventually call sbrk which in turn calls pgalloc()
  7. pgalloc has the following test DEBUGASSERT((group->tg_flags & GROUP_FLAG_ADDRENV) != 0); which fill fail, because group->tg_flags.GROUP_FLAG_ADDRENV is not yet set
  8. The flag GROUP_FLAG_ADDRENV is set later in binfmt_execmodule line 238: tcb->cmn.group->tg_flags |= GROUP_FLAG_ADDRENV; but this is too late.

The initialization order has to be changed / modified to mark the task's address environment as valid sooner, or define a new flag to tell sbkr/pgalloc to do the allocation without testing for that flag.

I can fix this but it requires modifying the existing kernel code and I need a suggestion how to do this correctly.

pussuw avatar Mar 21 '22 11:03 pussuw

At least @masayuki2009 might be interested to know of this issue

pussuw avatar Mar 21 '22 11:03 pussuw

I did the very original KERNEL build logic, lots of changes have happened since that I am not aware of so, unfortunately, I am probably not such a technical source. I considered the KERNEL build a work in progress and there is a roadmap document that describes what all of the final steps would be to complete the a wold-class solution: https://cwiki.apache.org/confluence/display/NUTTX/Memory+Configurations

You can see the "Line" in the center of the document. Below the "Line" are features that have not been implemented.

I think that the problem that you describe here would best be fixed by continuing to implement features on the roadmap. That might take a little bigger effort but has better payoff in the future than simply. I think that the problem that you are describing here is really as consequence of a missing feature: "Dynamic Stack Allocation. Completely eliminate the need for constant tuning of static stack sizes.(not implemented)."

If each thread's stack is dynamically allocated on-demand and does not involve the heap, then there can be no initial stack size issue.

In this case the solution would work something like this:

  • The virtual address space would need to be reorganized to make space for virtual stacks for the main thread and all possible pthreads. It has been a long time since I looked at MIPS Linux, but it used to reserve the top area of memory for virtual stack. As I recall, each virtual stack region was 2 Mb. That is large, but does not cost any "real" memory since this address spaces is not backed with real memory.
  • When the process is created, a single page is allocated for the initial stack at the "top" of this virtual address space.
  • Additional pages are added as needed whenever there are stack accesses beyond the mapped address space (via a page fault exception handler).

Another benefit, of these large virtual stack spaces is that TLS can be greatly simplified: Simply ANDing the current stack pointer by the size of the virtual stack address spaces gives you the TLS base address. That can be done in user space!

The argument against doing this and, instead, just patching up the existing logic is that this is more work. But this also takes the KERNEL build a lot further along on the Roadmap to full process support comparable to Linux. If you are willing to solve this "the correct way", I would be happy to help you in any way that I can.

patacongo avatar Mar 21 '22 14:03 patacongo

Thank you for your response, it is very much appreciated. To give a bit of context, I'm implementing kernel build support for RISC-V. I'm very close to something functional, nsh already boots up and I can execute commands from the terminal. This issue is one of the "loose ends" I'm trying to tie up.

I agree, the best solution for this would be to implement the dynamic stack support. It would get rid of any and all initial stack size issues once and for all. It will take a bit of time but it is more of a canonical solution. The good thing is that I can circumvent this issue temporarily by just allocating more heap upon boot.

I already discussed this internally and we all agree that implementing the dynamic stack size support should be one of the targets of what I'm doing. I will most likely start this immediately after fixing the last issues I have with stability.

pussuw avatar Mar 21 '22 14:03 pussuw

I understand. I'm glad we agree, but reality and the practical nature of business usually make the decisions for us.

There is some on-demand page logic in place, but has not been used in years. You can see it be searching for CONFIG_PAGING. Several years ago, I used an LPC31xx, in particular the EA3131 board. The LPC3131 is an ARM926EJS. It has 192 Kb of internal SRAM and NO internal FLASH. The object was to prototype a system that ran large programs only with only 192Kb with on-demand paging from a serial FLASH. That was really a very cool idea. QuadSPI made the project less interesting today.

So in addition to the CONFIG_PAGING logic under arch/arm/src, there is also a lot of on-demand logic for the ARM under boards/arm/lpc31xx/ea3131. The ARM926EJS differs a little from modern ARMs, but perhaps this code could be helpful to you in the future if you decide to take up this challenge.

patacongo avatar Mar 21 '22 16:03 patacongo