ast icon indicating copy to clipboard operation
ast copied to clipboard

Refactor allocations that include multiple structures

Open krader1961 opened this issue 5 years ago • 3 comments

See issue #824 and PR #827 for background. The core problem is that there are multiple places in the code which allocate a single buffer that is carved up into individual structures which point to each other. This leads to allocations like this (from src/cmd/ksh93/sh/args.c):

malloc(sizeof(struct Sort) + strlen(keylist) + (sizeof(char *) + 1) * keys +
       (arp->nelem + 1) * (nodesize + sizeof(void *)) + c + 3);

I challenge you to tell me whether that allocation is the correct size or not. Show your work.

Hint: It was definitely not correct prior to PR #827 because nodesize might have been too small. And even after that change I can't tell whether the size is correct. In no small part because of subexpressions like the trailing + 3. Why do three extra bytes need to be included in the allocation? What is the purpose of the + 1 in subexpression (sizeof(char *) + 1) * keys?

krader1961 avatar Sep 03 '18 04:09 krader1961

malloc(sizeof(struct Sort) + strlen(keylist) + (sizeof(char *) + 1) * keys + (arp->nelem + 1) * (nodesize + sizeof(void *)) + c + 3);

I see this kind of code (and worse) everyday! I will not name names in order to protect the guilty. But yes, this is extremely bad programming practice, and should be refactored when possible.

DavidMorano avatar Sep 03 '18 14:09 DavidMorano

FWIW, I have no objection to patterns like this:

struct {
    int x;
    int y;
    char *ptrs[0];
} a_struct;

int nptrs = 3;
struct a_struct *z = malloc(sizeof(struct a_struct) + sizeof(char *) * nptrs);

That's a commonly used idiom that is easy to understand and verify is correct.

P.S., When I asked earlier what the + 3 term was for I was being a bit disingenuous. There are two strings being embedded in the buffer so two of those three bytes are for the terminating null character of each string. That still leaves the third byte unexplained as well as the other places + 1 appears.

krader1961 avatar Sep 03 '18 22:09 krader1961

By the way, the pattern I'm arguing should be eliminated also makes tools like ASAN (address sanitizer) much less useful. That's because the buffers are contiguous which limits the ability of tools like ASAN to detect when a buffer is accessed beyond its boundaries. Which results in false negatives (aka silent bugs).

krader1961 avatar Mar 04 '19 04:03 krader1961