git icon indicating copy to clipboard operation
git copied to clipboard

Allocate msg only after fatal checks to avoid leaks

Open mugitya03 opened this issue 7 months ago • 12 comments

cc: lidongyan [email protected] cc: Jeff King [email protected]

mugitya03 avatar Jun 12 '25 23:06 mugitya03

/submit

jinyaoguo avatar Jun 13 '25 18:06 jinyaoguo

Error: User jinyaoguo is not yet permitted to use GitGitGadget

gitgitgadget-git[bot] avatar Jun 13 '25 18:06 gitgitgadget-git[bot]

/submit

mugitya03 avatar Jun 13 '25 19:06 mugitya03

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-git-1998/mugitya03/mlk-3-v1

To fetch this version to local tag pr-git-1998/mugitya03/mlk-3-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-git-1998/mugitya03/mlk-3-v1

gitgitgadget-git[bot] avatar Jun 13 '25 19:06 gitgitgadget-git[bot]

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Alex via GitGitGadget" <[email protected]> writes:

> -	if (type != OBJ_BLOB) {
> -		strbuf_release(&msg->buf);
> -		free(value);
> -		free(msg);
> -		die(_("cannot read note data from non-blob object '%s'."), arg);
> -	}
> +    if (type != OBJ_BLOB) {
> +        free(value);
> +        die(_("cannot read note data from non-blob object '%s'."), arg);
> +    }
> +
> +    msg = xmalloc(sizeof(*msg));
> +    strbuf_init(&msg->buf, 0);

ALl the new lines seem to be indented by four spaces.  Check with
Documantation/CodingGuidelines.

Also, Documantation/SubmittingPatches::[[real-name]] asks folks to
use their real name as authorname.  You prefer your purdue
address, that is fine, but let's do something like

    From: Jinyao Guo <[email protected]>
    Signed-off-by: Jinyao Guo <[email protected]>

gitgitgadget-git[bot] avatar Jun 13 '25 20:06 gitgitgadget-git[bot]

On the Git mailing list, lidongyan wrote (reply to this):

Alex via GitGitGadget <[email protected]> writes:
> 
> From: jinyaoguo <[email protected]>
> 
> In parse_reuse_arg, we previously called xmalloc and strbuf_init
> before resolving the ref and reading the object, leading to a
> leaked msg on die() paths. This change moves the allocation of

A memory leak on the die() path shouldn't be considered a real leak,
right? Since the OS will clean up all memory once the process
terminates, explicitly freeing msg isn't necessary in this case.

Lidong

gitgitgadget-git[bot] avatar Jun 14 '25 08:06 gitgitgadget-git[bot]

User lidongyan <[email protected]> has been added to the cc: list.

gitgitgadget-git[bot] avatar Jun 14 '25 08:06 gitgitgadget-git[bot]

On the Git mailing list, Junio C Hamano wrote (reply to this):

lidongyan <[email protected]> writes:

> Alex via GitGitGadget <[email protected]> writes:
>> 
>> From: jinyaoguo <[email protected]>
>> 
>> In parse_reuse_arg, we previously called xmalloc and strbuf_init
>> before resolving the ref and reading the object, leading to a
>> leaked msg on die() paths. This change moves the allocation of
>
> A memory leak on the die() path shouldn't be considered a real leak,
> right? Since the OS will clean up all memory once the process
> terminates, explicitly freeing msg isn't necessary in this case.

It may not matter in practice, but I think the leak checking
machinery like sanitizers would still complain, so I view efforts on
plugging such leaks in the error code paths more about decluttering
the leak checker output to help us spot the real leaks.

gitgitgadget-git[bot] avatar Jun 14 '25 15:06 gitgitgadget-git[bot]

On the Git mailing list, lidongyan wrote (reply to this):

Junio C Hamano <[email protected]> writes:
> 
> lidongyan <[email protected]> writes:
> 
>> Alex via GitGitGadget <[email protected]> writes:
>>> 
>>> From: jinyaoguo <[email protected]>
>>> 
>>> In parse_reuse_arg, we previously called xmalloc and strbuf_init
>>> before resolving the ref and reading the object, leading to a
>>> leaked msg on die() paths. This change moves the allocation of
>> 
>> A memory leak on the die() path shouldn't be considered a real leak,
>> right? Since the OS will clean up all memory once the process
>> terminates, explicitly freeing msg isn't necessary in this case.
> 
> It may not matter in practice, but I think the leak checking
> machinery like sanitizers would still complain, so I view efforts on
> plugging such leaks in the error code paths more about decluttering
> the leak checker output to help us spot the real leaks.

Makes sense. However, inserting a free-like statement in die() would be
messier than using goto, since each die() has a unique message and we
need to free stuff at each die().

gitgitgadget-git[bot] avatar Jun 14 '25 15:06 gitgitgadget-git[bot]

On the Git mailing list, Jeff King wrote (reply to this):

On Sat, Jun 14, 2025 at 08:40:43AM -0700, Junio C Hamano wrote:

> > A memory leak on the die() path shouldn't be considered a real leak,
> > right? Since the OS will clean up all memory once the process
> > terminates, explicitly freeing msg isn't necessary in this case.
> 
> It may not matter in practice, but I think the leak checking
> machinery like sanitizers would still complain, so I view efforts on
> plugging such leaks in the error code paths more about decluttering
> the leak checker output to help us spot the real leaks.

I disagree here. These are not really leaks, and a leak-checker that
complains about them is bad.

When we call die(), the pointer to the buffer is still on the stack, and
thus the memory is still reachable and not leaked. Some tools like
valgrind may still report these as "still reachable", but because they
categorize them properly we can ignore them[1].

The one exception we've seen is that an optimizing compiler may reorder
instructions to obliterate the stack (because it knows die() is marked
with NORETURN), causing a false positive. We dealt with that via
d3775de074 (Makefile: force -O0 when compiling with SANITIZE=leak,
2022-10-18). I don't think we've seen any recurrence since then.

And while it may be tempting to say "well, it does not hurt to free them
on the die() path", in my opinion that way madness lies. You may have
access to some local variables that can be freed, but there will be many
other heap allocations that you don't even know about! Here's a toy
example from a similar discussion a few years ago:

  https://lore.kernel.org/git/[email protected]/

So I'd really prefer not to go down this route. And I think the existing
code in this patch's pre-image that calls free() before die() only on
one path should be simplified, so that all die() paths consistently do
not worry about this.

I.e., this:

diff --git a/builtin/notes.c b/builtin/notes.c
index cc1163242f..f3d5eda104 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -321,12 +321,8 @@ static int parse_reuse_arg(const struct option *opt, const char *arg, int unset)
 		die(_("failed to resolve '%s' as a valid ref."), arg);
 	if (!(value = odb_read_object(the_repository->objects, &object, &type, &len)))
 		die(_("failed to read object '%s'."), arg);
-	if (type != OBJ_BLOB) {
-		strbuf_release(&msg->buf);
-		free(value);
-		free(msg);
+	if (type != OBJ_BLOB)
 		die(_("cannot read note data from non-blob object '%s'."), arg);
-	}
 
 	strbuf_add(&msg->buf, value, len);
 	free(value);

-Peff

[1] "still reachable" leaks _can_ be useful when returning from main,
    because they may show memory held in global structures that we might
    have been able to free at a more timely spot. But there are a lot of
    these in Git, most of which are not interesting (e.g., is freeing
    the_repository really worth caring about?), and I don't think there
    is a good way to tell the difference.

    More pontificating from that earlier discussion:

      https://lore.kernel.org/git/[email protected]/

gitgitgadget-git[bot] avatar Jun 14 '25 23:06 gitgitgadget-git[bot]

User Jeff King <[email protected]> has been added to the cc: list.

gitgitgadget-git[bot] avatar Jun 14 '25 23:06 gitgitgadget-git[bot]

On the Git mailing list, Junio C Hamano wrote (reply to this):

Jeff King <[email protected]> writes:

> And while it may be tempting to say "well, it does not hurt to free them
> on the die() path", in my opinion that way madness lies. You may have
> access to some local variables that can be freed, but there will be many
> other heap allocations that you don't even know about! Here's a toy
> example from a similar discussion a few years ago:
>
>   https://lore.kernel.org/git/[email protected]/

Yeah, I recall that discussion and the example.  Yes, we should not
have to crawl up from a direct caller of die() and free everything
these stack frames hold.

> I.e., this:
>
> diff --git a/builtin/notes.c b/builtin/notes.c
> index cc1163242f..f3d5eda104 100644
> --- a/builtin/notes.c
> +++ b/builtin/notes.c
> @@ -321,12 +321,8 @@ static int parse_reuse_arg(const struct option *opt, const char *arg, int unset)
>  		die(_("failed to resolve '%s' as a valid ref."), arg);
>  	if (!(value = odb_read_object(the_repository->objects, &object, &type, &len)))
>  		die(_("failed to read object '%s'."), arg);
> -	if (type != OBJ_BLOB) {
> -		strbuf_release(&msg->buf);
> -		free(value);
> -		free(msg);
> +	if (type != OBJ_BLOB)
>  		die(_("cannot read note data from non-blob object '%s'."), arg);
> -	}

Much nicer.

gitgitgadget-git[bot] avatar Jun 15 '25 00:06 gitgitgadget-git[bot]