hardened_malloc icon indicating copy to clipboard operation
hardened_malloc copied to clipboard

perform size checks on memcpy/memmove/memset

Open SkewedZeppelin opened this issue 9 months ago • 26 comments

for https://github.com/GrapheneOS/hardened_malloc/issues/231

  • largely works system wide on fedora 41 & 42

task list

  • [X] gcc compiler doing weird things
    • no longer an issue with using real underlying functions
  • [ ] clang compiler doing weird things
    • it runs now, but size checks are always max
  • [X] whole object size checks (fast path)
  • [X] object remaining size checks (non-fast path)
  • [X] optimized assembly functions
  • [X] memcpy
  • [X] memccpy
    • [ ] handle common read overflow case
  • [X] memmove
  • [X] memset
  • [X] wmemcpy
  • [X] wmemmove
  • [X] wmemset
  • [X] bypass overrides for self
  • [X] licensing
  • [X] makefile bits
  • [X] readme
    • could be expanded on
  • [X] test case for memcpy
    • [ ] overlap test
  • [X] test case for memccpy
  • [X] test case for memmove
  • [X] test case for memset
  • [X] test case for wmemcpy
  • [X] test case for wmemmove
  • [X] test case for wmemset
  • [ ] run all the test cases
    • the feature is default disabled so they can't be run without failing
  • [ ] figure out why test cases fail under CI when enabled
    • they all pass on my end
    • still not working on latest patchset
  • [X] figure out why so many gnome apps crash
    • fatal allocator error: invalid malloc_object_size
    • conflict with gjs/mozjs?
    • crashes under f42, but not f41: clocks, calculator, baobab, fileroller, logs
    • crashes under f41, but not f42: gnome-shell when clicking top bar controls
    • can't reproduce anymore, unsure why
  • [X] figure out how to handle chromium/electron crash/conflict
    • can't reproduce anymore, only happens on fast path
  • [X] figure out if it is possible to use the real underlying functions for better per-arch performance
    • dlsym doesn't seem to work with all program such as mutter-x11-frames
      • can't reproduce anymore
    • this doesn't necessarily pull from libc, but can pull from other libraries
    • it feels unsafe

compared to others:

  • redfat: memchr, memcmp, memcpy, memmove, memrchr, memset, strcasecmp, strcasestr, strcat, strchr, strchrnul, strcmp, strcpy, strlen, strncasecmp, strncat, strncmp, strncpy, strnlen, strrchr, strstr
  • this patchset: bcopy, memccpy, memcpy, memmove, mempcpy, memset, swab, wmemcpy, wmemmove, wmempcpy, wmemset
  • isoalloc: memcpy, memmove, memset
  • snmalloc: memcpy, and previously memmove (disabled due to issue found by fuzzing)

SkewedZeppelin avatar Mar 21 '25 22:03 SkewedZeppelin

@thestinger is it expected that malloc_object_size would return negative values?

SkewedZeppelin avatar Mar 22 '25 01:03 SkewedZeppelin

@SkewedZeppelin It returns size_t so it can't be negative. When it doesn't know the answer, it returns SIZE_MAX. Are you treating it as signed somewhere?

thestinger avatar Mar 22 '25 01:03 thestinger

https://github.com/GrapheneOS/hardened_malloc/blob/4fe9018b6fc7e89b68b6a4b34ea2a853e8778b77/h_malloc.c#L1852 is negative sometimes

SkewedZeppelin avatar Mar 22 '25 02:03 SkewedZeppelin

@SkewedZeppelin It's a size_t, it can't be negative. SIZE_MAX is the maximum size_t value. You must be printing it as a signed integer where it would be -1.

thestinger avatar Mar 22 '25 02:03 thestinger

apologies I'm dumb and was using %ld not %lu

SkewedZeppelin avatar Mar 22 '25 02:03 SkewedZeppelin

@SkewedZeppelin It's not particularly important but you should use %zu for size_t.

thestinger avatar Mar 22 '25 02:03 thestinger

It matters on Windows where long is 32-bit on 64-bit but could at least theoretically be the case elsewhere.

thestinger avatar Mar 22 '25 02:03 thestinger

is it worth it to zero the remainder of dst? my only issue is that sometimes it is close to size_max/unknown

SkewedZeppelin avatar Mar 22 '25 02:03 SkewedZeppelin

@SkewedZeppelin That wouldn't be safe since it's not known what they're doing with it. They could be intentionally only copying to part of it. It can be a copy to the middle of it, etc.

thestinger avatar Mar 22 '25 02:03 thestinger

ok, if I disable optimizations for the block everything works as expected otherwise memcpy and memset recurse into themself and crash I have no idea how to fix that

what I want to happen: memcpy -> memcpy_wrapped -> memcpy_real what the compiler is making happen: memcpy -> memcpy_wrapped -> memcpy - > memcpy_wrapped -> etc reducing optimizations on the _real functions works, but is harming performance

clang has __attribute__((no_builtin("memcpy"))), but gcc doesn't edit: this attribute doesn't work, optnone does for clang gcc has an attribute that works

edit: I fixed gcc

edit: seems like object size almost always returns size_max under clang

slab_region_end is always null under clang?

SkewedZeppelin avatar Mar 23 '25 13:03 SkewedZeppelin

I added assembly functions for the underlying functions from musl, although I think I made a mess of doing so. If there is a better way of handling them, please let me know. Thanks

edit: I think the makefile auto-detection is actually broken. I did verify that it is functional when bypassed.

SkewedZeppelin avatar Mar 23 '25 17:03 SkewedZeppelin

I am currently successfully running the following patch revision:

git diff
From bf8092da25f3f5ea484104aa234793c5c6730db6 Mon Sep 17 00:00:00 2001
From: Tavi <[email protected]>
Date: Sat, 22 Mar 2025 02:54:10 -0400
Subject: [PATCH] perform size checks on memcpy/memmove/memset

Signed-off-by: Tavi <[email protected]>
---
 Android.bp                      |   1 +
 Makefile                        |  24 +++++-
 README.md                       |   3 +
 config/default.mk               |   1 +
 config/light.mk                 |   1 +
 h_malloc.c                      |  94 +++++++++++++++++++++--
 include/h_malloc.h              |  16 ++++
 memcpy.c                        | 132 ++++++++++++++++++++++++++++++++
 memmove.c                       |  50 ++++++++++++
 memset.c                        |  94 +++++++++++++++++++++++
 musl.h                          |  10 +++
 test/.gitignore                 |  11 +++
 test/Makefile                   |  13 +++-
 test/memcpy_buffer_overflow.c   |  15 ++++
 test/memcpy_read_overflow.c     |  15 ++++
 test/memcpy_valid_mismatched.c  |  15 ++++
 test/memcpy_valid_same.c        |  15 ++++
 test/memmove_buffer_overflow.c  |  15 ++++
 test/memmove_read_overflow.c    |  15 ++++
 test/memmove_valid_mismatched.c |  15 ++++
 test/memmove_valid_same.c       |  15 ++++
 test/memset_buffer_overflow.c   |  13 ++++
 test/memset_valid_mismatched.c  |  13 ++++
 test/memset_valid_same.c        |  13 ++++
 test/test_smc.py                |  65 ++++++++++++++++
 wmemcpy.c                       |  12 +++
 wmemmove.c                      |  17 ++++
 wmemset.c                       |  12 +++
 28 files changed, 705 insertions(+), 10 deletions(-)
 create mode 100644 memcpy.c
 create mode 100644 memmove.c
 create mode 100644 memset.c
 create mode 100644 musl.h
 create mode 100644 test/memcpy_buffer_overflow.c
 create mode 100644 test/memcpy_read_overflow.c
 create mode 100644 test/memcpy_valid_mismatched.c
 create mode 100644 test/memcpy_valid_same.c
 create mode 100644 test/memmove_buffer_overflow.c
 create mode 100644 test/memmove_read_overflow.c
 create mode 100644 test/memmove_valid_mismatched.c
 create mode 100644 test/memmove_valid_same.c
 create mode 100644 test/memset_buffer_overflow.c
 create mode 100644 test/memset_valid_mismatched.c
 create mode 100644 test/memset_valid_same.c
 create mode 100644 wmemcpy.c
 create mode 100644 wmemmove.c
 create mode 100644 wmemset.c

diff --git a/Android.bp b/Android.bp
index f6a7a9c..a2bab52 100644
--- a/Android.bp
+++ b/Android.bp
@@ -28,6 +28,7 @@ common_cflags = [
     "-DN_ARENA=1",
     "-DCONFIG_STATS=true",
     "-DCONFIG_SELF_INIT=false",
+    "-DCONFIG_BLOCK_OPS_CHECK_SIZE=false",
 ]
 
 cc_defaults {
diff --git a/Makefile b/Makefile
index f33f88e..554560f 100644
--- a/Makefile
+++ b/Makefile
@@ -39,7 +39,7 @@ CFLAGS := $(CFLAGS) -std=c17 $(SHARED_FLAGS) -Wmissing-prototypes -Wstrict-proto
 CXXFLAGS := $(CXXFLAGS) -std=c++17 -fsized-deallocation $(SHARED_FLAGS)
 LDFLAGS := $(LDFLAGS) -Wl,-O1,--as-needed,-z,defs,-z,relro,-z,now,-z,nodlopen,-z,text
 
-SOURCES := chacha.c h_malloc.c memory.c pages.c random.c util.c
+SOURCES := chacha.c h_malloc.c memory.c pages.c random.c util.c memcpy.c memmove.c memset.c wmemcpy.c wmemmove.c wmemset.c
 OBJECTS := $(SOURCES:.c=.o)
 
 ifeq ($(CONFIG_CXX_ALLOCATOR),true)
@@ -89,6 +89,10 @@ ifeq (,$(filter $(CONFIG_SELF_INIT),true false))
     $(error CONFIG_SELF_INIT must be true or false)
 endif
 
+ifeq (,$(filter $(CONFIG_BLOCK_OPS_CHECK_SIZE),true false))
+    $(error CONFIG_BLOCK_OPS_CHECK_SIZE must be true or false)
+endif
+
 CPPFLAGS += \
     -DCONFIG_SEAL_METADATA=$(CONFIG_SEAL_METADATA) \
     -DZERO_ON_FREE=$(CONFIG_ZERO_ON_FREE) \
@@ -108,7 +112,8 @@ CPPFLAGS += \
     -DCONFIG_CLASS_REGION_SIZE=$(CONFIG_CLASS_REGION_SIZE) \
     -DN_ARENA=$(CONFIG_N_ARENA) \
     -DCONFIG_STATS=$(CONFIG_STATS) \
-    -DCONFIG_SELF_INIT=$(CONFIG_SELF_INIT)
+    -DCONFIG_SELF_INIT=$(CONFIG_SELF_INIT) \
+    -DCONFIG_BLOCK_OPS_CHECK_SIZE=$(CONFIG_BLOCK_OPS_CHECK_SIZE)
 
 $(OUT)/libhardened_malloc$(SUFFIX).so: $(OBJECTS) | $(OUT)
 	$(CC) $(CFLAGS) $(LDFLAGS) -shared $^ $(LDLIBS) -o $@
@@ -118,7 +123,7 @@ $(OUT):
 
 $(OUT)/chacha.o: chacha.c chacha.h util.h $(CONFIG_FILE) | $(OUT)
 	$(COMPILE.c) $(OUTPUT_OPTION) $<
-$(OUT)/h_malloc.o: h_malloc.c include/h_malloc.h mutex.h memory.h pages.h random.h util.h $(CONFIG_FILE) | $(OUT)
+$(OUT)/h_malloc.o: h_malloc.c include/h_malloc.h mutex.h memory.h musl.h pages.h random.h util.h $(CONFIG_FILE) | $(OUT)
 	$(COMPILE.c) $(OUTPUT_OPTION) $<
 $(OUT)/memory.o: memory.c memory.h util.h $(CONFIG_FILE) | $(OUT)
 	$(COMPILE.c) $(OUTPUT_OPTION) $<
@@ -131,6 +136,19 @@ $(OUT)/random.o: random.c random.h chacha.h util.h $(CONFIG_FILE) | $(OUT)
 $(OUT)/util.o: util.c util.h $(CONFIG_FILE) | $(OUT)
 	$(COMPILE.c) $(OUTPUT_OPTION) $<
 
+$(OUT)/memcpy.o: memcpy.c musl.h $(CONFIG_FILE) | $(OUT)
+	$(COMPILE.c) -Wno-cast-align $(OUTPUT_OPTION) $<
+$(OUT)/memmove.o: memmove.c musl.h $(CONFIG_FILE) | $(OUT)
+	$(COMPILE.c) -Wno-cast-align $(OUTPUT_OPTION) $<
+$(OUT)/memset.o: memset.c musl.h $(CONFIG_FILE) | $(OUT)
+	$(COMPILE.c) -Wno-cast-align $(OUTPUT_OPTION) $<
+$(OUT)/wmemcpy.o: wmemcpy.c musl.h $(CONFIG_FILE) | $(OUT)
+	$(COMPILE.c) $(OUTPUT_OPTION) $<
+$(OUT)/wmemmove.o: wmemmove.c musl.h $(CONFIG_FILE) | $(OUT)
+	$(COMPILE.c) $(OUTPUT_OPTION) $<
+$(OUT)/wmemset.o: wmemset.c musl.h $(CONFIG_FILE) | $(OUT)
+	$(COMPILE.c) $(OUTPUT_OPTION) $<
+
 check: tidy
 
 tidy:
diff --git a/README.md b/README.md
index 6a1a91b..972b6dd 100644
--- a/README.md
+++ b/README.md
@@ -276,6 +276,9 @@ The following boolean configuration options are available:
   hardware, which may become drastically lower in the future. Whether or not
   this feature is enabled, the metadata is all contained within an isolated
   memory region with high entropy random guard regions around it.
+* `CONFIG_BLOCK_OPS_CHECK_SIZE`: `true` or `false` (default) to ensure length
+  parameter of the memcpy/memmove/memset block operations and their wide
+  variants are within approximate bounds to minimize buffer overflows.
 
 The following integer configuration options are available:
 
diff --git a/config/default.mk b/config/default.mk
index 71b1cc4..071e4d3 100644
--- a/config/default.mk
+++ b/config/default.mk
@@ -21,3 +21,4 @@ CONFIG_CLASS_REGION_SIZE := 34359738368 # 32GiB
 CONFIG_N_ARENA := 4
 CONFIG_STATS := false
 CONFIG_SELF_INIT := true
+CONFIG_BLOCK_OPS_CHECK_SIZE := true
diff --git a/config/light.mk b/config/light.mk
index 88a0e1f..7edd423 100644
--- a/config/light.mk
+++ b/config/light.mk
@@ -21,3 +21,4 @@ CONFIG_CLASS_REGION_SIZE := 34359738368 # 32GiB
 CONFIG_N_ARENA := 4
 CONFIG_STATS := false
 CONFIG_SELF_INIT := true
+CONFIG_BLOCK_OPS_CHECK_SIZE := false
diff --git a/h_malloc.c b/h_malloc.c
index 6221d0b..7f87da1 100644
--- a/h_malloc.c
+++ b/h_malloc.c
@@ -1,4 +1,5 @@
 #include <assert.h>
+//#include <dlfcn.h>
 #include <errno.h>
 #include <inttypes.h>
 #include <stdatomic.h>
@@ -15,6 +16,7 @@
 #include "h_malloc.h"
 #include "memory.h"
 #include "memtag.h"
+#include "musl.h"
 #include "mutex.h"
 #include "pages.h"
 #include "random.h"
@@ -528,7 +530,7 @@ static void set_canary(UNUSED const struct slab_metadata *metadata, UNUSED void
     }
 #endif
 
-    memcpy((char *)p + size - canary_size, &metadata->canary_value, canary_size);
+    h_memcpy_internal((char *)p + size - canary_size, &metadata->canary_value, canary_size);
 #endif
 }
 
@@ -541,7 +543,7 @@ static void check_canary(UNUSED const struct slab_metadata *metadata, UNUSED con
 #endif
 
     u64 canary_value;
-    memcpy(&canary_value, (const char *)p + size - canary_size, canary_size);
+    h_memcpy_internal(&canary_value, (const char *)p + size - canary_size, canary_size);
 
 #ifdef HAS_ARM_MTE
     if (unlikely(canary_value == 0)) {
@@ -831,7 +833,7 @@ static inline void deallocate_small(void *p, const size_t *expected_size) {
 #endif
 
         if (ZERO_ON_FREE && !skip_zero) {
-            memset(p, 0, size - canary_size);
+            h_memset_internal(p, 0, size - canary_size);
         }
     }
 
@@ -1502,7 +1504,7 @@ EXPORT void *h_calloc(size_t nmemb, size_t size) {
     total_size = adjust_size_for_canary(total_size);
     void *p = alloc(total_size);
     if (!ZERO_ON_FREE && likely(p != NULL) && total_size && total_size <= max_slab_size_class) {
-        memset(p, 0, total_size - canary_size);
+        h_memset_internal(p, 0, total_size - canary_size);
     }
 #ifdef HAS_ARM_MTE
     // use an assert instead of adding a conditional to memset() above (freed memory is always
@@ -1624,7 +1626,7 @@ EXPORT void *h_realloc(void *old, size_t size) {
                 mutex_unlock(&ra->lock);
 
                 if (memory_remap_fixed(old, old_size, new, size)) {
-                    memcpy(new, old, copy_size);
+                    h_memcpy_internal(new, old, copy_size);
                     deallocate_pages(old, old_size, old_guard_size);
                 } else {
                     memory_unmap((char *)old - old_guard_size, old_guard_size);
@@ -1646,7 +1648,7 @@ EXPORT void *h_realloc(void *old, size_t size) {
     if (copy_size > 0 && copy_size <= max_slab_size_class) {
         copy_size -= canary_size;
     }
-    memcpy(new, old_orig, copy_size);
+    h_memcpy_internal(new, old_orig, copy_size);
     if (old_size <= max_slab_size_class) {
         deallocate_small(old, NULL);
     } else {
@@ -1874,6 +1876,86 @@ EXPORT size_t h_malloc_object_size_fast(const void *p) {
     return SIZE_MAX;
 }
 
+#if CONFIG_BLOCK_OPS_CHECK_SIZE && !defined(HAS_ARM_MTE)
+EXPORT void *memcpy(void *restrict dst, const void *restrict src, size_t len) {
+    if(dst == src || len == 0) {
+        return dst;
+    }
+    if (unlikely(dst < src + len && dst + len > src)) {
+        fatal_error("memcpy overlap");
+    }
+    if (unlikely(len > malloc_object_size(src))) {
+        fatal_error("memcpy read overflow");
+    }
+    if (unlikely(len > malloc_object_size(dst))) {
+        fatal_error("memcpy buffer overflow");
+    }
+    return musl_memcpy(dst, src, len);
+}
+
+EXPORT void *memmove(void *dst, const void *src, size_t len) {
+    if(dst == src || len == 0) {
+        return dst;
+    }
+    if (unlikely(len > malloc_object_size(src))) {
+        fatal_error("memmove read overflow");
+    }
+    if (unlikely(len > malloc_object_size(dst))) {
+        fatal_error("memmove buffer overflow");
+    }
+    return musl_memmove(dst, src, len);
+}
+
+EXPORT void *memset(void *dst, int value, size_t len) {
+    if(len == 0) {
+        return dst;
+    }
+    if (unlikely(len > malloc_object_size(dst))) {
+        fatal_error("memset buffer overflow");
+    }
+    return musl_memset(dst, value, len);
+}
+
+EXPORT wchar_t *wmemcpy(wchar_t *restrict dst, const wchar_t *restrict src, size_t len) {
+    if(dst == src || len == 0) {
+        return dst;
+    }
+    if (dst < src + len && dst + len > src) {
+        fatal_error("wmemcpy overlap");
+    }
+    if (len > malloc_object_size(src)) {
+        fatal_error("wmemcpy read overflow");
+    }
+    if (len > malloc_object_size(dst)) {
+        fatal_error("wmemcpy buffer overflow");
+    }
+    return musl_wmemcpy(dst, src, len);
+}
+
+EXPORT wchar_t *wmemmove(wchar_t *dst, const wchar_t *src, size_t len) {
+    if(dst == src || len == 0) {
+        return dst;
+    }
+    if (len > malloc_object_size(src)) {
+        fatal_error("wmemmove read overflow");
+    }
+    if (len > malloc_object_size(dst)) {
+        fatal_error("wmemmove buffer overflow");
+    }
+    return musl_wmemmove(dst, src, len);
+}
+
+EXPORT wchar_t *wmemset(wchar_t *dst, wchar_t value, size_t len) {
+    if(len == 0) {
+        return dst;
+    }
+    if (len > malloc_object_size(dst)) {
+        fatal_error("wmemset buffer overflow");
+    }
+    return musl_wmemset(dst, value, len);
+}
+#endif
+
 EXPORT int h_mallopt(UNUSED int param, UNUSED int value) {
 #ifdef __ANDROID__
     if (param == M_PURGE) {
diff --git a/include/h_malloc.h b/include/h_malloc.h
index 0eee395..ceca456 100644
--- a/include/h_malloc.h
+++ b/include/h_malloc.h
@@ -55,6 +55,22 @@ __attribute__((malloc)) __attribute__((alloc_size(2))) __attribute__((alloc_alig
 void *h_aligned_alloc(size_t alignment, size_t size);
 void h_free(void *ptr);
 
+#if CONFIG_BLOCK_OPS_CHECK_SIZE && !defined(HAS_ARM_MTE)
+void *memcpy(void *dst, const void *src, size_t len);
+void *memmove(void *dst, const void *src, size_t len);
+void *memset(void *dst, int value, size_t len);
+wchar_t *wmemcpy(wchar_t *dst, const wchar_t *src, size_t len);
+wchar_t *wmemmove(wchar_t *dst, const wchar_t *src, size_t len);
+wchar_t *wmemset(wchar_t *dst, wchar_t value, size_t len);
+#define h_memcpy_internal musl_memcpy
+#define h_memmove_internal musl_memmove
+#define h_memset_internal musl_memset
+#else
+#define h_memcpy_internal memcpy
+#define h_memmove_internal memmove
+#define h_memset_internal memset
+#endif
+
 // POSIX
 int h_posix_memalign(void **memptr, size_t alignment, size_t size);
 
diff --git a/memcpy.c b/memcpy.c
new file mode 100644
index 0000000..f9cff3f
--- /dev/null
+++ b/memcpy.c
@@ -0,0 +1,132 @@
+#include "musl.h"
+
+/*
+ * Copied from musl libc version 1.2.5 licensed under the MIT license
+ *
+ * Christian Göttsche: Added const qualifiers to retain const correctness.
+ */
+
+#include <string.h>
+#include <stdint.h>
+#include <endian.h>
+
+void *musl_memcpy(void *restrict dest, const void *restrict src, size_t n)
+{
+	unsigned char *d = dest;
+	const unsigned char *s = src;
+
+#ifdef __GNUC__
+
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+#define LS >>
+#define RS <<
+#else
+#define LS <<
+#define RS >>
+#endif
+
+	typedef uint32_t __attribute__((__may_alias__)) u32;
+	uint32_t w, x;
+
+	for (; (uintptr_t)s % 4 && n; n--) *d++ = *s++;
+
+	if ((uintptr_t)d % 4 == 0) {
+		for (; n>=16; s+=16, d+=16, n-=16) {
+			*(u32 *)(d+0) = *(const u32 *)(s+0);
+			*(u32 *)(d+4) = *(const u32 *)(s+4);
+			*(u32 *)(d+8) = *(const u32 *)(s+8);
+			*(u32 *)(d+12) = *(const u32 *)(s+12);
+		}
+		if (n&8) {
+			*(u32 *)(d+0) = *(const u32 *)(s+0);
+			*(u32 *)(d+4) = *(const u32 *)(s+4);
+			d += 8; s += 8;
+		}
+		if (n&4) {
+			*(u32 *)(d+0) = *(const u32 *)(s+0);
+			d += 4; s += 4;
+		}
+		if (n&2) {
+			*d++ = *s++; *d++ = *s++;
+		}
+		if (n&1) {
+			*d = *s;
+		}
+		return dest;
+	}
+
+	if (n >= 32) switch ((uintptr_t)d % 4) {
+	case 1:
+		w = *(const u32 *)s;
+		*d++ = *s++;
+		*d++ = *s++;
+		*d++ = *s++;
+		n -= 3;
+		for (; n>=17; s+=16, d+=16, n-=16) {
+			x = *(const u32 *)(s+1);
+			*(u32 *)(d+0) = (w LS 24) | (x RS 8);
+			w = *(const u32 *)(s+5);
+			*(u32 *)(d+4) = (x LS 24) | (w RS 8);
+			x = *(const u32 *)(s+9);
+			*(u32 *)(d+8) = (w LS 24) | (x RS 8);
+			w = *(const u32 *)(s+13);
+			*(u32 *)(d+12) = (x LS 24) | (w RS 8);
+		}
+		break;
+	case 2:
+		w = *(const u32 *)s;
+		*d++ = *s++;
+		*d++ = *s++;
+		n -= 2;
+		for (; n>=18; s+=16, d+=16, n-=16) {
+			x = *(const u32 *)(s+2);
+			*(u32 *)(d+0) = (w LS 16) | (x RS 16);
+			w = *(const u32 *)(s+6);
+			*(u32 *)(d+4) = (x LS 16) | (w RS 16);
+			x = *(const u32 *)(s+10);
+			*(u32 *)(d+8) = (w LS 16) | (x RS 16);
+			w = *(const u32 *)(s+14);
+			*(u32 *)(d+12) = (x LS 16) | (w RS 16);
+		}
+		break;
+	case 3:
+		w = *(const u32 *)s;
+		*d++ = *s++;
+		n -= 1;
+		for (; n>=19; s+=16, d+=16, n-=16) {
+			x = *(const u32 *)(s+3);
+			*(u32 *)(d+0) = (w LS 8) | (x RS 24);
+			w = *(const u32 *)(s+7);
+			*(u32 *)(d+4) = (x LS 8) | (w RS 24);
+			x = *(const u32 *)(s+11);
+			*(u32 *)(d+8) = (w LS 8) | (x RS 24);
+			w = *(const u32 *)(s+15);
+			*(u32 *)(d+12) = (x LS 8) | (w RS 24);
+		}
+		break;
+	}
+	if (n&16) {
+		*d++ = *s++; *d++ = *s++; *d++ = *s++; *d++ = *s++;
+		*d++ = *s++; *d++ = *s++; *d++ = *s++; *d++ = *s++;
+		*d++ = *s++; *d++ = *s++; *d++ = *s++; *d++ = *s++;
+		*d++ = *s++; *d++ = *s++; *d++ = *s++; *d++ = *s++;
+	}
+	if (n&8) {
+		*d++ = *s++; *d++ = *s++; *d++ = *s++; *d++ = *s++;
+		*d++ = *s++; *d++ = *s++; *d++ = *s++; *d++ = *s++;
+	}
+	if (n&4) {
+		*d++ = *s++; *d++ = *s++; *d++ = *s++; *d++ = *s++;
+	}
+	if (n&2) {
+		*d++ = *s++; *d++ = *s++;
+	}
+	if (n&1) {
+		*d = *s;
+	}
+	return dest;
+#endif
+
+	for (; n; n--) *d++ = *s++;
+	return dest;
+}
diff --git a/memmove.c b/memmove.c
new file mode 100644
index 0000000..df0e208
--- /dev/null
+++ b/memmove.c
@@ -0,0 +1,50 @@
+#include "musl.h"
+
+/*
+ * Copied from musl libc version 1.2.5 licensed under the MIT license
+ *
+ * Christian Göttsche: Added const qualifiers to retain const correctness.
+ */
+
+#include <string.h>
+#include <stdint.h>
+
+#ifdef __GNUC__
+typedef __attribute__((__may_alias__)) size_t WT;
+#define WS (sizeof(WT))
+#endif
+
+void *musl_memmove(void *dest, const void *src, size_t n)
+{
+	char *d = dest;
+	const char *s = src;
+
+	if (d==s) return d;
+	if ((uintptr_t)s-(uintptr_t)d-n <= -2*n) return memcpy(d, s, n);
+
+	if (d<s) {
+#ifdef __GNUC__
+		if ((uintptr_t)s % WS == (uintptr_t)d % WS) {
+			while ((uintptr_t)d % WS) {
+				if (!n--) return dest;
+				*d++ = *s++;
+			}
+			for (; n>=WS; n-=WS, d+=WS, s+=WS) *(WT *)d = *(const WT *)s;
+		}
+#endif
+		for (; n; n--) *d++ = *s++;
+	} else {
+#ifdef __GNUC__
+		if ((uintptr_t)s % WS == (uintptr_t)d % WS) {
+			while ((uintptr_t)(d+n) % WS) {
+				if (!n--) return dest;
+				d[n] = s[n];
+			}
+			while (n>=WS) n-=WS, *(WT *)(d+n) = *(const WT *)(s+n);
+		}
+#endif
+		while (n) n--, d[n] = s[n];
+	}
+
+	return dest;
+}
diff --git a/memset.c b/memset.c
new file mode 100644
index 0000000..6af5571
--- /dev/null
+++ b/memset.c
@@ -0,0 +1,94 @@
+#include "musl.h"
+
+/* Copied from musl libc version 1.2.5 licensed under the MIT license */
+
+#include <string.h>
+#include <stdint.h>
+
+void *musl_memset(void *dest, int c, size_t n)
+{
+	unsigned char *s = dest;
+	size_t k;
+
+	/* Fill head and tail with minimal branching. Each
+	 * conditional ensures that all the subsequently used
+	 * offsets are well-defined and in the dest region. */
+
+	if (!n) return dest;
+	s[0] = c;
+	s[n-1] = c;
+	if (n <= 2) return dest;
+	s[1] = c;
+	s[2] = c;
+	s[n-2] = c;
+	s[n-3] = c;
+	if (n <= 6) return dest;
+	s[3] = c;
+	s[n-4] = c;
+	if (n <= 8) return dest;
+
+	/* Advance pointer to align it at a 4-byte boundary,
+	 * and truncate n to a multiple of 4. The previous code
+	 * already took care of any head/tail that get cut off
+	 * by the alignment. */
+
+	k = -(uintptr_t)s & 3;
+	s += k;
+	n -= k;
+	n &= -4;
+
+#ifdef __GNUC__
+	typedef uint32_t __attribute__((__may_alias__)) u32;
+	typedef uint64_t __attribute__((__may_alias__)) u64;
+
+	u32 c32 = ((u32)-1)/255 * (unsigned char)c;
+
+	/* In preparation to copy 32 bytes at a time, aligned on
+	 * an 8-byte bounary, fill head/tail up to 28 bytes each.
+	 * As in the initial byte-based head/tail fill, each
+	 * conditional below ensures that the subsequent offsets
+	 * are valid (e.g. !(n<=24) implies n>=28). */
+
+	*(u32 *)(s+0) = c32;
+	*(u32 *)(s+n-4) = c32;
+	if (n <= 8) return dest;
+	*(u32 *)(s+4) = c32;
+	*(u32 *)(s+8) = c32;
+	*(u32 *)(s+n-12) = c32;
+	*(u32 *)(s+n-8) = c32;
+	if (n <= 24) return dest;
+	*(u32 *)(s+12) = c32;
+	*(u32 *)(s+16) = c32;
+	*(u32 *)(s+20) = c32;
+	*(u32 *)(s+24) = c32;
+	*(u32 *)(s+n-28) = c32;
+	*(u32 *)(s+n-24) = c32;
+	*(u32 *)(s+n-20) = c32;
+	*(u32 *)(s+n-16) = c32;
+
+	/* Align to a multiple of 8 so we can fill 64 bits at a time,
+	 * and avoid writing the same bytes twice as much as is
+	 * practical without introducing additional branching. */
+
+	k = 24 + ((uintptr_t)s & 4);
+	s += k;
+	n -= k;
+
+	/* If this loop is reached, 28 tail bytes have already been
+	 * filled, so any remainder when n drops below 32 can be
+	 * safely ignored. */
+
+	u64 c64 = c32 | ((u64)c32 << 32);
+	for (; n >= 32; n-=32, s+=32) {
+		*(u64 *)(s+0) = c64;
+		*(u64 *)(s+8) = c64;
+		*(u64 *)(s+16) = c64;
+		*(u64 *)(s+24) = c64;
+	}
+#else
+	/* Pure C fallback with no aliasing violations. */
+	for (; n; n--, s++) *s = c;
+#endif
+
+	return dest;
+}
diff --git a/musl.h b/musl.h
new file mode 100644
index 0000000..f58169d
--- /dev/null
+++ b/musl.h
@@ -0,0 +1,10 @@
+#pragma once
+
+#include <stddef.h>
+
+void *musl_memcpy(void *dst, const void *src, size_t len);
+void *musl_memmove(void *dst, const void *src, size_t len);
+void *musl_memset(void *dst, int value, size_t len);
+wchar_t *musl_wmemcpy(wchar_t *dst, const wchar_t *src, size_t len);
+wchar_t *musl_wmemmove(wchar_t *dst, const wchar_t *src, size_t len);
+wchar_t *musl_wmemset(wchar_t *dst, wchar_t value, size_t len);
diff --git a/test/.gitignore b/test/.gitignore
index d37a6a7..fa4fa1e 100644
--- a/test/.gitignore
+++ b/test/.gitignore
@@ -41,4 +41,15 @@ overflow_small_8_byte
 uninitialized_read_large
 uninitialized_read_small
 realloc_init
+memcpy_buffer_overflow
+memcpy_read_overflow
+memcpy_valid_same
+memcpy_valid_mismatched
+memmove_buffer_overflow
+memmove_read_overflow
+memmove_valid_same
+memmove_valid_mismatched
+memset_buffer_overflow
+memset_valid_same
+memset_valid_mismatched
 __pycache__/
diff --git a/test/Makefile b/test/Makefile
index 0eb3921..cd9e664 100644
--- a/test/Makefile
+++ b/test/Makefile
@@ -67,7 +67,18 @@ EXECUTABLES := \
     invalid_malloc_object_size_small \
     invalid_malloc_object_size_small_quarantine \
     impossibly_large_malloc \
-    realloc_init
+    realloc_init \
+    memcpy_buffer_overflow \
+    memcpy_read_overflow \
+    memcpy_valid_same \
+    memcpy_valid_mismatched \
+    memmove_buffer_overflow \
+    memmove_read_overflow \
+    memmove_valid_same \
+    memmove_valid_mismatched \
+    memset_buffer_overflow \
+    memset_valid_same \
+    memset_valid_mismatched
 
 all: $(EXECUTABLES)
 
diff --git a/test/memcpy_buffer_overflow.c b/test/memcpy_buffer_overflow.c
new file mode 100644
index 0000000..16cab77
--- /dev/null
+++ b/test/memcpy_buffer_overflow.c
@@ -0,0 +1,15 @@
+#include <stdlib.h>
+#include <string.h>
+
+#include "test_util.h"
+
+OPTNONE int main(void) {
+    char *firstbuffer = malloc(16);
+    char *secondbuffer = malloc(32);
+    if (!firstbuffer && !secondbuffer) {
+        return 1;
+    }
+    memset(secondbuffer, 'a', 32);
+    memcpy(firstbuffer, secondbuffer, 32);
+    return 1;
+}
diff --git a/test/memcpy_read_overflow.c b/test/memcpy_read_overflow.c
new file mode 100644
index 0000000..cf51498
--- /dev/null
+++ b/test/memcpy_read_overflow.c
@@ -0,0 +1,15 @@
+#include <stdlib.h>
+#include <string.h>
+
+#include "test_util.h"
+
+OPTNONE int main(void) {
+    char *firstbuffer = malloc(32);
+    char *secondbuffer = malloc(16);
+    if (!firstbuffer && !secondbuffer) {
+        return 1;
+    }
+    memset(secondbuffer, 'a', 16);
+    memcpy(firstbuffer, secondbuffer, 32);
+    return 1;
+}
diff --git a/test/memcpy_valid_mismatched.c b/test/memcpy_valid_mismatched.c
new file mode 100644
index 0000000..81d718e
--- /dev/null
+++ b/test/memcpy_valid_mismatched.c
@@ -0,0 +1,15 @@
+#include <stdlib.h>
+#include <string.h>
+
+#include "test_util.h"
+
+OPTNONE int main(void) {
+    char *firstbuffer = malloc(32);
+    char *secondbuffer = malloc(16);
+    if (!firstbuffer && !secondbuffer) {
+        return 1;
+    }
+    memset(secondbuffer, 'a', 16);
+    memcpy(firstbuffer, secondbuffer, 16);
+    return 0;
+}
diff --git a/test/memcpy_valid_same.c b/test/memcpy_valid_same.c
new file mode 100644
index 0000000..1b408f0
--- /dev/null
+++ b/test/memcpy_valid_same.c
@@ -0,0 +1,15 @@
+#include <stdlib.h>
+#include <string.h>
+
+#include "test_util.h"
+
+OPTNONE int main(void) {
+    char *firstbuffer = malloc(16);
+    char *secondbuffer = malloc(16);
+    if (!firstbuffer && !secondbuffer) {
+        return 1;
+    }
+    memset(secondbuffer, 'a', 16);
+    memcpy(firstbuffer, secondbuffer, 16);
+    return 0;
+}
diff --git a/test/memmove_buffer_overflow.c b/test/memmove_buffer_overflow.c
new file mode 100644
index 0000000..c83bf97
--- /dev/null
+++ b/test/memmove_buffer_overflow.c
@@ -0,0 +1,15 @@
+#include <stdlib.h>
+#include <string.h>
+
+#include "test_util.h"
+
+OPTNONE int main(void) {
+    char *firstbuffer = malloc(16);
+    char *secondbuffer = malloc(32);
+    if (!firstbuffer && !secondbuffer) {
+        return 1;
+    }
+    memset(secondbuffer, 'a', 32);
+    memmove(firstbuffer, secondbuffer, 32);
+    return 1;
+}
diff --git a/test/memmove_read_overflow.c b/test/memmove_read_overflow.c
new file mode 100644
index 0000000..73e4509
--- /dev/null
+++ b/test/memmove_read_overflow.c
@@ -0,0 +1,15 @@
+#include <stdlib.h>
+#include <string.h>
+
+#include "test_util.h"
+
+OPTNONE int main(void) {
+    char *firstbuffer = malloc(32);
+    char *secondbuffer = malloc(16);
+    if (!firstbuffer && !secondbuffer) {
+        return 1;
+    }
+    memset(secondbuffer, 'a', 16);
+    memmove(firstbuffer, secondbuffer, 32);
+    return 1;
+}
diff --git a/test/memmove_valid_mismatched.c b/test/memmove_valid_mismatched.c
new file mode 100644
index 0000000..5dd1bde
--- /dev/null
+++ b/test/memmove_valid_mismatched.c
@@ -0,0 +1,15 @@
+#include <stdlib.h>
+#include <string.h>
+
+#include "test_util.h"
+
+OPTNONE int main(void) {
+    char *firstbuffer = malloc(32);
+    char *secondbuffer = malloc(16);
+    if (!firstbuffer && !secondbuffer) {
+        return 1;
+    }
+    memset(secondbuffer, 'a', 16);
+    memmove(firstbuffer, secondbuffer, 16);
+    return 0;
+}
diff --git a/test/memmove_valid_same.c b/test/memmove_valid_same.c
new file mode 100644
index 0000000..2593abc
--- /dev/null
+++ b/test/memmove_valid_same.c
@@ -0,0 +1,15 @@
+#include <stdlib.h>
+#include <string.h>
+
+#include "test_util.h"
+
+OPTNONE int main(void) {
+    char *firstbuffer = malloc(16);
+    char *secondbuffer = malloc(16);
+    if (!firstbuffer && !secondbuffer) {
+        return 1;
+    }
+    memset(secondbuffer, 'a', 16);
+    memmove(firstbuffer, secondbuffer, 16);
+    return 0;
+}
diff --git a/test/memset_buffer_overflow.c b/test/memset_buffer_overflow.c
new file mode 100644
index 0000000..8f9e989
--- /dev/null
+++ b/test/memset_buffer_overflow.c
@@ -0,0 +1,13 @@
+#include <stdlib.h>
+#include <string.h>
+
+#include "test_util.h"
+
+OPTNONE int main(void) {
+    char *buffer = malloc(16);
+    if (!buffer) {
+        return 1;
+    }
+    memset(buffer, 'a', 32);
+    return 1;
+}
diff --git a/test/memset_valid_mismatched.c b/test/memset_valid_mismatched.c
new file mode 100644
index 0000000..f57fef6
--- /dev/null
+++ b/test/memset_valid_mismatched.c
@@ -0,0 +1,13 @@
+#include <stdlib.h>
+#include <string.h>
+
+#include "test_util.h"
+
+OPTNONE int main(void) {
+    char *buffer = malloc(32);
+    if (!buffer) {
+        return 1;
+    }
+    memset(buffer, 'a', 16);
+    return 0;
+}
diff --git a/test/memset_valid_same.c b/test/memset_valid_same.c
new file mode 100644
index 0000000..824c18f
--- /dev/null
+++ b/test/memset_valid_same.c
@@ -0,0 +1,13 @@
+#include <stdlib.h>
+#include <string.h>
+
+#include "test_util.h"
+
+OPTNONE int main(void) {
+    char *buffer = malloc(16);
+    if (!buffer) {
+        return 1;
+    }
+    memset(buffer, 'a', 16);
+    return 0;
+}
diff --git a/test/test_smc.py b/test/test_smc.py
index 170278e..daa2934 100644
--- a/test/test_smc.py
+++ b/test/test_smc.py
@@ -238,5 +238,70 @@ class TestSimpleMemoryCorruption(unittest.TestCase):
             "realloc_init")
         self.assertEqual(returncode, 0)
 
+    def test_memcpy_buffer_overflow(self):
+        _stdout, stderr, returncode = self.run_test(
+            "memcpy_buffer_overflow")
+        self.assertEqual(returncode, -6)
+        self.assertEqual(stderr.decode(
+            "utf-8"), "fatal allocator error: memcpy buffer overflow\n")
+
+    def test_memcpy_read_overflow(self):
+        _stdout, stderr, returncode = self.run_test(
+            "memcpy_read_overflow")
+        self.assertEqual(returncode, -6)
+        self.assertEqual(stderr.decode(
+            "utf-8"), "fatal allocator error: memcpy read overflow\n")
+
+    def test_memcpy_valid_same(self):
+        _stdout, _stderr, returncode = self.run_test(
+            "memcpy_valid_same")
+        self.assertEqual(returncode, 0)
+
+    def test_memcpy_valid_mismatched(self):
+        _stdout, _stderr, returncode = self.run_test(
+            "memcpy_valid_mismatched")
+        self.assertEqual(returncode, 0)
+
+    def test_memmove_buffer_overflow(self):
+        _stdout, stderr, returncode = self.run_test(
+            "memmove_buffer_overflow")
+        self.assertEqual(returncode, -6)
+        self.assertEqual(stderr.decode(
+            "utf-8"), "fatal allocator error: memmove buffer overflow\n")
+
+    def test_memmove_read_overflow(self):
+        _stdout, stderr, returncode = self.run_test(
+            "memmove_read_overflow")
+        self.assertEqual(returncode, -6)
+        self.assertEqual(stderr.decode(
+            "utf-8"), "fatal allocator error: memmove read overflow\n")
+
+    def test_memmove_valid_same(self):
+        _stdout, _stderr, returncode = self.run_test(
+            "memmove_valid_same")
+        self.assertEqual(returncode, 0)
+
+    def test_memmove_valid_mismatched(self):
+        _stdout, _stderr, returncode = self.run_test(
+            "memmove_valid_mismatched")
+        self.assertEqual(returncode, 0)
+
+    def test_memset_buffer_overflow(self):
+        _stdout, stderr, returncode = self.run_test(
+            "memset_buffer_overflow")
+        self.assertEqual(returncode, -6)
+        self.assertEqual(stderr.decode(
+            "utf-8"), "fatal allocator error: memset buffer overflow\n")
+
+    def test_memset_valid_same(self):
+        _stdout, _stderr, returncode = self.run_test(
+            "memset_valid_same")
+        self.assertEqual(returncode, 0)
+
+    def test_memset_valid_mismatched(self):
+        _stdout, _stderr, returncode = self.run_test(
+            "memset_valid_mismatched")
+        self.assertEqual(returncode, 0)
+
 if __name__ == '__main__':
     unittest.main()
diff --git a/wmemcpy.c b/wmemcpy.c
new file mode 100644
index 0000000..cc694cd
--- /dev/null
+++ b/wmemcpy.c
@@ -0,0 +1,12 @@
+#include "musl.h"
+
+/* Copied from musl libc version 1.2.5 licensed under the MIT license */
+
+#include <wchar.h>
+
+wchar_t *musl_wmemcpy(wchar_t *restrict d, const wchar_t *restrict s, size_t n)
+{
+	wchar_t *a = d;
+	while (n--) *d++ = *s++;
+	return a;
+}
diff --git a/wmemmove.c b/wmemmove.c
new file mode 100644
index 0000000..10d6bf0
--- /dev/null
+++ b/wmemmove.c
@@ -0,0 +1,17 @@
+#include "musl.h"
+
+/* Copied from musl libc version 1.2.5 licensed under the MIT license */
+
+#include <wchar.h>
+#include <stdint.h>
+
+wchar_t *musl_wmemmove(wchar_t *d, const wchar_t *s, size_t n)
+{
+	wchar_t *d0 = d;
+	if (d == s) return d;
+	if ((uintptr_t)d-(uintptr_t)s < n * sizeof *d)
+		while (n--) d[n] = s[n];
+	else
+		while (n--) *d++ = *s++;
+	return d0;
+}
diff --git a/wmemset.c b/wmemset.c
new file mode 100644
index 0000000..d8bcbb5
--- /dev/null
+++ b/wmemset.c
@@ -0,0 +1,12 @@
+#include "musl.h"
+
+/* Copied from musl libc version 1.2.5 licensed under the MIT license */
+
+#include <wchar.h>
+
+wchar_t *musl_wmemset(wchar_t *d, wchar_t c, size_t n)
+{
+	wchar_t *ret = d;
+	while (n--) *d++ = c;
+	return ret;
+}
-- 
2.49.0

cgzones avatar Mar 26 '25 09:03 cgzones

@cgzones thanks!

why go back to using self functions instead of host?

also I did have a revision somewhere above that had musl's assembly functions for speed

and how did you prevent the compiler from optimizing out the functions?

SkewedZeppelin avatar Mar 26 '25 09:03 SkewedZeppelin

why go back to using self functions instead of host?

i can't follow, can you explain?

also I did have a revision somewhere above that had musl's assembly functions for speed

i wanted to remain architecture agnostic

and how did you prevent the compiler from optimizing out the functions?

simple programs like this one will get optimized and succeed:

#include <string.h>
#include <stdio.h>
#include <stdlib.h>

int main() {
        char *src = malloc(1024);
        char *dst = malloc(256);
        memset(src, 's', 1024);
        memset(dst, 'd', 256);

        memcpy(dst, src, 1024);

        printf("%#x %#x %#x  ['s'=%#x, 'd'=%#x]\n", dst[24], dst[255], dst[511], 's', 'd');

        printf("success?\n");

        return EXIT_SUCCESS;
}

but when using non compile-time constants it seems to work:

#include <string.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
        if (argc != 2) {
                fprintf(stderr, "usage: %s <size>\n", argv[0]);
                return EXIT_FAILURE;
        }

        int size = atoi(argv[1]);

        printf("using size %d...\n", size);

        char *src = malloc(1024);
        char *dst = malloc(256);
        memset(src, 's', 1024);
        memset(dst, 'd', 256);

        memcpy(dst, src, size);

        printf("%#x %#x %#x  ['s'=%#x, 'd'=%#x]\n", dst[24], dst[255], dst[511], 's', 'd');

        printf("success?\n");

        return EXIT_SUCCESS;
}

cgzones avatar Mar 26 '25 10:03 cgzones

i can't follow, can you explain?

as in not using dlsym to get the real functions

agnostic

should I add them back?

simple programs like this

I meant I don't understand why the compiler isn't optimizing out memcpy like I mentioned here: https://github.com/GrapheneOS/hardened_malloc/pull/252#issuecomment-2746213524

SkewedZeppelin avatar Mar 26 '25 10:03 SkewedZeppelin

i can't follow, can you explain?

as in not using dlsym to get the real functions

i didn't seem to work with rustc and vscode, probably due some bundled libc?

cgzones avatar Mar 26 '25 10:03 cgzones

@thestinger any insight as to why malloc_object_size() is almost always returning SIZE_MAX when compiled with clang?

SkewedZeppelin avatar Mar 27 '25 13:03 SkewedZeppelin

@SkewedZeppelin It shouldn't matter since it's done dynamically. Can you give an example?

thestinger avatar Mar 27 '25 16:03 thestinger

@cgzones done, thanks! verified with the newly added test cases for them too.

edit: testing programs again:

org.openrgb.OpenRGB.desktop[4758]: fatal allocator error: wmemcpy overlap
audacity[4549]: fatal allocator error: wmemcpy overlap

broken check? or something doing a silly? edit 2: valgrind has no support for detecting wmemcpy overlap, only memcpy overlap. neither of them make direct calls to wmemcpy, no idea what library is doing so.

SkewedZeppelin avatar Mar 27 '25 19:03 SkewedZeppelin

org.openrgb.OpenRGB.desktop[4758]: fatal allocator error: wmemcpy overlap
audacity[4549]: fatal allocator error: wmemcpy overlap

broken check? or something doing a silly?

https://sources.debian.org/src/wxwidgets3.2/3.2.6%2Bdfsg-2/include/wx/string.h/#L1180

// copy ctor
wxString(const wxString& stringSrc) : m_impl(stringSrc.m_impl) { }

Maybe this copy constructor should have a self-copy guard?

backtrace
#4  0x00007ffff7fb0109 in fatal_error (s=s@entry=0x7ffff7fb71e1 "wmemcpy overlap") at /var/home/christian/Coding/workspaces/hardened_malloc/util.c:42
        prefix = 0x7ffff7fb7000 "fatal allocator error: "
#5  0x00007ffff7fb0b17 in wmemcpy(wchar_t * __restrict__, const wchar_t * __restrict__, size_t) (dst=0x73d76f3061c0 L"", src=src@entry=0x73d76f306240 L"wxConfigBase", len=len@entry=12)
    at /var/home/christian/Coding/workspaces/hardened_malloc/h_malloc.c:1928
        lenAdj = 48
#6  0x00007ffff67bcc9b in wmemcpy (__s1=<optimized out>, __s2=0x73d76f306240 L"wxConfigBase", __n=12) at /usr/include/x86_64-linux-gnu/bits/wchar2.h:30
#7  std::char_traits<wchar_t>::copy (__s1=<optimized out>, __s2=0x73d76f306240 L"wxConfigBase", __n=12) at /usr/include/c++/14/bits/char_traits.h:558
#8  std::__cxx11::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >::_S_copy (__d=<optimized out>, __s=0x73d76f306240 L"wxConfigBase", __n=12)
    at /usr/include/c++/14/bits/basic_string.h:435
#9  std::__cxx11::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >::_S_copy_chars (__p=<optimized out>, __k1=0x73d76f306240 L"wxConfigBase", __k2=<optimized out>)
    at /usr/include/c++/14/bits/basic_string.h:483
#10 std::__cxx11::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >::_M_construct<wchar_t*> (this=0x73d76f3060c0, __beg=0x73d76f306240 L"wxConfigBase", __end=<optimized out>)
    at /usr/include/c++/14/bits/basic_string.tcc:247
        __dnew = 12
        __guard = {_M_guarded = <optimized out>}
        __dnew = <optimized out>
        __guard = {_M_guarded = <optimized out>}
#11 std::__cxx11::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >::basic_string (this=0x73d76f3060c0, __str=<optimized out>) at /usr/include/c++/14/bits/basic_string.h:556
#12 wxString::wxString (this=0x73d76f3060c0, stringSrc=<optimized out>) at ./include/wx/string.h:1180
#13 wxHashTableBase_Node::wxHashTableBase_Node (this=0x73c607192f00, key=<optimized out>, value=<optimized out>, table=<optimized out>) at ./src/common/hash.cpp:39
#14 0x00007ffff67bcf7a in wxHashTableBase::DoPut (this=this@entry=0x73e7450d7730, key=..., hash=<optimized out>, data=data@entry=0x7ffff6963420 <wxConfigBase::ms_classInfo>) at ./src/common/hash.cpp:172
        bucket = 216
        node = <optimized out>
#15 0x00007ffff67d793c in wxHashTable::Put (this=0x73e7450d7730, value=..., object=0x7ffff6963420 <wxConfigBase::ms_classInfo>) at ./include/wx/hash.h:206
#16 wxClassInfo::Register (this=this@entry=0x7ffff6963420 <wxConfigBase::ms_classInfo>) at ./src/common/object.cpp:241
        entry = 1
        classTable = 0x73e7450d7730
#17 0x00007ffff672dd71 in wxClassInfo::wxClassInfo
    (this=0x7ffff6963420 <wxConfigBase::ms_classInfo>, className=0x7ffff68dbd00 L"wxConfigBase", baseInfo1=<optimized out>, baseInfo2=0x0, size=128, ctor=0x0) at ./include/wx/rtti.h:59
#18 __static_initialization_and_destruction_0 () at ./src/common/config.cpp:71
#19 _GLOBAL__sub_I_config.cpp(void) () at ./src/common/config.cpp:600
--Type <RET> for more, q to quit, c to continue without paging--c
#20 0x00007ffff7fcbf4e in call_init (l=<optimized out>, argc=1, argv=0x7fffffffd9c8, env=0x7fffffffd9d8) at ./elf/dl-init.c:74
        j = 0
        jm = <optimized out>
        addrs = <optimized out>
        init_array = <optimized out>
        init_array = <optimized out>
        j = <optimized out>
        jm = <optimized out>
        addrs = <optimized out>
#21 call_init (l=<optimized out>, argc=1, argv=0x7fffffffd9c8, env=0x7fffffffd9d8) at ./elf/dl-init.c:26
        init_array = <optimized out>
        j = <optimized out>
        jm = <optimized out>
        addrs = <optimized out>
#22 0x00007ffff7fcc01c in _dl_init (main_map=0x7ffff7ffe310, argc=1, argv=0x7fffffffd9c8, env=0x7fffffffd9d8) at ./elf/dl-init.c:121
        preinit_array = <optimized out>
        preinit_array_size = <optimized out>
        i = <optimized out>
#23 0x00007ffff7fe35f0 in _dl_start_user () at /lib64/ld-linux-x86-64.so.2

edit: hmm, the check should not trigger for these addresses dst=0x73d76f3061c0 L"", src=src@entry=0x73d76f306240 and a length of 48 (4 * 12)

cgzones avatar Mar 27 '25 20:03 cgzones

diff --git a/h_malloc.c b/h_malloc.c
index 9db6dcc..1ea6224 100644
--- a/h_malloc.c
+++ b/h_malloc.c
@@ -1884,7 +1884,7 @@ EXPORT void *memcpy(void *restrict dst, const void *restrict src, size_t len) {
     if (unlikely(dst == src || len == 0)) {
         return dst;
     }
-    if (unlikely(dst < src + len && dst + len > src)) {
+    if (unlikely(dst < (src + len) && (dst + len) > src)) {
         fatal_error("memcpy overlap");
     }
     if (unlikely(len > malloc_object_size(src))) {
@@ -1924,8 +1924,8 @@ EXPORT wchar_t *wmemcpy(wchar_t *restrict dst, const wchar_t *restrict src, size
         return dst;
     }
     size_t lenAdj = len * sizeof(wchar_t);
-    if (unlikely(dst < src + lenAdj && dst + lenAdj > src)) {
+    if (unlikely(dst < (src + len) && (dst + len) > src)) {
         fatal_error("wmemcpy overlap");
     }
     if (unlikely(lenAdj > malloc_object_size(src))) {
         fatal_error("wmemcpy read overflow");

Those are not void pointers but wchar_t ones so one must not add bytes but number-of-items

cgzones avatar Mar 27 '25 20:03 cgzones

Another function to cover could be memccpy(3):

patch
---
 Makefile                        |  4 +++-
 h_malloc.c                      | 16 ++++++++++++++
 include/h_malloc.h              |  1 +
 memccpy.c                       | 38 +++++++++++++++++++++++++++++++++
 musl.h                          |  1 +
 test/.gitignore                 |  4 ++++
 test/Makefile                   |  4 ++++
 test/memccpy_buffer_overflow.c  | 15 +++++++++++++
 test/memccpy_read_overflow.c    | 15 +++++++++++++
 test/memccpy_valid_mismatched.c | 15 +++++++++++++
 test/memccpy_valid_same.c       | 15 +++++++++++++
 test/test_smc.py                | 24 +++++++++++++++++++++
 12 files changed, 151 insertions(+), 1 deletion(-)
 create mode 100644 memccpy.c
 create mode 100644 test/memccpy_buffer_overflow.c
 create mode 100644 test/memccpy_read_overflow.c
 create mode 100644 test/memccpy_valid_mismatched.c
 create mode 100644 test/memccpy_valid_same.c

diff --git a/Makefile b/Makefile
index e3eee31..d18d0b3 100644
--- a/Makefile
+++ b/Makefile
@@ -41,7 +41,7 @@ LDFLAGS := $(LDFLAGS) -Wl,-O1,--as-needed,-z,defs,-z,relro,-z,now,-z,nodlopen,-z
 
 SOURCES := chacha.c h_malloc.c memory.c pages.c random.c util.c
 ifeq ($(CONFIG_BLOCK_OPS_CHECK_SIZE),true)
-    SOURCES += memcpy.c memmove.c memset.c wmemset.c
+    SOURCES += memcpy.c memccpy.c memmove.c memset.c wmemset.c
     BOSC_EXTRAS := musl.h
 endif
 OBJECTS := $(SOURCES:.c=.o)
@@ -142,6 +142,8 @@ $(OUT)/util.o: util.c util.h $(CONFIG_FILE) | $(OUT)
 
 $(OUT)/memcpy.o: memcpy.c musl.h $(CONFIG_FILE) | $(OUT)
 	$(COMPILE.c) -Wno-cast-align $(OUTPUT_OPTION) $<
+$(OUT)/memccpy.o: memccpy.c musl.h $(CONFIG_FILE) | $(OUT)
+	$(COMPILE.c) -Wno-cast-align $(OUTPUT_OPTION) $<
 $(OUT)/memmove.o: memmove.c musl.h $(CONFIG_FILE) | $(OUT)
 	$(COMPILE.c) -Wno-cast-align $(OUTPUT_OPTION) $<
 $(OUT)/memset.o: memset.c musl.h $(CONFIG_FILE) | $(OUT)
diff --git a/h_malloc.c b/h_malloc.c
index 5b7acb2..c99c0ca 100644
--- a/h_malloc.c
+++ b/h_malloc.c
@@ -1901,6 +1901,22 @@ EXPORT void *memcpy(void *restrict dst, const void *restrict src, size_t len) {
     return musl_memcpy(dst, src, len);
 }
 
+EXPORT void *memccpy(void *restrict dst, const void *restrict src, int value, size_t len) {
+    if (unlikely(dst == src || len == 0)) {
+        return dst;
+    }
+    if (unlikely(dst < (src + len) && (dst + len) > src)) {
+        fatal_error("memccpy overlap");
+    }
+    if (unlikely(len > malloc_object_size(src))) {
+        fatal_error("memccpy read overflow");
+    }
+    if (unlikely(len > malloc_object_size(dst))) {
+        fatal_error("memccpy buffer overflow");
+    }
+    return musl_memccpy(dst, src, value, len);
+}
+
 EXPORT void *memmove(void *dst, const void *src, size_t len) {
     if (unlikely(dst == src || len == 0)) {
         return dst;
diff --git a/include/h_malloc.h b/include/h_malloc.h
index ceca456..7974ff5 100644
--- a/include/h_malloc.h
+++ b/include/h_malloc.h
@@ -57,6 +57,7 @@ void h_free(void *ptr);
 
 #if CONFIG_BLOCK_OPS_CHECK_SIZE && !defined(HAS_ARM_MTE)
 void *memcpy(void *dst, const void *src, size_t len);
+void *memccpy(void *dst, const void *src, int value, size_t len);
 void *memmove(void *dst, const void *src, size_t len);
 void *memset(void *dst, int value, size_t len);
 wchar_t *wmemcpy(wchar_t *dst, const wchar_t *src, size_t len);
diff --git a/memccpy.c b/memccpy.c
new file mode 100644
index 0000000..e01a9eb
--- /dev/null
+++ b/memccpy.c
@@ -0,0 +1,38 @@
+#include "musl.h"
+
+/* Copied from musl libc version 1.2.5 licensed under the MIT license */
+
+#include <string.h>
+#include <stdint.h>
+#include <limits.h>
+
+#define ALIGN (sizeof(size_t)-1)
+#define ONES ((size_t)-1/UCHAR_MAX)
+#define HIGHS (ONES * (UCHAR_MAX/2+1))
+#define HASZERO(x) (((x)-ONES) & ~(x) & HIGHS)
+
+void *musl_memccpy(void *restrict dest, const void *restrict src, int c, size_t n)
+{
+	unsigned char *d = dest;
+	const unsigned char *s = src;
+
+	c = (unsigned char)c;
+#ifdef __GNUC__
+	typedef size_t __attribute__((__may_alias__)) word;
+	word *wd;
+	const word *ws;
+	if (((uintptr_t)s & ALIGN) == ((uintptr_t)d & ALIGN)) {
+		for (; ((uintptr_t)s & ALIGN) && n && (*d=*s)!=c; n--, s++, d++);
+		if ((uintptr_t)s & ALIGN) goto tail;
+		size_t k = ONES * c;
+		wd=(void *)d; ws=(const void *)s;
+		for (; n>=sizeof(size_t) && !HASZERO(*ws^k);
+		       n-=sizeof(size_t), ws++, wd++) *wd = *ws;
+		d=(void *)wd; s=(const void *)ws;
+	}
+#endif
+	for (; n && (*d=*s)!=c; n--, s++, d++);
+tail:
+	if (n) return d+1;
+	return 0;
+}
diff --git a/musl.h b/musl.h
index ff07821..4349622 100644
--- a/musl.h
+++ b/musl.h
@@ -3,6 +3,7 @@
 #include <stddef.h>
 
 void *musl_memcpy(void *dst, const void *src, size_t len);
+void *musl_memccpy(void *restrict dest, const void *restrict src, int c, size_t n);
 void *musl_memmove(void *dst, const void *src, size_t len);
 void *musl_memset(void *dst, int value, size_t len);
 wchar_t *musl_wmemset(wchar_t *dst, wchar_t value, size_t len);
diff --git a/test/.gitignore b/test/.gitignore
index cf4e256..45cabd2 100644
--- a/test/.gitignore
+++ b/test/.gitignore
@@ -45,6 +45,10 @@ memcpy_buffer_overflow
 memcpy_read_overflow
 memcpy_valid_same
 memcpy_valid_mismatched
+memccpy_buffer_overflow
+memccpy_read_overflow
+memccpy_valid_same
+memccpy_valid_mismatched
 memmove_buffer_overflow
 memmove_read_overflow
 memmove_valid_same
diff --git a/test/Makefile b/test/Makefile
index 8d29a9c..76f86f0 100644
--- a/test/Makefile
+++ b/test/Makefile
@@ -72,6 +72,10 @@ EXECUTABLES := \
     memcpy_read_overflow \
     memcpy_valid_same \
     memcpy_valid_mismatched \
+    memccpy_buffer_overflow \
+    memccpy_read_overflow \
+    memccpy_valid_same \
+    memccpy_valid_mismatched \
     memmove_buffer_overflow \
     memmove_read_overflow \
     memmove_valid_same \
diff --git a/test/memccpy_buffer_overflow.c b/test/memccpy_buffer_overflow.c
new file mode 100644
index 0000000..ca0c5d1
--- /dev/null
+++ b/test/memccpy_buffer_overflow.c
@@ -0,0 +1,15 @@
+#include <stdlib.h>
+#include <string.h>
+
+#include "test_util.h"
+
+OPTNONE int main(void) {
+    char *firstbuffer = malloc(16);
+    char *secondbuffer = malloc(32);
+    if (!firstbuffer && !secondbuffer) {
+        return 1;
+    }
+    memset(secondbuffer, 'a', 32);
+    memccpy(firstbuffer, secondbuffer, 'b', 32);
+    return 1;
+}
diff --git a/test/memccpy_read_overflow.c b/test/memccpy_read_overflow.c
new file mode 100644
index 0000000..3b15f53
--- /dev/null
+++ b/test/memccpy_read_overflow.c
@@ -0,0 +1,15 @@
+#include <stdlib.h>
+#include <string.h>
+
+#include "test_util.h"
+
+OPTNONE int main(void) {
+    char *firstbuffer = malloc(32);
+    char *secondbuffer = malloc(16);
+    if (!firstbuffer && !secondbuffer) {
+        return 1;
+    }
+    memset(secondbuffer, 'a', 16);
+    memccpy(firstbuffer, secondbuffer, 'b', 32);
+    return 1;
+}
diff --git a/test/memccpy_valid_mismatched.c b/test/memccpy_valid_mismatched.c
new file mode 100644
index 0000000..b5434f7
--- /dev/null
+++ b/test/memccpy_valid_mismatched.c
@@ -0,0 +1,15 @@
+#include <stdlib.h>
+#include <string.h>
+
+#include "test_util.h"
+
+OPTNONE int main(void) {
+    char *firstbuffer = malloc(32);
+    char *secondbuffer = malloc(16);
+    if (!firstbuffer && !secondbuffer) {
+        return 1;
+    }
+    memset(secondbuffer, 'a', 16);
+    memccpy(firstbuffer, secondbuffer, 'b', 16);
+    return 0;
+}
diff --git a/test/memccpy_valid_same.c b/test/memccpy_valid_same.c
new file mode 100644
index 0000000..a9ba59b
--- /dev/null
+++ b/test/memccpy_valid_same.c
@@ -0,0 +1,15 @@
+#include <stdlib.h>
+#include <string.h>
+
+#include "test_util.h"
+
+OPTNONE int main(void) {
+    char *firstbuffer = malloc(16);
+    char *secondbuffer = malloc(16);
+    if (!firstbuffer && !secondbuffer) {
+        return 1;
+    }
+    memset(secondbuffer, 'a', 16);
+    memccpy(firstbuffer, secondbuffer, 'b', 16);
+    return 0;
+}
diff --git a/test/test_smc.py b/test/test_smc.py
index 3a4c6f8..13a4d8d 100644
--- a/test/test_smc.py
+++ b/test/test_smc.py
@@ -262,6 +262,30 @@ class TestSimpleMemoryCorruption(unittest.TestCase):
             "memcpy_valid_mismatched")
         self.assertEqual(returncode, 0)
 
+    def test_memccpy_buffer_overflow(self):
+        _stdout, stderr, returncode = self.run_test(
+            "memccpy_buffer_overflow")
+        self.assertEqual(returncode, -6)
+        self.assertEqual(stderr.decode(
+            "utf-8"), "fatal allocator error: memccpy buffer overflow\n")
+
+    def test_memccpy_read_overflow(self):
+        _stdout, stderr, returncode = self.run_test(
+            "memccpy_read_overflow")
+        self.assertEqual(returncode, -6)
+        self.assertEqual(stderr.decode(
+            "utf-8"), "fatal allocator error: memccpy read overflow\n")
+
+    def test_memccpy_valid_same(self):
+        _stdout, _stderr, returncode = self.run_test(
+            "memccpy_valid_same")
+        self.assertEqual(returncode, 0)
+
+    def test_memccpy_valid_mismatched(self):
+        _stdout, _stderr, returncode = self.run_test(
+            "memccpy_valid_mismatched")
+        self.assertEqual(returncode, 0)
+
     def test_memmove_buffer_overflow(self):
         _stdout, stderr, returncode = self.run_test(
             "memmove_buffer_overflow")
-- 
2.49.0

Also there are three calls to memcpy() in ./random.c which might need to be converted to h_memcpy_internal().

cgzones avatar Mar 30 '25 12:03 cgzones

I'm seeing a memccpy read overflow on both lvm immediately on start and gdm when trying to login

here is lvm, it looks correct

src size: 250, length: 256
fatal allocator error: memccpy read overflow

#4  0x00007ffff7f97113 in fatal_error (s=s@entry=0x7ffff7f9e629 "memccpy read overflow")
    at /home/tad/Development/CPP/rpm-hardened_malloc/hardened_malloc-2025012700-build/hardened_malloc/util.c:42
        prefix = 0x7ffff7f9e420 "fatal allocator error: "
#5  0x00007ffff7f9c729 in memccpy(void * __restrict__, const void * __restrict__, int, size_t) (
    dst=dst@entry=0x7fffffffc200, src=src@entry=0x7ea3bc2c23fe, value=value@entry=0, len=len@entry=256)
    at /home/tad/Development/CPP/rpm-hardened_malloc/hardened_malloc-2025012700-build/hardened_malloc/h_malloc.c:1912
No locals.
#6  0x0000555555557a83 in dm_strncpy (dest=0x7fffffffc200 "String", src=<optimized out>, n=256)
    at ../device_mapper/libdm-string.c:438
No locals.
#7  _dm_strncpy (dest=0x7fffffffc200 "String", src=<optimized out>, n=256) at ../lib/misc/util.h:42
No locals.
#8  _val_str_to_num (str=str@entry=0x7ea3bc2c23fe "String")
    at /usr/src/debug/lvm2-2.03.30-3.fc42.x86_64/tools/command.c:210
        name = "String\000de", '\000' <repeats 240 times>...
        new = <optimized out>
        i = <optimized out>

edit: it seems they're wholly reliant on c being found before n is reached

we could augment musl_memccpy to take the src size and abort there if reached

SkewedZeppelin avatar Mar 30 '25 14:03 SkewedZeppelin

I'm seeing a memccpy read overflow on both lvm immediately on start and gdm when trying to login

lvm2 uses it as strncpy implementation, https://sources.debian.org/src/lvm2/2.03.31-1/libdm/libdm-string.c/?hl=438#L438:

int dm_strncpy(char *dest, const char *src, size_t n)
{
	if (memccpy(dest, src, 0, n))
		return 1;

	if (n > 0)
		dest[n - 1] = '\0';

	return 0;
}

A targeted fix could be to not check the source length if the stop character is 0. This will still avoid out-of-buffer-writes (just no out-of-buffer-reads).

The function bcopy(3) and swab(3) could also be valid targets:

patch
diff --git a/Makefile b/Makefile
index d18d0b3..0863bbc 100644
--- a/Makefile
+++ b/Makefile
@@ -41,7 +41,7 @@ LDFLAGS := $(LDFLAGS) -Wl,-O1,--as-needed,-z,defs,-z,relro,-z,now,-z,nodlopen,-z
 
 SOURCES := chacha.c h_malloc.c memory.c pages.c random.c util.c
 ifeq ($(CONFIG_BLOCK_OPS_CHECK_SIZE),true)
-    SOURCES += memcpy.c memccpy.c memmove.c memset.c wmemset.c
+    SOURCES += memcpy.c memccpy.c memmove.c memset.c swab.c wmemset.c
     BOSC_EXTRAS := musl.h
 endif
 OBJECTS := $(SOURCES:.c=.o)
@@ -148,6 +148,8 @@ $(OUT)/memmove.o: memmove.c musl.h $(CONFIG_FILE) | $(OUT)
        $(COMPILE.c) -Wno-cast-align $(OUTPUT_OPTION) $<
 $(OUT)/memset.o: memset.c musl.h $(CONFIG_FILE) | $(OUT)
        $(COMPILE.c) -Wno-cast-align $(OUTPUT_OPTION) $<
+$(OUT)/swab.o: swab.c musl.h $(CONFIG_FILE) | $(OUT)
+       $(COMPILE.c) -Wno-cast-align $(OUTPUT_OPTION) $<
 $(OUT)/wmemset.o: wmemset.c musl.h $(CONFIG_FILE) | $(OUT)
        $(COMPILE.c) $(OUTPUT_OPTION) $<
 
diff --git a/h_malloc.c b/h_malloc.c
index c99c0ca..fbc2c13 100644
--- a/h_malloc.c
+++ b/h_malloc.c
@@ -1940,6 +1940,27 @@ EXPORT void *memset(void *dst, int value, size_t len) {
     return musl_memset(dst, value, len);
 }
 
+EXPORT void bcopy(const void *src, void *dst, size_t len) {
+    memmove(dst, src, len);
+}
+
+EXPORT void swab(const void *restrict src, void *restrict dst, ssize_t len) {
+    if (unlikely(len <= 0)) {
+        return;
+    }
+    size_t length = len;
+    if (unlikely(dst < (src + length) && (dst + length) > src)) {
+        fatal_error("swab overlap");
+    }
+    if (unlikely(length > malloc_object_size(src))) {
+        fatal_error("swab read overflow");
+    }
+    if (unlikely(length > malloc_object_size(dst))) {
+        fatal_error("swab buffer overflow");
+    }
+    return musl_swab(src, dst, len);
+}
+
 EXPORT wchar_t *wmemcpy(wchar_t *restrict dst, const wchar_t *restrict src, size_t len) {
     if (unlikely(dst == src || len == 0)) {
         return dst;
diff --git a/include/h_malloc.h b/include/h_malloc.h
index 7974ff5..49ab36b 100644
--- a/include/h_malloc.h
+++ b/include/h_malloc.h
@@ -60,6 +60,8 @@ void *memcpy(void *dst, const void *src, size_t len);
 void *memccpy(void *dst, const void *src, int value, size_t len);
 void *memmove(void *dst, const void *src, size_t len);
 void *memset(void *dst, int value, size_t len);
+void bcopy(const void *src, void *dst, size_t len);
+void swab(const void *src, void *dst, ssize_t len);
 wchar_t *wmemcpy(wchar_t *dst, const wchar_t *src, size_t len);
 wchar_t *wmemmove(wchar_t *dst, const wchar_t *src, size_t len);
 wchar_t *wmemset(wchar_t *dst, wchar_t value, size_t len);
diff --git a/musl.h b/musl.h
index 4349622..de75eb7 100644
--- a/musl.h
+++ b/musl.h
@@ -1,9 +1,11 @@
 #pragma once
 
 #include <stddef.h>
+#include <sys/types.h>
 
 void *musl_memcpy(void *dst, const void *src, size_t len);
 void *musl_memccpy(void *restrict dest, const void *restrict src, int c, size_t n);
 void *musl_memmove(void *dst, const void *src, size_t len);
 void *musl_memset(void *dst, int value, size_t len);
+void musl_swab(const void *_src, void *_dest, ssize_t n);
 wchar_t *musl_wmemset(wchar_t *dst, wchar_t value, size_t len);
diff --git a/swab.c b/swab.c
new file mode 100644
index 0000000..29cdb2f
--- /dev/null
+++ b/swab.c
@@ -0,0 +1,17 @@
+#include "musl.h"
+
+/* Copied from musl libc version 1.2.5 licensed under the MIT license */
+
+#include <unistd.h>
+
+void musl_swab(const void *restrict _src, void *restrict _dest, ssize_t n)
+{
+       const char *src = _src;
+       char *dest = _dest;
+       for (; n>1; n-=2) {
+               dest[0] = src[1];
+               dest[1] = src[0];
+               dest += 2;
+               src += 2;
+       }
+}

Also mempcpy(3) and wmempcpy(3) are trivial to cover:

patch
diff --git a/h_malloc.c b/h_malloc.c
index dbdf01f..ff174a0 100644
--- a/h_malloc.c
+++ b/h_malloc.c
@@ -1946,6 +1946,10 @@ EXPORT void *memmove(void *dst, const void *src, size_t len) {
     return musl_memmove(dst, src, len);
 }
 
+EXPORT void *mempcpy(void *restrict dst, const void *restrict src, size_t len) {
+    return memcpy(dst, src, len) + len;
+}
+
 EXPORT void *memset(void *dst, int value, size_t len) {
     if (unlikely(len == 0)) {
         return dst;
@@ -2008,6 +2012,10 @@ EXPORT wchar_t *wmemmove(wchar_t *dst, const wchar_t *src, size_t len) {
     return (wchar_t *)musl_memmove((char *)dst, (const char *)src, lenAdj);
 }
 
+EXPORT wchar_t *wmempcpy(wchar_t *restrict dst, const wchar_t *restrict src, size_t len) {
+    return wmemcpy(dst, src, len) + len;
+}
+
 EXPORT wchar_t *wmemset(wchar_t *dst, wchar_t value, size_t len) {
     if (unlikely(len == 0)) {
         return dst;
diff --git a/include/h_malloc.h b/include/h_malloc.h
index b31f62e..798ebd2 100644
--- a/include/h_malloc.h
+++ b/include/h_malloc.h
@@ -61,11 +61,13 @@ void h_free(void *ptr);
 void *memcpy(void *dst, const void *src, size_t len);
 void *memccpy(void *dst, const void *src, int value, size_t len);
 void *memmove(void *dst, const void *src, size_t len);
+void *mempcpy(void *dst, const void *src, size_t len);
 void *memset(void *dst, int value, size_t len);
 void bcopy(const void *src, void *dst, size_t len);
 void swab(const void *src, void *dst, ssize_t len);
 wchar_t *wmemcpy(wchar_t *dst, const wchar_t *src, size_t len);
 wchar_t *wmemmove(wchar_t *dst, const wchar_t *src, size_t len);
+wchar_t *wmempcpy(wchar_t *dst, const wchar_t *src, size_t len);
 wchar_t *wmemset(wchar_t *dst, wchar_t value, size_t len);
 #define h_memcpy_internal musl_memcpy
 #define h_memmove_internal musl_memmove

cgzones avatar Apr 05 '25 14:04 cgzones

I've been using this for a while without issue (although I do suspect it causes gedit to break when parsing markdown sometimes), but I've been meaning to do some sort of benchmarks for it, so see below.

I don't understand why the default variant is regardless a 4x slowdown compared to light or glibc?

1000x mempcy loop benchmark: edit 2025/08/25: I removed the results. I just re-ran them and it seems this patchset itself is the cause of the 2-4x slowdown.

nohm
 4 MB = 0.069237 ms
 8 MB = 0.141719 ms
16 MB = 0.285702 ms
32 MB = 0.536913 ms
64 MB = 1.089654 ms
128 MB = 2.218141 ms
256 MB = 4.393660 ms
512 MB = 12.621499 ms
1024 MB = 25.016972 ms

2024101200 default
 4 MB = 0.069388 ms
 8 MB = 0.142448 ms
16 MB = 0.287897 ms
32 MB = 0.539503 ms
64 MB = 1.076942 ms
128 MB = 2.168209 ms
256 MB = 4.267054 ms
512 MB = 10.751309 ms
1024 MB = 21.433207 ms

2025040400 default+this patchset
 4 MB = 0.070453 ms
 8 MB = 0.141374 ms
16 MB = 0.294853 ms
32 MB = 0.917909 ms
64 MB = 2.636670 ms
128 MB = 6.125302 ms
256 MB = 12.844230 ms
512 MB = 26.764763 ms
1024 MB = 50.618942 ms

SkewedZeppelin avatar Aug 10 '25 05:08 SkewedZeppelin

Non-light mainly means using write-after-free checks and the slab quarantine, both of which are expensive. Slab quarantine uses a fair bit of memory too. We'd like to make the slab quarantine cheaper but it's not really clear how. The main way to make it cheaper would just be lowering how much it quarantines by default and having opt-in to doing more.

thestinger avatar Aug 10 '25 14:08 thestinger