borg icon indicating copy to clipboard operation
borg copied to clipboard

"attrs: [Errno 12] Cannot allocate memory" when restoring ACLs without associated user id

Open jernejs opened this issue 8 months ago • 19 comments

I had to do a bare metal restore of my server today, and during restore I noticed a lot of "attrs: [Errno 12] Cannot allocate memory" errors. It appears that Borg (v1.4.0) stores ACLs as username only (no numeric id), and then fails to restore them with that error message when the user is not present on the system.

Here's a minimal test case — restore archive test3 to reproduce (the test was done by setting ACLs to user belonging to an Active Directory domain; after stopping Samba, restoring the backup produces the above error; if I make a backup while Samba is stopped, Borg stores numeric id in ACL, and restore then works regardless of whether Samba is running or not).

jernejs avatar Apr 13 '25 19:04 jernejs

OS is Linux I suspect?

errno.ENOMEM == 12

Please add the borg command you used for restoring.

You say you noticed a lot of these errors. I am wondering about the "attrs: " part - is that a file name? Is that always the same in all error msgs you see?

ThomasWaldmann avatar Apr 14 '25 12:04 ThomasWaldmann

I could reproduce using the repo linked above:

(borg-env) vagrant@vagrant:~$ tar xJf borgacl.tar.xz 
(borg-env) vagrant@vagrant:~$ borg list borgacl
test                                 Sun, 2025-04-13 17:39:13 [1a1a68e55718949e2ea0d52aee140e5935202c6fa709e4658fe2d8b17debc63a]
test2                                Sun, 2025-04-13 17:40:32 [18ad623c8d22144a23cb17b6e8a2071f065b3dcbaf7239416d477b7eb432c121]
test3                                Sun, 2025-04-13 17:44:25 [28ae33a43c08ffe6c4d453962310985747249c4b22f633d894cf1f838be0e665]
(borg-env) vagrant@vagrant:~$ borg extract borgacl::test3
acltest/bla.txt: attrs: [Errno 12] Cannot allocate memory: '/home/vagrant/acltest/bla.txt'
acltest: attrs: [Errno 12] Cannot allocate memory: '/home/vagrant/acltest'

ThomasWaldmann avatar Apr 14 '25 14:04 ThomasWaldmann

That's the archived access ACL that blows things up:

b'''
user::rw-
user:YGGDRASIL\\\\ender:r--:YGGDRASIL\\\\ender
user:1234567:r--:1234567
user:123456789:r--:123456789
group::r--
mask::r--
other::r--
'''

This gets converted via acl_use_local_uid_gid to:

b'''
user::rw-
user:YGGDRASIL\\\\ender:r--
user:1234567:r--
user:123456789:r--
group::r--
mask::r--
other::r--
'''

And if that is fed into acl_from_text(), it blows up with:

acltest/bla.txt: attrs: [Errno 12] Cannot allocate memory: '/home/vagrant/acltest/bla.txt'

ThomasWaldmann avatar Apr 14 '25 15:04 ThomasWaldmann

Right, OS is Linux, and borg extract borgacl::test3 is enough to reproduce (and using --numeric-ids doesn't help).

jernejs avatar Apr 14 '25 16:04 jernejs

There are multiple issues here with acl_use_local_uid_gid() trying to convert user:YGGDRASIL\\\\ender:r--:YGGDRASIL\\\\ender:

As field0 is "user", it tries to convert the name from field1 to a local uid.

But, if samba is not running, that fails (the mapping is unknown then) and it defaults to the contents of field3 (which usually should contain the archived uid, but is a name here!?).

Using --numeric-ids at extract time does not help, because it tries to use the archived uid from field3, but that is not a uid here, but same name as in field1.

So guess the root cause of this issue is field3 not containing an archived uid.

Additionally, there is also non-borg issue with acl_from_text wanting to allocate lots of memory for inputs it can't process.

ThomasWaldmann avatar Apr 14 '25 16:04 ThomasWaldmann

borg create --numeric-ids archives ACLs using the acl_numeric_ids converter, which puts the uid into field1 and field3 IF it could map the username from field1 to a uid. Otherwise it will put the username into field1 and field3.

borg create archives ACLs using the acl_append_numeric_ids converter, which puts the uid into field3 IF it could map the username from field1 to a uid. Otherwise it will put the username into field3.

So, it looks like maybe samba was not running at borg create time, thus the uid lookups failed and the user name (not id) ended up in field3?

ThomasWaldmann avatar Apr 14 '25 16:04 ThomasWaldmann

Reproduce the non-borg bug:

#include <stdio.h>
#include <stdlib.h>
#include <sys/acl.h>

int main() {
    const char *acl_text = "u:HOST\\\\user:rwx,g::r--,o::---";

    acl_t acl = acl_from_text(acl_text);
    if (acl == NULL) {
        perror("acl_from_text");
        return EXIT_FAILURE;
    }

    printf("ACL successfully parsed from text: %s\n", acl_text);

    // Always free the allocated ACL structure
    acl_free(acl);

    return EXIT_SUCCESS;
}

Result:

$ ./acl_test 
acl_from_text: Cannot allocate memory

Any non-existent username leads to this error, even simple ones like "foo".

Update: seems fixed since a while:

https://git.savannah.nongnu.org/cgit/acl.git/commit/?id=56abe432b65801f31277fb9a3bca0f9e31502315

Not released yet, bug is still present in libacl 2.3.2.

ThomasWaldmann avatar Apr 14 '25 18:04 ThomasWaldmann

About field3: I think there is only one way how a name can end up being there (and getting into the archived ACL):

If the lookup username -> userid (or groupname -> groupid) fails at borg create time.

The user2uid function of borg uses a caching wrapper, so if the first lookup fails, any further lookup will return the same result.

@jernejs please check. ^^^

ThomasWaldmann avatar Apr 14 '25 21:04 ThomasWaldmann

How would I check for this?

jernejs avatar Apr 14 '25 22:04 jernejs

@jernejs Maybe check if all services required for the lookup run all the time while the backup is running, check cron jobs and other timers, logs.

ThomasWaldmann avatar Apr 15 '25 11:04 ThomasWaldmann

I've been running this since yesterday evening, and it hasn't exited yet:

import pwd
import time
import sys

def user2uid(user, default=None):
    try:
        return user and pwd.getpwnam(user).pw_uid
    except KeyError:
        return default

while str(user2uid('YGGDRASIL\\ender','YGGDRASIL\\ender')) == "1101104":
        print(".", end="", flush=True)
        time.sleep(1)

(I did test it first by stopping Samba, which caused it to exit immediately).

jernejs avatar Apr 15 '25 13:04 jernejs

Maybe make the sleep much shorter (like 1ms), so you could also notice "outages" shorter than a second.

ThomasWaldmann avatar Apr 15 '25 13:04 ThomasWaldmann

OK, commented out the print line and changed interval to .001, created new borg archive, stopped Samba (which stopped the script), and tried extracting the archive, which still gave Cannot allocate memory.

jernejs avatar Apr 15 '25 13:04 jernejs

Did you use borg extract --numeric-ids? If samba is stopped, it can only work if it does not try looking up name->id, but just uses the id it has archived.

Then 2 outcomes are possible:

  • it has an archived uid in field3 (as it should be) and just works, no lookup
  • it has an archived username in field3 (lookup at borg create time didn't give a uid), would need to do a name->id lookup now and fails (with that cannot allocate error due to the bug in libacl)

ThomasWaldmann avatar Apr 15 '25 21:04 ThomasWaldmann

When I was doing real archive extract, I did use --numeric-ids; while doing tests with the above archive, I tested both with and without (and it didn't make any difference). If Samba isn't running while creating the archive, numeric id gets stored (presumably in both fields, since there's no way to look up username without winbind); such archive can be extracted whether Samba is running or not.

jernejs avatar Apr 16 '25 07:04 jernejs

Hmm, maybe we can take another approach (as I miss the necessary samba/winbind environment here to reproduce):

Can you reproduce the bug without involving samba/winbind?

The bug to reproduce is borg create (without --numeric-ids) creating an archive that has archived ACLs with the username in field3 (and not the uid, as it should be).

If it's a borg bug, that should be possible and if it is not, guess the samba devs need to analyse/debug this.

Maybe you can also try if this script is enough to reproduce in your samba environment:

import os
from borg.platform import *

path = "foo"

# use a OS-level fd as borg usually does
item = {}
with open(path) as f:
    fd = f.fileno()
    acl_get(path, item, os.stat(path), fd=fd)

print(item)

# use a path (older borg worked path-based)
item = {}
acl_get(path, item, os.stat(path), fd=None)

print(item)

# example output (valid):
# {'acl_access': b'user::rw-\nuser:root:r--:0\ngroup::r--\nmask::r--\nother::r--'}
#                       field 0    1    2   3
#                                           ^ has a uid as it should be.
#                                  ^          has a username as it should be.

If that makes the bug visible in the samba environment, try to produce the same bug without samba.

ThomasWaldmann avatar Apr 17 '25 16:04 ThomasWaldmann

Cannot allocate memory demonstrates unforgivably poor UX. It's a useless error message that gives no indication on how to fix the issue.

Have run a battery of tests to figure out exactly how to trigger the exception:

  1. forget the mask
  2. use user name or group name instead of respective id
  3. u:r-x <-- forget a colon
  4. g:r-x <-- forget a colon
  5. Missing critical functionality u::rWX

o:- and o::- both work.

msftcangoblowm avatar Sep 17 '25 04:09 msftcangoblowm

@msftcangoblowm read the posts above, especially https://github.com/borgbackup/borg/issues/8750#issuecomment-2802637889 .

TL;DR: that error msg comes from libacl and is due to a (meanwhile fixed) bug in libacl.

I don't know how good the error msg gets without that bug though, but it is not an error msg coming from borg, but from libacl.

ThomasWaldmann avatar Sep 17 '25 07:09 ThomasWaldmann

>>> import posix1e
>>>
>>> text = "u::rwx,g::rwx,o::-,m::rwx,u:dfasfasdfsadfasdfasdfasdfasdfasdfasdfsadf:rwx"
>>> acl = posix1e.ACL(text=text)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 12] Cannot allocate memory

ACL_USER Entry has qualifier name "dfasfasdfsadfasdfasdfasdfasdfasdfasdfsadf". Which is obviously a nonexistent user (on purpose).

This underlying issue is, libacl or pylibacl error message(s) are useless. Not giving enough information on what went wrong and more importantly how to fix it.

Gonna let you in on a little secret, wrote yet unpublished package, capitalx which attempts to make the UX pleasant. Lets see how capitalx deals with this.

>>> from capitalx.acl_posix1e import Posix1e
>>>
>>> text = (
...     "u::rwx,g::rwx,o::-,m::rwx,"
...     "u:dfasfasdfsadfasdfasdfasdfasdfasdfasdfsadf:rwx"
... )
>>> acl = Posix1e(
...     text=text
...     is_access_acl = True,
... )  # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE, +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
  File ...
KeyError: "getpwnam(): name not found: 'dfasfasdfsadfasdfasdfasdfasdfasdfasdfsadf'"
<BLANKLINE>
The above exception was the direct cause of the following exception:
<BLANKLINE>
Traceback (most recent call last):
  File ...
capitalx.errors.ACLUSERQualifierError: [Errno 43] qualifier does not correspond to an actual user. Fix the ACL_USER qualifier with a known user id|name. These commands are helpful: cat `/etc/passwd` and `id` or `id --user`. Respectively: lists all known users, current session user, current session user id: ''

doctest and pytest are choking on the long Traceback. That's an upstream issue. Failing due to pytest 12417. So ignore that.

Notice the error message is very explicit about exactly what caused the Exception. As well as gives advice on how to fix it. The custom Exception is a subclass of OSError. So can easily catch it.

capitalx.acl_posix1e.Posix1e.set_extended_acl adapts and fixes common mistakes, unlike posix1e.ACL.applyto. But a bad qualifier name cannot be recovered from.

msftcangoblowm avatar Oct 14 '25 08:10 msftcangoblowm