btrfs-progs icon indicating copy to clipboard operation
btrfs-progs copied to clipboard

send/receive fails with "invalid tlv in cmd tlv type = 816"

Open dehnhard opened this issue 1 year ago • 12 comments

Hi,

I setup a btrbk remote backup that transfers btrfs subvolumes. Source is a Debian 12 amd64 machine targeting a Debian 12 armel device (An older Zyxel NSA325v2 on which the proprietary OS is replaced with a Debian installation). btrfs-progs version is 6.2-1 on both sides (current in the stable bookworm repo).

The btrbk log shows the called send and receive commands and the resulting error:

ERROR: Failed to send/receive subvolume: /home/daniel/Experimente/backup/btrbk_snapshots/testbackup.20240330T1102  -> dbackup.lan:/home/dnotebook/testbackup/testbackup.20240330T1102
ERROR: ... Command execution failed (exitcode=1)
ERROR: ... sh: btrfs send '/home/daniel/Experimente/backup/btrbk_snapshots/testbackup.20240330T1102' | ssh [email protected] 'sudo -n btrfs receive '\''/home/dnotebook/testbackup/'\'''
ERROR: ... invalid tlv in cmd tlv_type = 816

Any ideas?

dehnhard avatar Mar 31 '24 09:03 dehnhard

Just of curiosity, did you recreate the filesystem after installing Debian?

Forza-tng avatar Apr 02 '24 19:04 Forza-tng

Just of curiosity, did you recreate the filesystem after installing Debian?

No, I created the filesystem (a btrfs RAID1 with two harddisks) in the current Debian installation. Nothing further.

Another notice: The armel device also has a newer kernel with possible different config, but the btrfs module is loaded and everything else works so far.

dehnhard avatar Apr 03 '24 07:04 dehnhard

@dehnhard I believe the Marwell SoC is an ARM64, and Arm in general isn't as widely tested as AMD64.

It might be good to test newer btrfs-progs and kernel if possible. Latest btrfs-progs is v6.8. Since Debian doesn't have it, you can use the static builds as they do not need installation and can be used on any Linux system.

amd64: https://github.com/kdave/btrfs-progs/releases arm64: https://mirrors.tnonline.net/btrfs/btrfs-progs/arm64/

The arm64 build is provided by me as Github unfortunately doesn't have them yet. Build instructions if you want to compile yourself https://wiki.tnonline.net/w/Btrfs/Statically_built_btrfs-progs

But first, let's check if the error is on the sending or the receiving side.

Is there an error on the sending side? You can easily check if sending works properly using

btrfs send '/home/daniel/Experimente/backup/btrbk_snapshots/testbackup.20240330T1102' > /dev/null

If all is ok it should look like

# btrfs -vv send home.20240406T1301/ > /dev/null
Protocol version requested: 1 (supported 3)
At subvol home.20240406T1301/
BTRFS_IOC_SEND returned 0

Now check if there is an error on the receiving side. Can you check its dmesg? If dmesg has no errors you can test receiving from a local file.

  1. btrfs send '/home/daniel/Experimente/backup/btrbk_snapshots/testbackup.20240330T1102' > /path/to/file
  2. copy 'file' to the remote
  3. on the remote btrfs receive /home/dnotebook/testbackup/ < 'file'

Forza-tng avatar Apr 06 '24 14:04 Forza-tng

@Forza-tng thanks for making it easy with those good instructions!

I believe the Marwell SoC is an ARM64, and Arm in general isn't as widely tested as AMD64.

No, it's called Kirkwood (see Info from OpenWRT here), I believe the official Debian platform is called armel. The Kernel is coming from here, a great guy keeps providing new kernels for these old machines, which are capable enough for tasks like NAS for backups. uname -a says Linux dbackup 6.7.5-kirkwood-tld-1 #1 PREEMPT Sat Feb 17 16:27:56 PST 2024 armv5tel GNU/Linux

It might be good to test newer btrfs-progs and kernel if possible.

I can certainly test new builds and would appreciate static built binaries for that. I'd prefer to avoid compiling and can't do a reinstall on that machine.

But first, let's check if the error is on the sending or the receiving side.

  1. Sending looks good.
# sudo btrfs -vv send '/home/daniel/Experimente/backup/btrbk_snapshots/testbackup.20240330T1102' > /dev/null
Protocol version requested: 1 (supported 2)
At subvol /home/daniel/Experimente/backup/btrbk_snapshots/testbackup.20240330T1102
BTRFS_IOC_SEND returned 0
  1. dmesg shows nothing
  2. Retrieve fails with that error
# btrfs -vv receive /home/dnotebook/testbackup/ < /home/dnotebook/SENT
At subvol testbackup.20240330T1102
receiving subvol testbackup.20240330T1102 uuid=3ea2d27c-bacb-2141-9a98-6d3620a2642e, stransid=165679
chown  - uid=1000, gid=1000
chmod  - mode=0755
utimes 
ERROR: invalid tlv in cmd tlv_type = 816

The receive command creates an empty subvolume with a wrong user, but correct uid (different on both ids). I guess thats expected? It's only relevant to the test setup. The real backup is uid 0 on both machines.

dehnhard avatar Apr 07 '24 11:04 dehnhard

@Forza-tng thanks for making it easy with those good instructions!

I believe the Marwell SoC is an ARM64, and Arm in general isn't as widely tested as AMD64.

No, it's called Kirkwood (see Info from OpenWRT here), I believe the official Debian platform is called armel. The Kernel is coming from here, a great guy keeps providing new kernels for these old machines, which are capable enough for tasks like NAS for backups. uname -a says Linux dbackup 6.7.5-kirkwood-tld-1 #1 PREEMPT Sat Feb 17 16:27:56 PST 2024 armv5tel GNU/Linux

Thanks for explaining. According to https://www.debian.org/ports/arm/index.en.html armel is an 32bit arm architecture.

It might be good to test newer btrfs-progs and kernel if possible.

I can certainly test new builds and would appreciate static built binaries for that. I'd prefer to avoid compiling and can't do a reinstall on that machine.

I don't have a 32bit buildhost for arm, so at the moment i cant make a static build.

But first, let's check if the error is on the sending or the receiving side.

  1. Sending looks good.
# sudo btrfs -vv send '/home/daniel/Experimente/backup/btrbk_snapshots/testbackup.20240330T1102' > /dev/null
Protocol version requested: 1 (supported 2)
At subvol /home/daniel/Experimente/backup/btrbk_snapshots/testbackup.20240330T1102
BTRFS_IOC_SEND returned 0

This is good.

  1. dmesg shows nothing
  2. Retrieve fails with that error
# btrfs -vv receive /home/dnotebook/testbackup/ < /home/dnotebook/SENT
At subvol testbackup.20240330T1102
receiving subvol testbackup.20240330T1102 uuid=3ea2d27c-bacb-2141-9a98-6d3620a2642e, stransid=165679
chown  - uid=1000, gid=1000
chmod  - mode=0755
utimes 
ERROR: invalid tlv in cmd tlv_type = 816

The receive command creates an empty subvolume with a wrong user, but correct uid (different on both ids). I guess thats expected? It's only relevant to the test setup. The real backup is uid 0 on both machines.

Did you try the receive as root user?

You could try using send protocol 2:

btrfs send --proto 2 > file
btrfs receive --proto 2 < file

If this is not working perhaps you could ask in #btrfs irc channel or take it to the maling list to get some input from the developers.

Forza-tng avatar Apr 07 '24 11:04 Forza-tng

Did you try the receive as root user?

Yes, it fails with the same message. Also it works with local source and target as root and with an without --proto 2. So it looks like something in the data from the source is unknown to the receiving side.

What about looking at the semantics? As far as I have been able to look, there is a variable called tlv_type in the btrfs-progs code. What does tlv_type 816 mean and why might it be unknown for the receiver?

For compiling a static binary I'll have to see when I can put some time aside to try. I hope the instructions in the README/INSTALL docs are somewhat straightforward.

dehnhard avatar Apr 07 '24 14:04 dehnhard

ERROR: ... invalid tlv in cmd tlv_type = 816

The command types and attributes are defined at https://github.com/kdave/btrfs-progs/blob/master/kernel-shared/send.h#L128, the maximum for attributes is value 35. The reported error is way off, so it's either a corruption in the stream, or the Zyxel NAS is using a custom stream extension.

It's known that Synology has such extension and the stream is not compatible with upstream, most likely Zyxel did something like that too.

It would be possible to work around that, but ideal solution is to poke Zyxel to send the extensions upstream.

kdave avatar Apr 12 '24 17:04 kdave

I could not find any downloadable sources on Zyxel sites (forums, community, general search). If it exists please post the link, linux kernel and btrfs-progs. We can add it as the protocol extensions are possible.

kdave avatar Apr 12 '24 17:04 kdave

It's known that some nas vendors have a customised btrfs that isn't fully compatible. It's why I asked if the filesystem was created after debian was installed. It unfortunately seems it wasn't. @dehnhard can you confirm?

Forza-tng avatar May 02 '24 18:05 Forza-tng

@Forza-tng Maybe I misunderstood what you meant in your first question with "recreate"? The btrfs filesystem was created after the Debian installation using the pure Debian binaries on new and empty harddisks. In fact I did nothing with the original NAS firmware than directly replacing it with Debian.

@kdave Thanks for the explanation. Something like a custom stream extension sounds like a possible culprit. Is my understanding correct, that it's like a special hardware mode similar to a GPU or Crypto extension which would need some specific commands / driver code to use it correctly?

I'd be happy to poke around for driver code or protocol specifications, but please give me some details about what to ask for exactly. Would such extensions exist in userland binaries, the Linux kernel, the btrfs kernel module or even in bootloader code (which is U-Boot, also replaced with a newer version while installing Debian)?

I think it's also possible, that Zyxel never implemented any extension but bought them with the SoC from Marvell. In that case it would probably be better to poke the latter?

dehnhard avatar May 02 '24 19:05 dehnhard

@dehnhard if you use plain Debian tools, then it couldn't be anything to do with the SoC, unless the tools and kernel from Debian are custom Zyxel made?

@kdave is it tested to work to do send|receive from a 64bit system to a 32bit one?

Forza-tng avatar May 03 '24 14:05 Forza-tng

I am experiencing the same issue in my custom Buildroot installation on an older Linksys DNS-320 with a Marvell ARMv5 Xscale processor, which is the same model used in the Zyxel NSA325v2.

The backup process works correctly when using the Buildroot-supported Btrfs-progs version 5.16. But when attempting to use any newer Btrfs-progs versions, such as 6.6.2, the reported issue appears.

rdrnau avatar May 15 '24 17:05 rdrnau

I repeat, there is nothing Zyxel made involved in my installation. I imagine that any possible modification which could affect an SSH stream must be very low level. But @rdrnau s observation, that an older version did work for him points to a possible regression in btrfs-progs, doesn't it?

dehnhard avatar May 20 '24 17:05 dehnhard

I think we can rule out custom modifications by Zyxel, etc.

Shame on me 🙄, but I dumped the send.h together with the he error code from the original post into ChatGPT...

The suggestion is that there is an alignment issue in how the stream is serialized on 64bit send side compared to how it is handled on the 32bit side:

Example: Sending side: tlv_type =16

| tlv_type (2 bytes) | tlv_len (2 bytes) |
|        0x0010      |       0x0002      |

Could become on the receiving side:

| Misaligned Read (4 bytes)                |
| 0x10 0x00 0x02 0x00 (0x0002 0x0000) => 816 |

There also seems to be a mismatch on the tlv_len datatype.

send.h

struct btrfs_tlv_header {
	__le16 tlv_type;
	/* len excluding the header */
	__le16 tlv_len;
} __attribute__ ((__packed__));

send-stream.c

struct btrfs_send_attribute {
	u16 tlv_type;
	/*
	 * Note: in btrfs_tlv_header, this is __le16, but we need 32 bits for
	 * attributes with file data as of version 2 of the send stream format.
	 */
	u32 tlv_len;
	char *data;
};

Forza-tng avatar May 20 '24 20:05 Forza-tng

@kdave is it tested to work to do send|receive from a 64bit system to a 32bit one?

This should work but such combinations are rare and not tested, typically it would be the same 32/32 or 64/64. The observation with the type mismatches looks plausible. Last change was to the v2 protocol update, something could have gone wrong there.

kdave avatar May 20 '24 22:05 kdave

Most likely cf269aa47bce1fb9902738eace0afa7bca8f6d1b + aa1ca3789e9ba8acbfe6bd31aacef5d1f38fa942 added in v5.19, though I still don't see how exactly it could happen.

kdave avatar May 20 '24 22:05 kdave

aa1ca37

The direct cast to __le16 instead of first casting to btrfs_tlv_header looks suspicious to me. Might not work in this case because of 32 vs 64bit or it might be an endian issue. AFAIK 32bit ARM CPUs normally are little-endian but can be switched to big-endian with the twist that they use a different big-endian mode than modern ARM64 (BE-32 instead of BE-8). Don't know how C compilers address this though.

dehnhard avatar May 21 '24 10:05 dehnhard

That might be it. From the commit:

tlv_type = le16_to_cpu(*(__le16 *)data);

this is (now) obviously wrong. The leXX_to_cpu marcros take the value and swap bytes if needed, but the value here is read by compiler in the host order taht may not be LE. The right thing is get_unaligned_le16, that takes the raw bytes and does the conversion.

kdave avatar May 21 '24 13:05 kdave

Can you please test this (also availabe in branch ci/fix-receive-be):

--- a/common/send-stream.c
+++ b/common/send-stream.c
@@ -183,7 +183,7 @@ static int read_cmd(struct btrfs_send_stream *sctx)
                        ret = -EINVAL;
                        goto out;
                }
-               tlv_type = le16_to_cpu(*(__le16 *)data);
+               tlv_type = get_unaligned_le16(data);
 
                if (tlv_type == 0 || tlv_type > __BTRFS_SEND_A_MAX) {
                        error("invalid tlv in cmd tlv_type = %hu", tlv_type);
@@ -204,7 +204,7 @@ static int read_cmd(struct btrfs_send_stream *sctx)
                                ret = -EINVAL;
                                goto out;
                        }
-                       send_attr->tlv_len = le16_to_cpu(*(__le16 *)data);
+                       send_attr->tlv_len = get_unaligned_le16(data);
                        pos += sizeof(__le16);
                        data += sizeof(__le16);
                }

kdave avatar May 21 '24 14:05 kdave

I still am not sure if I understand what's going on. If this was a simple endianity bug then the value 816 would give some hint. It's 0x330, so byteswap 0x3003 = 12291, and that's not a valid type either. The error appears after utime stream command, which is type 20 = 0x14. The v2 extension to increase length to u32 is only for data. Utime, chmod, chown are all in the v1 of protocol so either the bug has always been there and LE->BE never worked, or there's another 64/32 bit issue. Could be also partially caused on the kernel side, we've had issues with the 64/32 bit compatibility.

kdave avatar May 21 '24 14:05 kdave

Can you please test this (also availabe in branch ci/fix-receive-be):

If no one else does it first, I can do that. No promises though when I'll find the time between private obligations and other hobbies.

OTOH if anybody is interested in paying a freelancer I can offer to analyse the 64/32 bit and data type compatibility situation that you are facing here. I have done similar work before.

dehnhard avatar May 22 '24 08:05 dehnhard

It was easy to reproduce the bug caused by reading a 16-bit word from an uneven address, resulting in byte flipping on the ARMv5 and causing an error. I have compiled the 'ci/fix-receive-be' branch and have made preliminary tests. The error message 'ERROR: invalid tlv in cmd tlv_type = 816' no longer appears when using the 'ci/fix-receive-be' branch with both protocols.

I compiled a static binary using buildroot cross-compiling and am currently creating a test backup for later comparison. Do you recommend running additional tests ?

rdrnau avatar May 24 '24 00:05 rdrnau

A 360gb backup was perfect using patched branch. But how about this: In libbtrfs/send.h the following struct is defined

struct btrfs_cmd_header { /* len excluding the header / __le32 len; __le16 cmd; / crc including the header with zero crc field */ __le32 crc; } attribute ((packed));

In send-stream.c line 143, there is the line 'crc = le32_to_cpu(sctx->cmd_hdr->crc);'

The crc field starts at offset 6, which is not a multiple of 4. This means it is not aligned to a 4-byte boundary. Processors like ARMv5 require 32-bit integers to be 4-byte aligned. Should the line be 'crc = get_unaligned_le32(sctx->cmd_hdr->crc);'

rdrnau avatar May 24 '24 21:05 rdrnau

There is more. line 144 'sctx->cmd_hdr->crc = 0;' should be 'put_unaligned_le32(0, sctx->cmd_hdr->crc)'

in common/send-stream.c line 163 'crc = le32_to_cpu(cmd_hdr->crc);', line 165 'cmd_hdr->crc = 0;'

rdrnau avatar May 24 '24 21:05 rdrnau

Thank you very much for testing. Agreed about the alignment and that it's basically everywhere, alle the le32_to_cpu that's indirectly from a pointer cast to btrfs_cmd_header from raw buffer.

kdave avatar May 24 '24 21:05 kdave

AFAIK the strict alignment architectures can work in several modes, where an unaligned read is a hard error, or an exception (fixup mode?) that can be handled by kernel that emulates the read/write instructions.

kdave avatar May 24 '24 21:05 kdave

AFAIK the strict alignment architectures can work in several modes, where an unaligned read is a hard error, or an exception (fixup mode?)

An unaligned access mode was introduced in ARMv6, but it is commonly not used. In ARMv7, the mode was improved. This issue may concern Rapsberry Pi 1 & Zero users, that are based on ARMv6.

rdrnau avatar May 24 '24 22:05 rdrnau

I've pushed update to the branch ci/fix-receive-be, also with fix to libbtrfs (still incomplete, version number needs to be bumped too).

kdave avatar May 24 '24 22:05 kdave

Integers in btrfs_dir_item are byte aligned since 'struct btrfs_disk_key' it contains is 17 bytes long. libbtrfsutil/subvolume.c: line 610: name_len = le16_to_cpu(dir->name_len); - should it be ... get_unaligned_le16(dir->name_len);

cmds/rescue-chunk-recover.c: ??? u8 expected_csum[BTRFS_CSUM_SIZE]; ... put_unaligned_le32(tree_csum, expected_csum); - I guess it works with crc32c

image/image-restore.c: line 1025: item = &cluster->items[i]; bufsize = le32_to_cpu(item->size); - should it be ... get_unaligned_le32(item->size); item_bytenr = le64_to_cpu(item->bytenr); - should it be ... get_unaligned_le64(item->bytenr);

line 1131, 1133, 1139, 1145, 1151, 1168, 1314, 1315

Both 'item->size' and 'item->bytenr' are byte aligned, evident by these struct's: struct meta_cluster_header { __le64 magic; __le64 bytenr; __le32 nritems; u8 compress; } attribute ((packed));

/* cluster header + index items + buffers */ struct meta_cluster { struct meta_cluster_header header; struct meta_cluster_item items[]; } attribute ((packed));

line 1201: mdres->devid = le64_to_cpu(super->dev_item.devid); - dev_item is at an uneven byte address in struct btrfs_super_block line 1286

Give it a big eyeball :)

rdrnau avatar May 25 '24 14:05 rdrnau

The performance of Btrfs can be significantly improved on hardware that requires software fallback to handle word alignment corrections. Specifically, when get_unaligned_le64 is called since it forces software handling, which on ARMv5 compiles into a sequence of 16 instructions instead of a single load instruction. By changing the stack accessing to unaligned in a patch you made there is a speed regression on the armel ARMv5 and similar platforms by making every read significantly slower. I patched my kernel years ago for speeding it up. And it works.

I propose the following patch for Btrfs-progs. Additionally, I recommend implementing a similar patch for the Btrfs filesystem itself. This optimization will significantly enhance the performance for users of ARMv5 based NAS servers, possibly Raspberry Pi 1 and other similar hardware platforms.

--- a/libbtrfs/ctree.h    2024-05-24 01:12:51.506765058 +0100
+++ b/libbtrfs/ctree.h   2024-05-26 16:35:52.476518069 +0100
@@ -1615,7 +1615,16 @@
 static inline u##bits btrfs_##name(const struct extent_buffer *eb)     \
 {                                                                      \
        const struct btrfs_header *h = (struct btrfs_header *)eb->data; \
-       return le##bits##_to_cpu(h->member);                            \
+       switch (offsetof(type, member) & 7)                             \
+       {                                                               \
+       case 0:                                                         \
+               return le##bits##_to_cpu(h->member);                    \
+       case 4:                                                         \
+               if (bits <= 32) return le##bits##_to_cpu(h->member);    \
+               return get_unaligned_le##bits(&h->member);              \
+       default:                                                        \
+               return get_unaligned_le##bits(&h->member);              \
+       }                                                               \
 }                                                                      \
 static inline void btrfs_set_##name(struct extent_buffer *eb,          \
                                    u##bits val)                        \
@@ -1643,11 +1652,27 @@
 #define BTRFS_SETGET_STACK_FUNCS(name, type, member, bits)             \
 static inline u##bits btrfs_##name(const type *s)                      \
 {                                                                      \
-       return le##bits##_to_cpu(s->member);                            \
+       switch (offsetof(type, member) & 7)                             \
+       {                                                               \
+               case 0:                                                 \
+                       return le##bits##_to_cpu(s->member);            \
+               case 4:                                                 \
+                       if (bits <= 32)                                 \
+                               return le##bits##_to_cpu(s->member);    \
+                       return get_unaligned_le##bits(&s->member);      \
+               default:                                                \
+                   return get_unaligned_le##bits(&s->member);          \
+       }                                                               \
 }                                                                      \
 static inline void btrfs_set_##name(type *s, u##bits val)              \
 {                                                                      \
-       s->member = cpu_to_le##bits(val);                               \
+       switch (offsetof(type, member) & 7)                             \
+       {                                                               \
+               case 0:                                                 \
+                       s->member = cpu_to_le##bits(val);               \
+                       break;                                          \
+               default:                                                \
+                       put_unaligned_le##bits(val, &s->member);        \
+       }                                                               \
 }
 
 BTRFS_SETGET_FUNCS(device_type, struct btrfs_dev_item, type, 64);

rdrnau avatar May 26 '24 15:05 rdrnau