Loki (in Docker) reports "no space left on device" but there's plenty of space/inodes
Describe the bug
When running the Loki 1.2.0 Docker image, Loki is reporting that it can't write chunks to disk because there is "no space left on device", although there appears to be plenty of space.
level=error ts=2020-01-11T19:13:11.822567024Z caller=flush.go:178 org_id=fake msg="failed to flush user" err="open /tmp/loki/chunks/ZmFrZS84NDBiODY0MTMwOWFkOTZlOjE2Zjk1ZWNjNmU1OjE2Zjk1ZWNkM2JjOmRkMWUwMjUx: no space left on device"
level=error ts=2020-01-11T19:13:11.851323284Z caller=flush.go:178 org_id=fake msg="failed to flush user" err="open /tmp/loki/chunks/ZmFrZS82ZDNlZmFhODk1OWZiYjQxOjE2Zjk1ZTgzOTI4OjE2Zjk1ZmMyNzRiOjg3MTQ1OTkw: no space left on device"
There is plenty of space and there are plenty of inodes available on the disk where the /tmp/loki volume lives:
$ df -h /dev/sda1
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 915G 223G 646G 26% /
$ df -i /dev/sda1
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda1 60981248 5473071 55508177 9% /
The /tmp/loki named volume mount, from docker inspect:
"Mounts": [
{
"Type": "volume",
"Name": "loki",
"Source": "/var/lib/docker/volumes/loki/_data",
"Destination": "/tmp/loki",
"Driver": "local",
"Mode": "rw",
"RW": true,
"Propagation": ""
}
]
$ docker volume inspect loki
[
{
"CreatedAt": "2020-01-11T10:37:39-08:00",
"Driver": "local",
"Labels": null,
"Mountpoint": "/var/lib/docker/volumes/loki/_data",
"Name": "loki",
"Options": null,
"Scope": "local"
}
]
Execing into the loki container and doing manual write tests to verify that files can be written
$ docker exec -it loki sh
/ # cd /tmp/loki
/tmp/loki # ls -l
total 596644
drwxr-xr-x 2 root root 610926592 Jan 11 19:24 chunks
drwxr-xr-x 2 root root 4096 Jan 9 00:01 index
/tmp/loki # cd chunks/
/tmp/loki/chunks # ls -l | wc -l
5286025
/tmp/loki/chunks # dd if=/dev/zero of=write_test count=1024 bs=1048576
1024+0 records in
1024+0 records out
/tmp/loki/chunks # ls -l write_test
-rw-r--r-- 1 root root 1073741824 Jan 11 19:27 write_test
/tmp/loki/chunks # rm write_test
/tmp/loki/chunks # dd if=/dev/urandom of=write_test count=1024 bs=1048576
1024+0 records in
1024+0 records out
/tmp/loki/chunks # ls -l write_test
-rw-r--r-- 1 root root 1073741824 Jan 11 19:28 write_test
/tmp/loki/chunks # rm write_test
I haven't been able to find any disk limitation in the Docker container, and the fact that I can still manually write files to the volume inside the container makes me suspect the bug is in the loki code, but I could definitely be wrong!
To Reproduce
Steps to reproduce the behavior:
- Run Loki (1.2.0, commit ccef3da2b61324bc0f8ae9e7ec6456110cf1ae05) Docker image with Docker 18.09.6
- ???
Expected behavior
Loki continues to successfully write chunks to /tmp/loki while disk space and inodes are available.
Environment:
- Infrastructure: Docker 18.09.6, Debian 9.9, kernel 4.9.0-9-amd64
- Deployment tool: Ansible (using the default Loki config file in the Docker image at /etc/loki/local-config.yaml)
Sorry, nothing obvious sticks out to me here. The 5 million chunks is certainly a lot...
Loki is just using Go's ioutil.WriteFile, so maybe this uses different syscalls than dd does, which is why one works and the other doesn't?
https://github.com/cortexproject/cortex/blob/ff6fc0a47f6716fdd23188faa729f42c04d26565/pkg/chunk/local/fs_object_client.go#L58
You could maybe try asking in the Go Slack or on GitHub?
Other things I can think of:
- Chunk names are pretty long. What happens if you try to create a file in that directory with a really long name (instead of write_test)? It's hard for me to pin down details on this, but there is a size associated with file names, and I think this has a limit as well, so too many long file names might be causing this.
- I have no idea how inodes work in relation to the host volume and from within Docker. It seems like your dd test would indicate there are enough inodes, but it may be worth checking both inside and outside the container (see the sketch below).
What filesystem are you using?
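For the inode question, a quick way to compare the view from inside the container with the host's view of the same volume (a sketch; the paths come from the docker inspect output above):
$ docker exec loki df -i /tmp/loki
$ df -i /var/lib/docker/volumes/loki/_data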
Looks like you can do an ls on the parent directory to see the size of it (which includes the size of the file names):
~/loki $ ls -alh
total 20M
drwxr-xr-x 4 pi pi 4.0K Sep 9 19:59 .
drwxr-xr-x 11 pi pi 4.0K Jan 6 22:35 ..
drwxr-xr-x 2 root root 20M Jan 13 20:22 chunks
drwxr-xr-x 2 root root 4.0K Jan 8 19:00 index
One of my Raspberry Pi test Lokis has a couple hundred thousand chunks in the directory, which corresponds to a directory size of 20M.
Thanks for taking a look. The filesystem is ext4, mounted as a Docker volume.
/tmp/loki/chunks is definitely large: a 606M directory, 5.5 million chunk files, 227 million filesystem blocks.
/tmp/loki # ls -lh
total 620440
drwxr-xr-x 2 root root 605.9M Jan 14 06:25 chunks
drwxr-xr-x 2 root root 4.0K Jan 9 00:01 index
/tmp/loki # find chunks/ | wc -l
5497294
/tmp/loki # ls -l chunks/ | head -n 1
total 226881340
It looks like all of the "failed to flush user" errors refer to three files, none of which actually exist:
sudo docker logs loki 2>&1 | grep "failed to flush user" | grep -o "open [^:]*" | awk '{print $2}' | sort | uniq -c
48 /tmp/loki/chunks/ZmFrZS83MzU5OTJkNzAzOWM0MDU5OjE2ZmEyYmJmZTIxOjE2ZmEyYmRkMzYzOmNjMjZlNTA1
64 /tmp/loki/chunks/ZmFrZS83NWQ5NjliMjUzNDhlYjM1OjE2ZmEyYjVjMzIwOjE2ZmEyYjVkNjU0OjM5YzY3OWM=
58 /tmp/loki/chunks/ZmFrZS9jOWNjNDUxMWZmZjUxODg2OjE2ZmEyYjkxZWRiOjE2ZmEyYjkxZWRjOjYzZTBhZGM1
These are different chunk names than the ones I originally reported, so they may change over time. I also restarted the container since the original report. I'll continue to monitor and report back if the invalid chunks change; I can also provide an strace of the error.
Files with the exact names from the errors above can't be created manually, but files with very similar names in the same directory can be:
/tmp/loki # touch /tmp/loki/chunks/ZmFrZS83MzU5OTJkNzAzOWM0MDU5OjE2ZmEyYmJmZTIxOjE2ZmEyYmRkMzYzOmNjMjZlNTA1
touch: /tmp/loki/chunks/ZmFrZS83MzU5OTJkNzAzOWM0MDU5OjE2ZmEyYmJmZTIxOjE2ZmEyYmRkMzYzOmNjMjZlNTA1: No space left on device
/tmp/loki # touch /tmp/loki/chunks/ZmFrZS83MzU5OTJkNzAzOWM0MDU5OjE2ZmEyYmJmZTIxOjE2ZmEyYmRkMzYzOmNjMjZlNTA1-2
/tmp/loki # touch /tmp/loki/chunks/ZmFrZS83MzU5OTJkNzAzOWM0MDU5OjE2ZmEyYmJmZTIxOjE2ZmEyYmRkMzYzOmNjMjZlNTA2
/tmp/loki # ls -l /tmp/loki/chunks/ZmFrZS83MzU5OTJkNzAzOWM0MDU5OjE2ZmEyYmJmZTIxOjE2ZmEyYmRkMzYzOmNjMjZlNTA*
-rw-r--r-- 1 root root 0 Jan 14 07:05 /tmp/loki/chunks/ZmFrZS83MzU5OTJkNzAzOWM0MDU5OjE2ZmEyYmJmZTIxOjE2ZmEyYmRkMzYzOmNjMjZlNTA1-2
-rw-r--r-- 1 root root 0 Jan 14 07:05 /tmp/loki/chunks/ZmFrZS83MzU5OTJkNzAzOWM0MDU5OjE2ZmEyYmJmZTIxOjE2ZmEyYmRkMzYzOmNjMjZlNTA2
/tmp/loki # strace touch /tmp/loki/chunks/ZmFrZS83MzU5OTJkNzAzOWM0MDU5OjE2ZmEyYmJmZTIxOjE2ZmEyYmRkMzYzOmNjMjZlNTA1
execve("/bin/touch", ["touch", "/tmp/loki/chunks/ZmFrZS83MzU5OTJ"...], 0x7ffeb5249058 /* 7 vars */) = 0
arch_prctl(ARCH_SET_FS, 0x7f51a4152b68) = 0
set_tid_address(0x7f51a4152ba8) = 139
mprotect(0x7f51a414f000, 4096, PROT_READ) = 0
mprotect(0x556cd17d7000, 16384, PROT_READ) = 0
getuid() = 0
utimensat(AT_FDCWD, "/tmp/loki/chunks/ZmFrZS83MzU5OTJkNzAzOWM0MDU5OjE2ZmEyYmJmZTIxOjE2ZmEyYmRkMzYzOmNjMjZlNTA1", NULL, 0) = -1 ENOENT (No such file or directory)
open("/tmp/loki/chunks/ZmFrZS83MzU5OTJkNzAzOWM0MDU5OjE2ZmEyYmJmZTIxOjE2ZmEyYmRkMzYzOmNjMjZlNTA1", O_RDWR|O_CREAT, 0666) = -1 ENOSPC (No space left on device)
write(2, "touch: /tmp/loki/chunks/ZmFrZS83"..., 122touch: /tmp/loki/chunks/ZmFrZS83MzU5OTJkNzAzOWM0MDU5OjE2ZmEyYmJmZTIxOjE2ZmEyYmRkMzYzOmNjMjZlNTA1: No space left on device
) = 122
exit_group(1) = ?
+++ exited with 1 +++
OK, found it. I was also unable to create that file inside the directory mounted to /tmp/loki from the host (outside of the container). I finally looked in dmesg and found lots of:
[624434.242593] EXT4-fs warning (device sda1): ext4_dx_add_entry:2236: inode #58458946: comm loki: Directory index full!
I'll investigate whether using a filesystem other than ext4 would allow more files in the chunks directory, but this might show a need for Loki to organize chunk files into subdirectories.
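For anyone hitting the same thing, the checks that pinned this down can be run on the host roughly like this (a sketch; /dev/sda1 is the device from the df output above):
$ sudo tune2fs -l /dev/sda1 | grep 'Filesystem features'   # dir_index in the list means hashed directory indexes are enabled
$ dmesg | grep 'Directory index full'                      # the ext4 warning that corresponds to Loki's ENOSPC errors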
Disabled dir_index on the ext4 volume using sudo tune2fs -O "^dir_index" /dev/sda1.
That caused I/O errors and a read-only filesystem, and after rebooting, the machine dropped into an initramfs prompt with a corrupt volume. After running fsck.ext4 -y on the volume, the system booted successfully, and files which couldn't be created before can now be created. I'll let it run and see if there are any more errors.
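For anyone else going down this path, a less risky sequence (my assumption, based on general ext4 administration practice rather than anything Loki-specific) is to change the feature with the filesystem unmounted and rebuild the directories before mounting it again:
# from a rescue/live environment, with the filesystem unmounted
$ sudo tune2fs -O '^dir_index' /dev/sda1
$ sudo e2fsck -fD /dev/sda1   # -f forces a full check, -D rebuilds/optimizes directories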
This is really great work @shane-axiom! Sorry you had a scare there with your data.
I found this blog, which you may have seen, that talks about the problem being a hash collision; this would explain why only some names fail to write but not others.
The disappointing information seems to be that there isn't much you can do aside from just disabling dir_index as you did. Even using a different hash algorithm, or maybe a longer hash, ends up getting truncated.
It wasn't clear to me whether the 64bit feature might change this. Out of curiosity, if you run tune2fs -l, do you see 64bit in the output? It wasn't obvious to me from any docs whether this would have any effect on the b-tree file-name hashing, or whether it's just related to the maximum number of files which can be stored.
I'm also curious if disabling dir_index will have a performance impact on Loki.
There are plans in the works to overhaul how local filesystem storage works; we want to find a way to combine chunks into larger files to reduce the file count and help improve performance. I'm afraid this work is quite a few months from being done, though.
In the meantime, you could try the new flag from #1406, which lets you cut bigger chunks (although this will use more memory). FYI, this is not released yet; you would need to use a master-xxxx image or the latest or master tags.
Or increase the chunk_idle_period if you have some slowly writing log streams (again, uses more memory but cuts fewer chunks).
Or reduce the number of labels, or the cardinality of labels, to reduce the number of streams.
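For illustration, the idle-period knob lives under the ingester block of the config file. This is only a sketch: the 1h value is an example, and chunk_target_size is my assumption for the setting added by #1406, so check the docs for your version:
# hypothetical excerpt of /etc/loki/local-config.yaml
ingester:
  chunk_idle_period: 1h       # flush a chunk only after its stream has been idle this long (fewer, larger chunks; more memory)
  chunk_target_size: 1572864  # assumed name of the #1406 setting: cut chunks at roughly 1.5MB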
@slim-bean Yep, looks like 64bit is on. Here's the whole tune2fs output:
$ sudo tune2fs -l /dev/sda1
tune2fs 1.43.4 (31-Jan-2017)
Filesystem volume name: <none>
Last mounted on: /
Filesystem UUID: acfb753d-0109-4646-88d4-90ae17ff5978
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 60981248
Block count: 243924480
Reserved block count: 12196224
Free blocks: 180478308
Free inodes: 55291940
First block: 0
Block size: 4096
Fragment size: 4096
Group descriptor size: 64
Reserved GDT blocks: 1024
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
Flex block group size: 16
Filesystem created: Fri Jan 4 15:10:28 2019
Last mount time: Mon Jan 13 23:43:57 2020
Last write time: Mon Jan 13 23:43:57 2020
Mount count: 1
Maximum mount count: -1
Last checked: Mon Jan 13 23:43:20 2020
Check interval: 0 (<none>)
Lifetime writes: 980 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 32
Desired extra isize: 32
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: 8941b98b-4c9e-4b8d-a84b-ba72b466191d
Journal backup: inode blocks
Checksum type: crc32c
Checksum: 0x4bdf5115
This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
Thank you stale bot.
@slim-bean Disabling dir_index while using Loki 1.2.0 seemed to solve this issue for us.
I'm also curious if disabling dir_index will have a performance impact on Loki.
We don't have any performance metrics running for loki other than loki-canary, but it seems fine after this change, and subjectively I haven't noticed any significant difference.
Just upgraded to Loki 1.3.0 and haven't tried target-chunk-size or chunk_idle_period yet, so I can't comment there.
Before this gets closed, do you think it's worth distilling this dir_index config tweak into documentation somewhere?
Thanks for the update @shane-axiom. Yes, this should definitely make it into the docs somewhere; I haven't looked to see where yet.
If you have a chance to add something, that would be awesome; otherwise I will try to add something too. For now we'll just kick the stale bot and see where we end up in another 30 days :)
This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
Hey everybody: this is still happening, we just ran into it. Having it closed by the stale bot is a bit disappointing; can this be reopened?
As a quick fix, we'll probably just migrate to a more battle-tested chunk store backend (S3), but having issues like this in the fs backend is still annoying, as that's what most people probably start with.
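For anyone taking the same route, a rough sketch of what pointing the chunk store at S3 can look like in the config; the bucket, region and keys are placeholders, and the schema_config section has to match whatever schema/store you already run:
# hypothetical excerpt, not a drop-in config
storage_config:
  aws:
    s3: s3://ACCESS_KEY:SECRET_KEY@eu-west-1/my-loki-chunks
  boltdb:
    directory: /tmp/loki/index
schema_config:
  configs:
    - from: 2020-01-01
      store: boltdb
      object_store: aws
      schema: v11
      index:
        prefix: index_
        period: 168h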
Before this gets closed, do you think it's worth distilling this dir_index config tweak into documentation somewhere?
Even closed, this issue is still relevant. I had several issues with dir_index, using minio and local storage.
Same here; still very much an issue with the latest Loki. It's a very fresh installation pulling logs from maybe 20-30 hosts, and it's already hitting >3M chunks in a single directory.
It's just not feasible with a local filesystem; is it not typical to hash files into a subfolder structure?
@weakcamel it seems that the current posture on this issue is: https://github.com/grafana/loki/issues/3324#issuecomment-804839255
I think there is also a lack of documentation about this; let's wait and see if it gets fixed.
Thanks @theonlydoo!
Interesting to see that the discussion thread points to Cortex which does not seem to support filesystem storage at all: https://cortexmetrics.io/docs/chunks-storage/
Seems that at this point it's better to look at alternative storage options.
The disappointing information seems to be that there isn't much you can do aside from just disabling dir_index as you did. Even using a different hash algorithm, or maybe a longer hash, ends up getting truncated.
You can enable the large_dir feature instead, which is better than disabling dir_index (dir_index speeds up name lookups).
# tune2fs -O large_dir /dev/nvme0n1
tune2fs 1.42.9 (28-Dec-2013)
Setting filesystem feature ‘large_dir’ not supported.
You need to be running kernel 4.13 or newer: https://github.com/torvalds/linux/commit/e08ac99fa2a25626f573cfa377ef3ddedf2cfe8f
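If your kernel is new enough, enabling it looks something like this (a sketch; /dev/sda1 is the device from earlier in the thread, and your e2fsprogs also needs to be recent enough to know about large_dir):
$ uname -r                              # needs to be 4.13 or newer
$ sudo tune2fs -O large_dir /dev/sda1   # allow ext4 directory indexes to grow beyond the old limit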
Thanks for the tip. I will try it out when I get a chance!
Is this problem solved?
nope
Is there any fix for that? We have this problem on the AKS Kubernetes platform with not much data in Loki (I guess 30 GB or something..).
Using S3-like storage seems to be a good workaround.
Encountering the same issue with Loki and EFS at around 20 GB of capacity on EKS.
Had the same issue with 3.7 today; enabling large_dir as you guys suggested solved the problem immediately. Thanks!
If anyone's using NetApp ONTAP volumes, the variable that controls the size of the directory index is maxdirsize. If you create a lot of files like Loki does, you can hit that limit. This article describes how to increase that: https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Software/ONTAP_OS/What_is_maxdirsize
The article also mentions some of the tradeoffs of raising the size of the directory index, and some of that advice probably applies to ext4 as well.