Describe the bug

When running the Loki 1.2.0 Docker image, Loki reports that it can't write chunks to disk because there is "no space left on device", although there appears to be plenty of space.

level=error ts=2020-01-11T19:13:11.822567024Z caller=flush.go:178 org_id=fake msg="failed to flush user" err="open /tmp/loki/chunks/ZmFrZS84NDBiODY0MTMwOWFkOTZlOjE2Zjk1ZWNjNmU1OjE2Zjk1ZWNkM2JjOmRkMWUwMjUx: no space left on device"
level=error ts=2020-01-11T19:13:11.851323284Z caller=flush.go:178 org_id=fake msg="failed to flush user" err="open /tmp/loki/chunks/ZmFrZS82ZDNlZmFhODk1OWZiYjQxOjE2Zjk1ZTgzOTI4OjE2Zjk1ZmMyNzRiOjg3MTQ1OTkw: no space left on device"

There is plenty of space and there are plenty of inodes available on the disk where the /tmp/loki volume lives:

$ df -h /dev/sda1
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       915G  223G  646G  26% /

$ df -i /dev/sda1
Filesystem       Inodes   IUsed    IFree IUse% Mounted on
/dev/sda1      60981248 5473071 55508177    9% /

The /tmp/loki named volume mount, from docker inspect:

        "Mounts": [
            {
                "Type": "volume",
                "Name": "loki",
                "Source": "/var/lib/docker/volumes/loki/_data",
                "Destination": "/tmp/loki",
                "Driver": "local",
                "Mode": "rw",
                "RW": true,
                "Propagation": ""
            }

$ docker volume inspect loki
[
    {
        "CreatedAt": "2020-01-11T10:37:39-08:00",
        "Driver": "local",
        "Labels": null,
        "Mountpoint": "/var/lib/docker/volumes/loki/_data",
        "Name": "loki",
        "Options": null,
        "Scope": "local"
    }
]

Exec'ing into the loki container and running manual write tests to verify that files can be written:

$ docker exec -it loki sh
/ # cd /tmp/loki
/tmp/loki # ls -l
total 596644
drwxr-xr-x    2 root     root     610926592 Jan 11 19:24 chunks
drwxr-xr-x    2 root     root          4096 Jan  9 00:01 index
/tmp/loki # cd chunks/
/tmp/loki/chunks # ls -l | wc -l
5286025
/tmp/loki/chunks # dd if=/dev/zero of=write_test count=1024 bs=1048576
1024+0 records in
1024+0 records out
/tmp/loki/chunks # ls -l write_test
-rw-r--r--    1 root     root     1073741824 Jan 11 19:27 write_test
/tmp/loki/chunks # rm write_test
/tmp/loki/chunks # dd if=/dev/urandom of=write_test count=1024 bs=1048576
1024+0 records in
1024+0 records out
/tmp/loki/chunks # ls -l write_test
-rw-r--r--    1 root     root     1073741824 Jan 11 19:28 write_test
/tmp/loki/chunks # rm write_test

I haven't been able to find any disk limitation in the Docker container, and the fact that I can still manually write files to the volume inside the container makes me suspect the bug is in the loki code, but I could definitely be wrong!

To Reproduce

Steps to reproduce the behavior:

1. Run the Loki (1.2.0, commit ccef3da2b61324bc0f8ae9e7ec6456110cf1ae05) Docker image with Docker 18.09.6
2. ???

Expected behavior

Loki continues to successfully write chunks to /tmp/loki while disk space and inodes are available.

Environment:

Infrastructure: Docker 18.09.6, Debian 9.9, kernel 4.9.0-9-amd64
Deployment tool: Ansible (using default Loki config file in Docker image at /etc/loki/local-config.yaml)

Jan 11 '20 19:01 srstsavage

Sorry, nothing obvious sticks out to me here. 5 million chunks is certainly a lot...

Loki is just using Go's ioutil.WriteFile, so maybe this uses different syscalls than dd does, which is why one works and the other doesn't?

https://github.com/cortexproject/cortex/blob/ff6fc0a47f6716fdd23188faa729f42c04d26565/pkg/chunk/local/fs_object_client.go#L58

You could maybe try asking in the Go Slack or on GitHub?

Jan 14 '20 01:01 slim-bean

Other things I can think of:

Chunk names are pretty long. What happens if you try to create a file in that directory with a really long name (instead of write_test)? It's hard for me to pin down details on this, but file names take up space in the directory, and I think that has a limit as well, so too many long file names might be causing this.

I have no idea how inodes work in relation to the host volume versus from within Docker. It seems like your dd test would indicate there are enough inodes, but it may be worth checking both inside and outside the container?

What filesystem are you using?

Jan 14 '20 01:01 slim-bean

It looks like you can run ls on the parent directory to see the size of the chunks directory (which includes the size of the file names):

~/loki $ ls -alh
total 20M
drwxr-xr-x  4 pi   pi   4.0K Sep  9 19:59 .
drwxr-xr-x 11 pi   pi   4.0K Jan  6 22:35 ..
drwxr-xr-x  2 root root  20M Jan 13 20:22 chunks
drwxr-xr-x  2 root root 4.0K Jan  8 19:00 index

On one of my Raspberry Pi test Lokis, a couple hundred thousand chunks in the directory correspond to a directory size of 20M.
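A directory's size grows with the names it holds because ext4 stores every name inside a variable-length directory entry. A minimal sketch of that arithmetic, assuming the standard ext4_dir_entry_2 layout (an 8-byte header plus the name, rounded up to a multiple of 4 bytes); `direntRecLen` and `minDirBytes` are hypothetical helper names.

```go
package main

import "fmt"

// direntRecLen returns the minimum on-disk size of an ext4 directory entry:
// 8 bytes of header (inode, rec_len, name_len, file_type) plus the name,
// rounded up to a multiple of 4.
func direntRecLen(nameLen int) int {
	return (nameLen + 8 + 3) &^ 3
}

// minDirBytes is a lower bound on the bytes needed for n entries whose names
// are all nameLen bytes long. Real directories are larger, because leaf
// blocks are rarely full and the hash index itself occupies blocks.
func minDirBytes(n, nameLen int) int {
	return n * direntRecLen(nameLen)
}

func main() {
	const chunkNameLen = 72 // length of the base64 chunk names in the error logs
	fmt.Println(direntRecLen(chunkNameLen)) // 80
	fmt.Printf("%.0f MiB\n", float64(minDirBytes(5497294, chunkNameLen))/(1<<20)) // 419 MiB
}
```

At 80 bytes per entry, 20M holds at most about 262,144 entries, in line with "a couple hundred thousand". For the roughly 5.5 million chunk files reported in /tmp/loki/chunks the lower bound is about 419 MiB, so a 606M directory is plausible once partly filled blocks are counted.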

Jan 14 '20 01:01 slim-bean

Thanks for taking a look. The filesystem is ext4, mounted as a Docker volume.

/tmp/loki/chunks is definitely large: the directory itself is 606M, and it holds 5.5 million chunk files totaling about 227 million 1K blocks (roughly 216G).

/tmp/loki # ls -lh
total 620440
drwxr-xr-x    2 root     root      605.9M Jan 14 06:25 chunks
drwxr-xr-x    2 root     root        4.0K Jan  9 00:01 index
/tmp/loki # find chunks/ | wc -l
5497294
/tmp/loki # ls -l chunks/ | head -n 1
total 226881340

Jan 14 '20 06:01 srstsavage

It looks like all of the "failed to flush user" errors refer to three files, none of which actually exist:

sudo docker logs loki 2>&1 | grep "failed to flush user" | grep -o "open [^:]*" | awk '{print $2}' | sort | uniq -c
     48 /tmp/loki/chunks/ZmFrZS83MzU5OTJkNzAzOWM0MDU5OjE2ZmEyYmJmZTIxOjE2ZmEyYmRkMzYzOmNjMjZlNTA1
     64 /tmp/loki/chunks/ZmFrZS83NWQ5NjliMjUzNDhlYjM1OjE2ZmEyYjVjMzIwOjE2ZmEyYjVkNjU0OjM5YzY3OWM=
     58 /tmp/loki/chunks/ZmFrZS9jOWNjNDUxMWZmZjUxODg2OjE2ZmEyYjkxZWRiOjE2ZmEyYjkxZWRjOjYzZTBhZGM1

These are different chunk names than I originally reported, so these might change over time. I also restarted the container since the original report.

I'll continue to monitor and report back if the invalid chunks change. Can also provide an strace of the error.

Jan 14 '20 07:01 srstsavage

Files with the exact names from the errors above can't be created manually, but files with very similar names in the same directory can be.

/tmp/loki # touch /tmp/loki/chunks/ZmFrZS83MzU5OTJkNzAzOWM0MDU5OjE2ZmEyYmJmZTIxOjE2ZmEyYmRkMzYzOmNjMjZlNTA1
touch: /tmp/loki/chunks/ZmFrZS83MzU5OTJkNzAzOWM0MDU5OjE2ZmEyYmJmZTIxOjE2ZmEyYmRkMzYzOmNjMjZlNTA1: No space left on device
/tmp/loki # touch /tmp/loki/chunks/ZmFrZS83MzU5OTJkNzAzOWM0MDU5OjE2ZmEyYmJmZTIxOjE2ZmEyYmRkMzYzOmNjMjZlNTA1-2
/tmp/loki # touch /tmp/loki/chunks/ZmFrZS83MzU5OTJkNzAzOWM0MDU5OjE2ZmEyYmJmZTIxOjE2ZmEyYmRkMzYzOmNjMjZlNTA2
/tmp/loki # ls -l /tmp/loki/chunks/ZmFrZS83MzU5OTJkNzAzOWM0MDU5OjE2ZmEyYmJmZTIxOjE2ZmEyYmRkMzYzOmNjMjZlNTA*
-rw-r--r--    1 root     root             0 Jan 14 07:05 /tmp/loki/chunks/ZmFrZS83MzU5OTJkNzAzOWM0MDU5OjE2ZmEyYmJmZTIxOjE2ZmEyYmRkMzYzOmNjMjZlNTA1-2
-rw-r--r--    1 root     root             0 Jan 14 07:05 /tmp/loki/chunks/ZmFrZS83MzU5OTJkNzAzOWM0MDU5OjE2ZmEyYmJmZTIxOjE2ZmEyYmRkMzYzOmNjMjZlNTA2

/tmp/loki # strace touch /tmp/loki/chunks/ZmFrZS83MzU5OTJkNzAzOWM0MDU5OjE2ZmEyYmJmZTIxOjE2ZmEyYmRkMzYzOmNjMjZlNTA1
execve("/bin/touch", ["touch", "/tmp/loki/chunks/ZmFrZS83MzU5OTJ"...], 0x7ffeb5249058 /* 7 vars */) = 0
arch_prctl(ARCH_SET_FS, 0x7f51a4152b68) = 0
set_tid_address(0x7f51a4152ba8)         = 139
mprotect(0x7f51a414f000, 4096, PROT_READ) = 0
mprotect(0x556cd17d7000, 16384, PROT_READ) = 0
getuid()                                = 0
utimensat(AT_FDCWD, "/tmp/loki/chunks/ZmFrZS83MzU5OTJkNzAzOWM0MDU5OjE2ZmEyYmJmZTIxOjE2ZmEyYmRkMzYzOmNjMjZlNTA1", NULL, 0) = -1 ENOENT (No such file or directory)
open("/tmp/loki/chunks/ZmFrZS83MzU5OTJkNzAzOWM0MDU5OjE2ZmEyYmJmZTIxOjE2ZmEyYmRkMzYzOmNjMjZlNTA1", O_RDWR|O_CREAT, 0666) = -1 ENOSPC (No space left on device)
write(2, "touch: /tmp/loki/chunks/ZmFrZS83"..., 122touch: /tmp/loki/chunks/ZmFrZS83MzU5OTJkNzAzOWM0MDU5OjE2ZmEyYmJmZTIxOjE2ZmEyYmRkMzYzOmNjMjZlNTA1: No space left on device
) = 122
exit_group(1)                           = ?
+++ exited with 1 +++

Jan 14 '20 07:01 srstsavage

Ok, found it. I was also unable to create that file inside the directory mounted to /tmp/loki from the host (outside of the container). Finally looked in dmesg and found lots of:

[624434.242593] EXT4-fs warning (device sda1): ext4_dx_add_entry:2236: inode #58458946: comm loki: Directory index full!

I'll investigate if using a filesystem other than ext4 would allow for more files in the chunks directory, but this might show a need for loki to organize chunk files into subdirectories.
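A rough ceiling for an ext4 hash-indexed directory can be worked out from the on-disk structures. A minimal sketch, assuming 4 KiB blocks and metadata_csum (both shown in the tune2fs -l output later in this thread), the classic limit of a root index block plus one level of index blocks (no large_dir), and completely full leaf blocks of 80-byte entries; the struct sizes are taken from my reading of the ext4 layout and `htreeCeiling` is a hypothetical helper.

```go
package main

import "fmt"

const (
	dxEntrySize  = 8  // struct dx_entry: 4-byte hash + 4-byte block number
	dxTailSize   = 8  // struct dx_tail in index blocks when metadata_csum is on
	leafTailSize = 12 // struct ext4_dir_entry_tail in leaf blocks when metadata_csum is on
	rootHeader   = 32 // "." (12) and ".." (12) entries plus dx_root_info (8)
	nodeHeader   = 8  // empty fake directory entry at the start of an index node
)

// htreeCeiling estimates how many entries of size recLen fit in a two-level
// htree directory, assuming every leaf block is completely full.
func htreeCeiling(blockSize, recLen int, metadataCsum bool) (rootLimit, nodeLimit, perLeaf, maxEntries int) {
	rootSpace, nodeSpace, leafSpace := blockSize-rootHeader, blockSize-nodeHeader, blockSize
	if metadataCsum {
		rootSpace -= dxTailSize
		nodeSpace -= dxTailSize
		leafSpace -= leafTailSize
	}
	rootLimit = rootSpace / dxEntrySize
	nodeLimit = nodeSpace / dxEntrySize
	perLeaf = leafSpace / recLen
	maxEntries = rootLimit * nodeLimit * perLeaf
	return rootLimit, nodeLimit, perLeaf, maxEntries
}

func main() {
	r, n, l, total := htreeCeiling(4096, 80, true)
	fmt.Println(r, n, l, total) // 507 510 51 13187070
}
```

In practice the error can appear well below that ceiling: leaf splits leave blocks partly empty, and as I read the kernel's ext4_dx_add_entry, "Directory index full!" is raised once the root block is full and the particular index block a new name hashes into is also full. Names hashing into other branches still succeed, which would fit the observation that near-identical names could be created.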

Jan 14 '20 07:01 srstsavage

I disabled dir_index on the ext4 volume using:

sudo tune2fs -O "^dir_index" /dev/sda1

That caused I/O errors and a read-only file system, and after a reboot the system dropped into an initramfs prompt with a corrupt volume. After running fsck.ext4 -y on the volume, the system booted successfully, and files that couldn't be created before now seem to be creatable. I'll let it run and see if there are any more errors.

Jan 14 '20 07:01 srstsavage

This is really great work, @shane-axiom! Sorry you had a scare there with your data.

I found this blog post, which you may have seen, that attributes the problem to a hash collision. That would explain why only some names fail to write but not others.

The disappointing news seems to be that there isn't much you can do aside from disabling dir_index as you did. Even a different hash algorithm or a longer hash ends up getting truncated.

It wasn't clear to me whether the 64bit feature might change this. Out of curiosity, if you run tune2fs -l, do you see 64bit in the output?

It wasn't obvious to me from any docs whether this would have any effect on the b-tree file hashing, or whether it only relates to the maximum number of files that can be stored.

I'm also curious if disabling dir_index will have a performance impact on Loki.

There are plans in the works to overhaul how local filesystem storage works: we want to find a way to combine chunks into larger files to reduce the file count and help improve performance. I'm afraid this work is quite a few months from being done, though.

In the meantime, you could try the new flag from #1406, which lets you cut bigger chunks (although this will use more memory). FYI, this is not released yet; you would need to use a master-xxxx image or the latest or master tags.

Or increase the chunk_idle_period if you have some slowly writing log streams (again, this uses more memory but cuts fewer chunks).

Or reduce the number of labels or cardinality on labels to reduce number of streams.

Jan 14 '20 14:01 slim-bean

@slim-bean Yep, looks like 64bit is on. Here's the whole tune2fs output:

$ sudo tune2fs -l /dev/sda1
tune2fs 1.43.4 (31-Jan-2017)
Filesystem volume name:   <none>
Last mounted on:          /
Filesystem UUID:          acfb753d-0109-4646-88d4-90ae17ff5978
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              60981248
Block count:              243924480
Reserved block count:     12196224
Free blocks:              180478308
Free inodes:              55291940
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Reserved GDT blocks:      1024
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Fri Jan  4 15:10:28 2019
Last mount time:          Mon Jan 13 23:43:57 2020
Last write time:          Mon Jan 13 23:43:57 2020
Mount count:              1
Maximum mount count:      -1
Last checked:             Mon Jan 13 23:43:20 2020
Check interval:           0 (<none>)
Lifetime writes:          980 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     32
Desired extra isize:      32
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      8941b98b-4c9e-4b8d-a84b-ba72b466191d
Journal backup:           inode blocks
Checksum type:            crc32c
Checksum:                 0x4bdf5115
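Checking for a feature by eye is easy to get wrong, especially when a pasted features line wraps. A minimal sketch that reads `tune2fs -l` output on stdin and reports the features discussed in this thread; `fsFeatures` is a hypothetical helper, and it assumes the features list is on a single "Filesystem features:" line, as tune2fs prints it before any terminal wrapping. In the output above, for example, 64bit and metadata_csum are listed but dir_index no longer is, consistent with the earlier tune2fs -O "^dir_index".

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// fsFeatures returns the set of features on the "Filesystem features:" line.
func fsFeatures(out string) map[string]bool {
	features := make(map[string]bool)
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "Filesystem features:") {
			for _, f := range strings.Fields(strings.TrimPrefix(line, "Filesystem features:")) {
				features[f] = true
			}
		}
	}
	return features
}

func main() {
	// Usage: sudo tune2fs -l /dev/sda1 | go run features.go
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	feats := fsFeatures(string(data))
	for _, f := range []string{"64bit", "dir_index", "large_dir", "metadata_csum"} {
		fmt.Printf("%-14s %v\n", f, feats[f])
	}
}
```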

Jan 14 '20 19:01 srstsavage

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

Feb 13 '20 19:02 stale[bot]

Thank you stale bot.

@slim-bean Disabling dir_index while using loki 1.2.0 seemed to solve this issue for us.

Regarding your question, "I'm also curious if disabling dir_index will have a performance impact on Loki":

We don't have any performance metrics running for loki other than loki-canary, but it seems fine after this change, and subjectively I haven't noticed any significant difference.

Just upgraded to loki 1.3.0 and haven't tried target-chunk-size or chunk_idle_period yet, so I can't comment there.

Before this gets closed, do you think it's worth distilling this dir_index config tweak into documentation somewhere?

Feb 13 '20 19:02 srstsavage

Thanks for the update, @shane-axiom! Yes, this should definitely make it into the docs somewhere; I haven't looked yet to see where.

If you have a chance to add something, that would be awesome; otherwise I will try to add something too. For now we'll just kick stale bot and see where we end up in another 30 days :)

Feb 14 '20 01:02 slim-bean

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

Mar 15 '20 02:03 stale[bot]

Hey everybody: this is still happening; we just ran into it. Having it closed by the stale bot is a bit disappointing. Can this be reopened?

As a quick fix, we'll probably just migrate to a more battle-tested chunk store backend (S3), but having issues like this in the fs backend is still annoying, as that's what most people probably start with.

Jan 22 '21 10:01 jcgruenhage

Regarding the earlier question, "Before this gets closed, do you think it's worth distilling this dir_index config tweak into documentation somewhere?":

Even closed, this issue is still relevant. I had several issues with dir_index, using minio and local storage.

Feb 11 '21 10:02 Doooooo0o

Same here; it's still very much an issue with the latest Loki. It's a very fresh installation pulling logs from maybe 20-30 hosts, and it's already hitting >3M chunks in a single directory.

It's just not feasible with a local filesystem. Isn't it typical to hash files into a subfolder structure?

Mar 25 '21 11:03 weakcamel

@weakcamel it seems that the current posture on this issue is: https://github.com/grafana/loki/issues/3324#issuecomment-804839255

I think there is also a lack of documentation about this; let's wait and see if it gets fixed.

Mar 25 '21 11:03 Doooooo0o

Thanks @theonlydoo!

Interesting to see that the discussion thread points to Cortex which does not seem to support filesystem storage at all: https://cortexmetrics.io/docs/chunks-storage/

It seems that at this point it's better to look at alternative storage options.

Mar 25 '21 12:03 weakcamel

Regarding the earlier comment that there isn't much you can do aside from disabling dir_index, since even a different or longer hash ends up getting truncated:

You can enable the large_dir feature instead, which is better than disabling dir_index (dir_index speeds up name lookups).

Mar 31 '21 14:03 klausenbusk

# tune2fs -O large_dir /dev/nvme0n1
tune2fs 1.42.9 (28-Dec-2013)
Setting filesystem feature ‘large_dir’ not supported.

Apr 19 '21 20:04 duhang

You need to be running kernel 4.13 or newer (see https://github.com/torvalds/linux/commit/e08ac99fa2a25626f573cfa377ef3ddedf2cfe8f).
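Before trying tune2fs -O large_dir, it can help to confirm the running kernel meets the 4.13 minimum mentioned above. A minimal, Linux-specific sketch; `kernelAtLeast` is a hypothetical helper that compares a `uname -r` style release string against a major.minor version, and reading /proc/sys/kernel/osrelease is assumed to be available on the host.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// kernelAtLeast reports whether a release string such as "4.9.0-9-amd64"
// is at least major.minor.
func kernelAtLeast(release string, major, minor int) (bool, error) {
	parts := strings.SplitN(strings.TrimSpace(release), ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unrecognized kernel release %q", release)
	}
	maj, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, err
	}
	// The minor component may carry a suffix, e.g. "13-rc1"; keep leading digits.
	digits := parts[1]
	end := 0
	for end < len(digits) && digits[end] >= '0' && digits[end] <= '9' {
		end++
	}
	mnr, err := strconv.Atoi(digits[:end])
	if err != nil {
		return false, err
	}
	return maj > major || (maj == major && mnr >= minor), nil
}

func main() {
	data, err := os.ReadFile("/proc/sys/kernel/osrelease")
	if err != nil {
		fmt.Println("cannot read kernel release:", err)
		return
	}
	ok, err := kernelAtLeast(string(data), 4, 13)
	fmt.Printf("%s >= 4.13: %v (err: %v)\n", strings.TrimSpace(string(data)), ok, err)
}
```

The original reporter's host (Debian 9.9, kernel 4.9.0-9-amd64, per the Environment section) would fail this check.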

Apr 19 '21 21:04 klausenbusk

Thanks for the tip. I will try it out when I get a chance!

Apr 19 '21 22:04 duhang

Is this problem solved?

Jul 30 '21 02:07 glyslxq

nope

Jul 30 '21 08:07 Doooooo0o

Is there any fix for this? We have this problem on the AKS Kubernetes platform with not much data in Loki (30 GB or so, I guess).

Aug 31 '21 08:08 dpunkturban

Using S3-like storage seems to be a good workaround.

Aug 31 '21 08:08 Doooooo0o

Encountering the same issue with Loki and EFS at around 20 GB of capacity on EKS.

Jan 10 '22 12:01 qubusp

Had the same issue with 3.7 today, enabling large_dir as you guys suggested solved the problem immediately. Thanks!

Apr 13 '22 06:04 aleksanderlech

If anyone's using NetApp ONTAP volumes, the variable that controls the size of the directory index is maxdirsize. If you create a lot of files like Loki does, you can hit that limit. This article describes how to increase that: https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Software/ONTAP_OS/What_is_maxdirsize

The article also mentions some of the tradeoffs of raising the size of the directory index, and some of that advice probably applies to ext4 as well.

Apr 21 '22 15:04 mac-chaffee

Loki (in Docker) reports "no space left on device" but there's plenty of space/inodes
