routinator
routinator copied to clipboard
Fatal Error: failed to fill whole buffer
Hi,
I've been running Routinator and have noticed lately a constant crash that is occurring across my edges.
Aug 17 15:53:14 edge-dal01 routinator[311637]: Failed to read stored publication point at /var/lib/routinator/rpki-cache/stored/rrdp/rpki-repo.registro.br/026b1847a231e62a6bcf46a3794665f4dcbe0e609f90c0e54d7062dbf72114d9/rsync/rpki-repo.registro.br/repo/7bT49NANhaBtN8pntdNKGXgkSmJu1Y2LtvQMooH1o5Rv/0/E9213DD4021CE06AB92F4683859FB848C0A265FD.mft: failed to fill whole buffer
Aug 17 15:53:14 edge-dal01 routinator[311637]: Fatal error. Exiting.
Aug 17 15:53:14 edge-dal01 systemd[1]: routinator.service: Main process exited, code=exited, status=1/FAILURE
Aug 17 15:53:14 edge-dal01 systemd[1]: routinator.service: Failed with result 'exit-code'.
Aug 17 15:53:14 edge-dal01 systemd[1]: routinator.service: Consumed 1min 10.636s CPU time.
Aug 17 15:53:15 edge-dal01 systemd[1]: routinator.service: Scheduled restart job, restart counter is at 11166.
Aug 17 15:53:15 edge-dal01 systemd[1]: Stopped Routinator 3000.
Aug 17 15:53:15 edge-dal01 systemd[1]: routinator.service: Consumed 1min 10.636s CPU time.
Aug 17 15:53:15 edge-dal01 systemd[1]: Starting Routinator 3000...
Aug 17 15:53:15 edge-dal01 systemd[1]: Started Routinator 3000.
Aug 18 14:29:38 edge-dal01 routinator[330802]: Failed to read stored publication point at /var/lib/routinator/rpki-cache/stored/rrdp/rpki-repo.registro.br/026b1847a231e62a6bcf46a3794665f4dcbe0e609f90c0e54d7062dbf72114d9/rsync/rpki-repo.registro.br/repo/7bT49NANhaBtN8pntdNKGXgkSmJu1Y2LtvQMooH1o5Rv/0/E9213DD4021CE06AB92F4683859FB848C0A265FD.mft: failed to fill whole buffer
Aug 18 14:29:39 edge-dal01 routinator[330802]: Fatal error. Exiting.
Aug 18 14:29:39 edge-dal01 systemd[1]: routinator.service: Main process exited, code=exited, status=1/FAILURE
Aug 18 14:29:39 edge-dal01 systemd[1]: routinator.service: Failed with result 'exit-code'.
Aug 18 14:29:39 edge-dal01 systemd[1]: routinator.service: Consumed 1min 20.631s CPU time.
Aug 18 14:29:39 edge-dal01 systemd[1]: routinator.service: Scheduled restart job, restart counter is at 11790.
Aug 18 14:29:39 edge-dal01 systemd[1]: Stopped Routinator 3000.
Aug 18 14:29:39 edge-dal01 systemd[1]: routinator.service: Consumed 1min 20.631s CPU time.
Aug 18 14:29:39 edge-dal01 systemd[1]: Starting Routinator 3000...
Aug 18 14:29:39 edge-dal01 systemd[1]: Started Routinator 3000.
root@edge-dal01:/var/log# routinator --version
Routinator 0.11.2
root@edge-dal01:/var/log# df -h
Filesystem Size Used Avail Use% Mounted on
tmpfs 198M 1.2M 197M 1% /run
/dev/vda1 47G 15G 30G 33% /
tmpfs 989M 184K 989M 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 198M 4.0K 198M 1% /run/user/0
root@edge-dal01:/var/log# df --inodes
Filesystem Inodes IUsed IFree IUse% Mounted on
tmpfs 253139 720 252419 1% /run
/dev/vda1 13004800 566939 12437861 5% /
tmpfs 253139 3 253136 1% /dev/shm
tmpfs 253139 3 253136 1% /run/lock
tmpfs 50627 26 50601 1% /run/user/0
The system is Ubuntu 22.04, installation is via the APT repo listed in the installation documentation.
Hi and thanks for the report!
The error message means that file mentioned doesn’t contain all the data expected. Quickest way to fix this by deleting the file or – better yet – the entire store, i.e., /var/lib/routinator/rpki-cache/stored
.
Before you do that, can you quickly check how big the file in question is?
root@edge-dal01:~# ls -alh /var/lib/routinator/rpki-cache/stored/rrdp/rpki-repo.registro.br/026b1847a231e62a6bcf46a3794665f4dcbe0e609f90c0e54d7062dbf72114d9/rsync/rpki-repo.registro.br/repo/7bT49NANhaBtN8pntdNKGXgkSmJu1Y2LtvQMooH1o5Rv/0/
total 8.0K
drwxr-xr-x 2 routinator routinator 4.0K Aug 2 15:48 .
drwxr-xr-x 3 routinator routinator 4.0K Jul 13 13:38 ..
-rw-r--r-- 1 routinator routinator 0 Aug 2 15:48 E9213DD4021CE06AB92F4683859FB848C0A265FD.mft
Right, so it is empty. I’ll double check if that can actually happen without Routinator also crashing.
In the mean time, just wiping the the cache should fix your issue.
Yes, wiping the cache appears to have fixed it. Its no longer crashing and has been stable for the past 3 hours. Just curious how it got that 0 byte file, since I imagine it can happen again.
So I figured out why this happened now that I looked further into the logs when this occurred. This appears to have been due to a unexpected shutdown of the host node that this VM was on.
Aug 2 15:48:23 edge-dal01 routinator[729]: rsync://rpki-repo.registro.br/repo/DhUX35ZMUCbRhYuECLQuQiyiqCX4z9LvRirJw3bBUwyX/0/8A2D68F59DA79C1167B96E60DB2A8AC7F897F91B.mft: failed to validate
Aug 2 15:48:23 edge-dal01 routinator[729]: rsync://rpki-repo.registro.br/repo/DhUX35ZMUCbRhYuECLQuQiyiqCX4z9LvRirJw3bBUwyX/0/8A2D68F59DA79C1167B96E60DB2A8AC7F897F91B.mft: No valid manifest found.
Aug 2 15:48:23 edge-dal01 routinator[729]: CA for rsync://rpki-repo.registro.br/repo/DhUX35ZMUCbRhYuECLQuQiyiqCX4z9LvRirJw3bBUwyX/0/ rejected, resources marked as unsafe:
Aug 2 15:48:23 edge-dal01 routinator[729]: 45.186.80.0/22
Aug 2 15:48:23 edge-dal01 routinator[729]: 2804:6440::/32
Aug 2 15:48:23 edge-dal01 routinator[729]: AS269416
Aug 2 15:48:23 edge-dal01 routAug 2 18:46:03 edge-dal01 systemd-modules-load[407]: Inserted module 'msr'
root@edge-dal01:~# zcat /var/log/syslog.2.gz | grep "Aug 2 18:46"
Aug 2 15:48:23 edge-dal01 routAug 2 18:46:03 edge-dal01 systemd-modules-load[407]: Inserted module 'msr'
Aug 2 18:46:03 edge-dal01 systemd-modules-load[407]: Inserted module 'ipmi_devintf'
Aug 2 18:46:03 edge-dal01 systemd[1]: Finished Load Kernel Modules.
Aug 2 18:46:03 edge-dal01 systemd[1]: Finished Remount Root and Kernel File Systems.
Aug 2 18:46:03 edge-dal01 systemd[1]: Activating swap /swapfile...
Aug 2 18:46:03 edge-dal01 systemd[1]: Mounting FUSE Control File System...
Aug 2 18:46:03 edge-dal01 systemd[1]: Mounting Kernel Configuration File System...
Aug 2 18:46:03 edge-dal01 systemd[1]: Starting Device-Mapper Multipath Device Controller...
Aug 2 18:46:03 edge-dal01 systemd[1]: Starting Flush Journal to Persistent Storage...
Aug 2 18:46:03 edge-dal01 systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
I guess the unexpected shutdown occurred right when Routinator was just opening that file for writing. But hopefully a code change can occur to dump such invalid files instead of crashing.
That sounds like a reasonable change. I’ll add the issue to the milestone for the next release.
Again, thank you for reporting!