panic: runtime error: index out of range
Hi @sevagh
First of all, really appreciate the work you've put into this tool - we use it a lot.
Recently though, I've randomly started encountering a problem that I've not seen before, specifically this error:
INFO[0000] Running goat for ebs
INFO[0000] WELCOME TO GOAT
INFO[0000] 1: COLLECTING EC2 INFO
INFO[0000] Establishing metadata client
INFO[0000] Retrieved metadata successfully instance_id=i-0a403f852293bd8e3
INFO[0000] Using metadata to initialize EC2 SDK client az=eu-west-1a instance_id=i-0a403f852293bd8e3 region=eu-west-1
INFO[0000] 2: COLLECTING EBS INFO
INFO[0000] Searching for EBS volumes
INFO[0000] Classifying EBS volumes based on tags
INFO[0000] 3: ATTACHING EBS VOLS
INFO[0000] 4: MOUNTING ATTACHED VOLS
panic: runtime error: index out of range
goroutine 1 [running]:
main.prepAndMountDrives(0xc0004388e0, 0x5, 0xab95d0, 0x0, 0x0)
/home/shanssian/go/src/github.com/sevagh/goat/ebs.go:42 +0x1350
main.GoatEbs(0x7d1b00, 0x7cd503, 0x7)
/home/shanssian/go/src/github.com/sevagh/goat/ebs.go:35 +0x256
main.main()
/home/shanssian/go/src/github.com/sevagh/goat/main.go:66 +0x4c
The EBS volumes attach fine; however, they fail to mount. This initially looked the same as #47, but I've upgraded to 1.0.2 and the error persists.
The tagging all appears to be correct - GOAT-IN:FsType, GOAT-IN:MountPath, GOAT-IN:NodeId, GOAT-IN:Prefix, and GOAT-IN:VolumeName are all set - and I've verified that the IAM permissions are correct.
Any ideas?
I'm glad the tool has been useful.
It could be that the disks are somehow already attached, in which case my code ignores them. I believe that's a corner case I hadn't thought through: I could proceed to the next steps (mount, etc.) even if the disks are already attached.
How do you create EC2 machines + EBS vols? Terraform?
Do you have a log from a successful run stored somewhere? The lack of output between steps 2-3 and 3-4 could support the "disks are already attached, therefore my code doesn't add them to the array of disks to mount" hypothesis.
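To make that hypothesis concrete, here's a minimal, hypothetical sketch (none of these names are goat's actual code) of how skipping already-attached volumes could leave the mount phase indexing into an empty slice:

```go
package main

import "fmt"

type ebsVol struct {
	VolumeID     string
	AttachedName string
	MountPath    string
}

// classifyForMount stands in for the attach step: volumes that are already
// attached are skipped and never make it into the slice handed to the mount phase.
func classifyForMount(vols []ebsVol, alreadyAttached map[string]bool) []ebsVol {
	var toMount []ebsVol
	for _, v := range vols {
		if alreadyAttached[v.VolumeID] {
			continue // ignored, per the corner case described above
		}
		toMount = append(toMount, v)
	}
	return toMount
}

func main() {
	vols := []ebsVol{{VolumeID: "vol-0a", AttachedName: "/dev/xvdb", MountPath: "/var/www/html"}}
	toMount := classifyForMount(vols, map[string]bool{"vol-0a": true})
	// With every volume already attached, toMount is empty, so an unguarded
	// index panics with "index out of range", much like the trace above.
	fmt.Println(toMount[0].MountPath)
}
```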
Thanks for the speedy response.
I've just downgraded to 0.6.0, which seems to have got things working again.
That's certainly quite possible. Goat runs at boot, so it may well have attached all the volumes, then failed to mount for some other reason and died.
I'm then running goat ebs a second time, which is where I'm seeing the error above.
I'll try upgrading back to 1.0.2 shortly and running it with no volumes attached to see what happens there. Sounds like I'm potentially running into two different issues.
OK, with everything detached, I get a different problem:
FATA[0020] Drive /dev/xvdl doesn't exist after attaching - checked with stat 10 times vol_id=vol-0c17eb2e0d145df15 vol_name=export
Interestingly, that volume took longer to attach than goat was willing to wait.
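A longer or configurable wait would probably cover that case; here's a rough sketch of the idea (waitForDevice and its parameters are made up for illustration, not goat's actual code):

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// waitForDevice polls with stat until the device node appears, with the number
// of attempts and the interval configurable rather than fixed at 10 checks.
func waitForDevice(path string, attempts int, interval time.Duration) error {
	for i := 0; i < attempts; i++ {
		if _, err := os.Stat(path); err == nil {
			return nil // device node exists
		}
		time.Sleep(interval)
	}
	return fmt.Errorf("drive %s doesn't exist after attaching - checked with stat %d times", path, attempts)
}

func main() {
	// A slower-attaching volume simply needs more attempts or a longer interval.
	if err := waitForDevice("/dev/xvdl", 30, 2*time.Second); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```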
Can you test this hotfix? https://github.com/sevagh/goat/releases/tag/1.0.3-rc0
@sevagh That appears to have resolved the original issue 👍
I'm now getting this:
FATA[0000] Couldn't mount: no such device vol_name=archive vols="[{vol-06ca06283f3290a4e archive -1 1 /dev/xvdb /var/www/html ext4}]"
We're using nvme, which is probably why - it should be /dev/nvme1n1
I should be getting that attached device name from the AWS API: https://docs.aws.amazon.com/sdk-for-go/api/service/ec2/#VolumeAttachment
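For reference, pulling the attachment out of the SDK looks roughly like this (aws-sdk-go v1, with a placeholder volume ID); the catch on Nitro instances is that the returned Device is still the /dev/xvd* name passed to AttachVolume, not the /dev/nvme*n1 name the OS actually exposes:

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("eu-west-1")}))
	svc := ec2.New(sess)

	// Placeholder volume ID, for illustration only.
	out, err := svc.DescribeVolumes(&ec2.DescribeVolumesInput{
		VolumeIds: []*string{aws.String("vol-000000000000000")},
	})
	if err != nil {
		panic(err)
	}
	for _, vol := range out.Volumes {
		for _, att := range vol.Attachments {
			// On Nitro instances this Device is the requested name (e.g. /dev/xvdf),
			// not the /dev/nvme*n1 name the kernel actually creates.
			fmt.Println(aws.StringValue(att.Device), aws.StringValue(att.State))
		}
	}
}
```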
@sevagh Do you want me to raise a new issue for the above? Then you can close this one off.
Keeping it in this same issue is fine. I'll get back to you soon.
Can you print the logs surrounding the above error message? There are some additional functions related to nvme (which another user helped contribute) - I'm not very well versed in Nitro and nvme devices.
Sure, here you go:
INFO[0000] Running goat for ebs
INFO[0000] WELCOME TO GOAT
INFO[0000] 1: COLLECTING EC2 INFO
INFO[0000] Establishing metadata client
INFO[0000] Retrieved metadata successfully instance_id=i-000000000000
INFO[0000] Using metadata to initialize EC2 SDK client az=eu-west-1a instance_id=i-000000000000 region=eu-west-1
INFO[0000] 2: COLLECTING EBS INFO
INFO[0000] Searching for EBS volumes
INFO[0000] Classifying EBS volumes based on tags
INFO[0000] 3: ATTACHING EBS VOLS
INFO[0000] Volume is unattached, picking drive name vol_id=vol-000000000000000 vol_name=archive
INFO[0000] Executing AWS SDK attach command vol_id=vol-000000000000000 vol_name=archive
INFO[0000] {
AttachTime: 2019-08-11 10:52:23.463 +0000 UTC,
Device: "/dev/xvdf",
InstanceId: "i-000000000000",
State: "attaching",
VolumeId: "vol-000000000000000"
} vol_id=vol-000000000000000 vol_name=archive
INFO[0014] 4: MOUNTING ATTACHED VOLS
INFO[0014] Label already exists, jumping to mount phase vol_name=archive vols="[{vol-000000000000000 archive -1 1 /dev/xvdf /var/www/html ext4}]"
INFO[0014] Checking if something already mounted at %s/var/www/html vol_name=archive vols="[{vol-000000000000000 archive -1 1 /dev/xvdf /var/www/html ext4}]"
INFO[0014] Appending fstab entry vol_name=archive vols="[{vol-000000000000000 archive -1 1 /dev/xvdf /var/www/html ext4}]"
INFO[0014] Now mounting vol_name=archive vols="[{vol-000000000000000 archive -1 1 /dev/xvdf /var/www/html ext4}]"
FATA[0014] Couldn't mount: no such device vol_name=archive vols="[{vol-000000000000000 archive -1 1 /dev/xvdf /var/www/html ext4}]"
The issue with nvme volumes is that the attachment info reports the device as /dev/xvdb, but it presents on the instance as /dev/nvme*n1.
The volume ID is embedded in the serial for the device identification, so that might be the easiest way of determining it, but would require having the utility installed:
[root@ip-10-11-16-200 ~]# nvme list -o json
{
"Devices" : [
{
"DevicePath" : "/dev/nvme0n1",
"Firmware" : "1.0",
"Index" : 0,
"ModelNumber" : "Amazon Elastic Block Store",
"ProductName" : "Non-Volatile memory controller: Amazon.com, Inc. Device 0x8061",
"SerialNumber" : "vol000000000000000",
"UsedBytes" : 0,
"MaximiumLBA" : 33554432,
"PhysicalSize" : 17179869184,
"SectorSize" : 512
},
]
}
[root@ip-10-11-16-200 ~]#
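Just as an illustration of that serial-number mapping, a sketch might look like the following (nvmeDeviceForVolume is a hypothetical helper, and it leans on nvme-cli being installed):

```go
package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
	"strings"
)

type nvmeDevice struct {
	DevicePath   string `json:"DevicePath"`
	SerialNumber string `json:"SerialNumber"`
}

type nvmeList struct {
	Devices []nvmeDevice `json:"Devices"`
}

// nvmeDeviceForVolume shells out to `nvme list -o json` and matches the
// device whose serial number embeds the EBS volume ID.
func nvmeDeviceForVolume(volID string) (string, error) {
	out, err := exec.Command("nvme", "list", "-o", "json").Output()
	if err != nil {
		return "", err
	}
	var list nvmeList
	if err := json.Unmarshal(out, &list); err != nil {
		return "", err
	}
	// The serial is the volume ID without the hyphen: vol-0123... -> vol0123...
	want := strings.Replace(volID, "-", "", 1)
	for _, d := range list.Devices {
		if d.SerialNumber == want {
			return d.DevicePath, nil
		}
	}
	return "", fmt.Errorf("no nvme device found for %s", volID)
}

func main() {
	dev, err := nvmeDeviceForVolume("vol-000000000000000")
	if err != nil {
		panic(err)
	}
	fmt.Println(dev) // e.g. /dev/nvme0n1
}
```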
Actually, looking at the code, there is already some nvme logic using go-ebsnvme, which seems to try to find the correct nvme device using the attachment device name.
It's potentially better to stick with that, as it doesn't require an external package (nvme-cli).
Just had a quick look and have a feeling that the issue is potentially here:
https://github.com/sevagh/goat/blob/81f1aa344dc93f16d16f9682cd3aff04a0842992/ebs.go#L52-L53
Basically, the logic to determine the nvme device name is skipped over completely when the device already has a label, which will always be the case on a second mount attempt.
Nice sleuthing - it looks like I should put a call to this function, GetActualBlockDeviceName, somewhere: https://github.com/sevagh/goat/blob/b6af4b2b37cc6867d154aa19716e1667146b98cc/filesystem/drive.go#L38
Maybe when writing the fstab entry - but I'm having trouble discovering where the link between the label and nvme device name exists.
I'm building a 1.0.3-rc1
https://github.com/sevagh/goat/releases/tag/1.0.3-rc1
The gist is that I put the "GetActualBlockDeviceName" call early in the code, so the real nvme name ends up in the volume struct rather than being substituted later on.
What's funny is that I think all of this workaround comes from my choosing /dev/xvd{b,c,d,e,f,g,...} as a random device name for both non-NVME and NVME devices. So this "rediscovery" of /dev/nvme0n1 from /dev/xvdb is correcting a mistaken initial assumption in my code. But it's a bit entrenched, and for now I just want your bug solved.
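A very rough sketch of that "resolve the name once, early" idea (the struct and the resolver signature here are assumptions, not goat's actual code):

```go
package main

import "fmt"

type ebsVol struct {
	VolumeID     string
	AttachedName string
}

// normalizeAttachedNames resolves each volume's real block device name once,
// right after attaching, so every later step (label, fstab, mount) sees the
// nvme name instead of the assumed /dev/xvd* one. The resolver stands in for
// goat's GetActualBlockDeviceName, whose exact signature is assumed here.
func normalizeAttachedNames(vols []ebsVol, resolve func(string) (string, error)) []ebsVol {
	for i := range vols {
		if actual, err := resolve(vols[i].AttachedName); err == nil {
			vols[i].AttachedName = actual
		}
	}
	return vols
}

func main() {
	vols := []ebsVol{{VolumeID: "vol-000000000000000", AttachedName: "/dev/xvdf"}}
	// Fake resolver for illustration; goat would call its real nvme lookup here.
	vols = normalizeAttachedNames(vols, func(string) (string, error) { return "/dev/nvme1n1", nil })
	fmt.Println(vols[0].AttachedName) // /dev/nvme1n1
}
```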
Made some progress, it's now failing to mount, but using the correct device name in the log message:
FATA[0000] Couldn't mount: no such device vol_name=media vols="[{vol-000000000 media -1 1 /dev/nvme4n1 /var/www/html ext4}]"
/dev/nvme4n1 does exist, so presumably it's still trying to mount the old assumed /dev/xvd* device behind the scenes, despite what the log message shows.
If you run ls -l /dev/disk/by-label, perhaps the old LABEL still points to the bad name.
Nope, they all point to the correct nvme devices. Bear in mind the labels were created some time ago
I might have found a leftover place where I don't correct the AttachedName for nvme. Hold tight for rc2...
https://github.com/sevagh/goat/releases/tag/1.0.3-rc2
I'm cutting rcs frequently since it's a safe way for us to share trusted binaries (vs me emailing you a binary).
Yeah, no worries.
Same error, although interestingly, it has now managed to mount two volumes before bombing out.
It's only put one of the above volumes into /etc/fstab as well
Also, as a side note, it bombs out if something is already mounted, even though it was potentially Goat that mounted it in the first place (i.e. on a second run).
Feels like it should just break out of the loop for that particular volume, rather than exiting completely.
Admittedly, I guess being run multiple times (as opposed to just at boot) wasn't an originally intended use case.
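Something along these lines, purely as a sketch of the idea (not goat's actual mount code), where an already-mounted path is skipped instead of being fatal:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// alreadyMounted reports whether anything is mounted at mountPath, by scanning
// the mount points listed in /proc/self/mounts.
func alreadyMounted(mountPath string) (bool, error) {
	f, err := os.Open("/proc/self/mounts")
	if err != nil {
		return false, err
	}
	defer f.Close()
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) >= 2 && fields[1] == mountPath {
			return true, nil
		}
	}
	return false, scanner.Err()
}

func main() {
	for _, mountPath := range []string{"/var/www/html", "/export"} {
		mounted, err := alreadyMounted(mountPath)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		if mounted {
			// Instead of bombing out, continue with the next volume so reruns are idempotent.
			fmt.Printf("%s already mounted, skipping\n", mountPath)
			continue
		}
		fmt.Printf("would mount %s here\n", mountPath)
	}
}
```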
Idempotence is a good goal to have - it's strange for a tool to panic because of something it did a few minutes ago.
Unfortunately I don't use AWS (and haven't, since writing this tool) so I can't easily iterate over code to solve this issue.
I'll have a play locally and see if I can get it working, then submit a PR back
Submitted a PR for this. Not ideal, but it works.
FWIW, we're fairly heavy users of Goat, and an AWS consultancy. We'd be happy to take over maintenance if you're not using AWS at your current place.
Thanks for the PR - re: taking over, that's a good idea. What's the best way to denote the change? Perhaps a note in the description that https://github.com/steamhaus/goat is official?

And I could archive this one.
Sure.
I'm pretty sure within the repo settings, there's a 'transfer' option. That then keeps a redirect in place from sevagh/goat to steamhaus/goat.
I'd need to get rid of the fork first though
I've removed the fork, so feel free to do the transfer if you'd like
I'd prefer not to do a transfer. Would you be OK with continuing main development on your fork? Like I said, I'll point to it from this repo.