panic: runtime error: index out of range
Hi @sevagh
First of all, really appreciate the work you've put into this tool - we use it a lot.
Recently though, I've randomly started encountering a problem that I've not seen before, specifically this error:
INFO[0000] Running goat for ebs
INFO[0000] WELCOME TO GOAT
INFO[0000] 1: COLLECTING EC2 INFO
INFO[0000] Establishing metadata client
INFO[0000] Retrieved metadata successfully instance_id=i-0a403f852293bd8e3
INFO[0000] Using metadata to initialize EC2 SDK client az=eu-west-1a instance_id=i-0a403f852293bd8e3 region=eu-west-1
INFO[0000] 2: COLLECTING EBS INFO
INFO[0000] Searching for EBS volumes
INFO[0000] Classifying EBS volumes based on tags
INFO[0000] 3: ATTACHING EBS VOLS
INFO[0000] 4: MOUNTING ATTACHED VOLS
panic: runtime error: index out of range
goroutine 1 [running]:
main.prepAndMountDrives(0xc0004388e0, 0x5, 0xab95d0, 0x0, 0x0)
/home/shanssian/go/src/github.com/sevagh/goat/ebs.go:42 +0x1350
main.GoatEbs(0x7d1b00, 0x7cd503, 0x7)
/home/shanssian/go/src/github.com/sevagh/goat/ebs.go:35 +0x256
main.main()
/home/shanssian/go/src/github.com/sevagh/goat/main.go:66 +0x4c
The EBS volumes attach fine; however, they fail to mount. This initially looked the same as #47, but I've upgraded to 1.0.2 and the error persists.
The tagging all appears to be correct - GOAT-IN:FsType, GOAT-IN:MountPath, GOAT-IN:NodeId, GOAT-IN:Prefix, and GOAT-IN:VolumeName are all set - and I've verified that the IAM permissions are correct.
Any ideas?
I'm glad the tool has been useful.
It could be that the disks are somehow already attached, in which case my code ignores them. I believe that's a corner case I hadn't thought through: I could proceed to the next steps (mount, etc.) even if the disks are already attached.
How do you create EC2 machines + EBS vols? Terraform?
Do you have a log from a successful run stored somewhere? The lack of output between steps 2-3 and 3-4 could support the "disks are already attached, therefore my code doesn't add them to the array of disks to mount" hypothesis.
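To make that hypothesis concrete, here's a minimal, hypothetical sketch (none of these names are goat's actual code) of how skipping already-attached volumes could leave the mount phase indexing into an empty slice:

```go
package main

import "fmt"

type ebsVol struct {
	VolumeID     string
	AttachedName string
	MountPath    string
}

// classifyForMount stands in for the attach step: volumes that are already
// attached are skipped and never make it into the slice handed to the mount phase.
func classifyForMount(vols []ebsVol, alreadyAttached map[string]bool) []ebsVol {
	var toMount []ebsVol
	for _, v := range vols {
		if alreadyAttached[v.VolumeID] {
			continue // ignored, per the corner case described above
		}
		toMount = append(toMount, v)
	}
	return toMount
}

func main() {
	vols := []ebsVol{{VolumeID: "vol-0a", AttachedName: "/dev/xvdb", MountPath: "/var/www/html"}}
	toMount := classifyForMount(vols, map[string]bool{"vol-0a": true})
	// With every volume already attached, toMount is empty, so an unguarded
	// index panics with "index out of range", much like the trace above.
	fmt.Println(toMount[0].MountPath)
}
```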
Thanks for the speedy response.
I've just downgraded to 0.6.0, which seems to have got things working again.
That's certainly quite possible. Goat runs at boot, so it may well have attached all the volumes, then failed to mount for some other reason and died.
I'm then running goat ebs a second time, which is where I'm seeing the error above.
I'll try upgrading back to 1.0.2 shortly and running it with no volumes attached to see what happens there. Sounds like I'm potentially running into two different issues.
OK, with everything detached, I get a different problem:
FATA[0020] Drive /dev/xvdl doesn't exist after attaching - checked with stat 10 times vol_id=vol-0c17eb2e0d145df15 vol_name=export
Interestingly, that volume took longer to attach than goat was willing to wait.
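A longer or configurable wait would probably cover that case; here's a rough sketch of the idea (waitForDevice and its parameters are made up for illustration, not goat's actual code):

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// waitForDevice polls with stat until the device node appears, with the number
// of attempts and the interval configurable rather than fixed at 10 checks.
func waitForDevice(path string, attempts int, interval time.Duration) error {
	for i := 0; i < attempts; i++ {
		if _, err := os.Stat(path); err == nil {
			return nil // device node exists
		}
		time.Sleep(interval)
	}
	return fmt.Errorf("drive %s doesn't exist after attaching - checked with stat %d times", path, attempts)
}

func main() {
	// A slower-attaching volume simply needs more attempts or a longer interval.
	if err := waitForDevice("/dev/xvdl", 30, 2*time.Second); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```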
Can you test this hotfix? https://github.com/sevagh/goat/releases/tag/1.0.3-rc0
@sevagh That appears to have resolved the original issue 👍
I'm now getting this:
FATA[0000] Couldn't mount: no such device vol_name=archive vols="[{vol-06ca06283f3290a4e archive -1 1 /dev/xvdb /var/www/html ext4}]"
We're using nvme, which is probably why - it should be /dev/nvme1n1
I should be getting that attached device name from the AWS API: https://docs.aws.amazon.com/sdk-for-go/api/service/ec2/#VolumeAttachment
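For reference, pulling the attachment out of the SDK looks roughly like this (aws-sdk-go v1, with a placeholder volume ID); the catch on Nitro instances is that the returned Device is still the /dev/xvd* name passed to AttachVolume, not the /dev/nvme*n1 name the OS actually exposes:

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("eu-west-1")}))
	svc := ec2.New(sess)

	// Placeholder volume ID, for illustration only.
	out, err := svc.DescribeVolumes(&ec2.DescribeVolumesInput{
		VolumeIds: []*string{aws.String("vol-000000000000000")},
	})
	if err != nil {
		panic(err)
	}
	for _, vol := range out.Volumes {
		for _, att := range vol.Attachments {
			// On Nitro instances this Device is the requested name (e.g. /dev/xvdf),
			// not the /dev/nvme*n1 name the kernel actually creates.
			fmt.Println(aws.StringValue(att.Device), aws.StringValue(att.State))
		}
	}
}
```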
@sevagh Do you want me to raise a new issue for the above? Then you can close this one off.
Keeping it in this same issue is fine. I'll get back to you soon.
Can you print the logs surrounding the above error message? There are some additional functions related to nvme (which another user helped contribute) - I'm not very well versed in Nitro and nvme devices.
Sure, here you go:
INFO[0000] Running goat for ebs
INFO[0000] WELCOME TO GOAT
INFO[0000] 1: COLLECTING EC2 INFO
INFO[0000] Establishing metadata client
INFO[0000] Retrieved metadata successfully instance_id=i-000000000000
INFO[0000] Using metadata to initialize EC2 SDK client az=eu-west-1a instance_id=i-000000000000 region=eu-west-1
INFO[0000] 2: COLLECTING EBS INFO
INFO[0000] Searching for EBS volumes
INFO[0000] Classifying EBS volumes based on tags
INFO[0000] 3: ATTACHING EBS VOLS
INFO[0000] Volume is unattached, picking drive name vol_id=vol-000000000000000 vol_name=archive
INFO[0000] Executing AWS SDK attach command vol_id=vol-000000000000000 vol_name=archive
INFO[0000] {
AttachTime: 2019-08-11 10:52:23.463 +0000 UTC,
Device: "/dev/xvdf",
InstanceId: "i-000000000000",
State: "attaching",
VolumeId: "vol-000000000000000"
} vol_id=vol-000000000000000 vol_name=archive
INFO[0014] 4: MOUNTING ATTACHED VOLS
INFO[0014] Label already exists, jumping to mount phase vol_name=archive vols="[{vol-000000000000000 archive -1 1 /dev/xvdf /var/www/html ext4}]"
INFO[0014] Checking if something already mounted at %s/var/www/html vol_name=archive vols="[{vol-000000000000000 archive -1 1 /dev/xvdf /var/www/html ext4}]"
INFO[0014] Appending fstab entry vol_name=archive vols="[{vol-000000000000000 archive -1 1 /dev/xvdf /var/www/html ext4}]"
INFO[0014] Now mounting vol_name=archive vols="[{vol-000000000000000 archive -1 1 /dev/xvdf /var/www/html ext4}]"
FATA[0014] Couldn't mount: no such device vol_name=archive vols="[{vol-000000000000000 archive -1 1 /dev/xvdf /var/www/html ext4}]"
The issue with nvme volumes is that the attachment info reports the device as /dev/xvdb, but it presents on the instance as /dev/nvme*n1.
The volume ID is embedded in the serial for the device identification, so that might be the easiest way of determining it, but would require having the utility installed:
[root@ip-10-11-16-200 ~]# nvme list -o json
{
"Devices" : [
{
"DevicePath" : "/dev/nvme0n1",
"Firmware" : "1.0",
"Index" : 0,
"ModelNumber" : "Amazon Elastic Block Store",
"ProductName" : "Non-Volatile memory controller: Amazon.com, Inc. Device 0x8061",
"SerialNumber" : "vol000000000000000",
"UsedBytes" : 0,
"MaximiumLBA" : 33554432,
"PhysicalSize" : 17179869184,
"SectorSize" : 512
},
]
}
[root@ip-10-11-16-200 ~]#
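Just as an illustration of that serial-number mapping, a sketch might look like the following (nvmeDeviceForVolume is a hypothetical helper, and it leans on nvme-cli being installed):

```go
package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
	"strings"
)

type nvmeDevice struct {
	DevicePath   string `json:"DevicePath"`
	SerialNumber string `json:"SerialNumber"`
}

type nvmeList struct {
	Devices []nvmeDevice `json:"Devices"`
}

// nvmeDeviceForVolume shells out to `nvme list -o json` and matches the
// device whose serial number embeds the EBS volume ID.
func nvmeDeviceForVolume(volID string) (string, error) {
	out, err := exec.Command("nvme", "list", "-o", "json").Output()
	if err != nil {
		return "", err
	}
	var list nvmeList
	if err := json.Unmarshal(out, &list); err != nil {
		return "", err
	}
	// The serial is the volume ID without the hyphen: vol-0123... -> vol0123...
	want := strings.Replace(volID, "-", "", 1)
	for _, d := range list.Devices {
		if d.SerialNumber == want {
			return d.DevicePath, nil
		}
	}
	return "", fmt.Errorf("no nvme device found for %s", volID)
}

func main() {
	dev, err := nvmeDeviceForVolume("vol-000000000000000")
	if err != nil {
		panic(err)
	}
	fmt.Println(dev) // e.g. /dev/nvme0n1
}
```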
Actually, looking at the code, there is already some nvme logic using go-ebsnvme, which seems to try to find the correct nvme device using the attachment device name.
It's potentially better to stick with that, as it doesn't require an external package (nvme-cli).
Just had a quick look and have a feeling that the issue is potentially here:
https://github.com/sevagh/goat/blob/81f1aa344dc93f16d16f9682cd3aff04a0842992/ebs.go#L52-L53
Basically, the logic to determine the nvme device name is skipped over completely when the device already has a label, which will always be the case on a second mount attempt.
Nice sleuthing - it looks like I should put a call to this function, GetActualBlockDeviceName, somewhere: https://github.com/sevagh/goat/blob/b6af4b2b37cc6867d154aa19716e1667146b98cc/filesystem/drive.go#L38
Maybe when writing the fstab entry - but I'm having trouble discovering where the link between the label and nvme device name exists.
I'm building a 1.0.3-rc1
https://github.com/sevagh/goat/releases/tag/1.0.3-rc1
The gist is that I put the "GetActualBlockDeviceName" call early in the code, so the real nvme name ends up in the volume struct rather than being substituted later on.
What's funny is that I think all of this workaround comes from my choosing /dev/xvd{b,c,d,e,f,g,...} as a random device name for both non-NVME and NVME devices. So this "rediscovery" of /dev/nvme0n1 from /dev/xvdb is correcting a mistaken initial assumption in my code. But it's a bit entrenched, and for now I just want your bug solved.
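A very rough sketch of that "resolve the name once, early" idea (the struct and the resolver signature here are assumptions, not goat's actual code):

```go
package main

import "fmt"

type ebsVol struct {
	VolumeID     string
	AttachedName string
}

// normalizeAttachedNames resolves each volume's real block device name once,
// right after attaching, so every later step (label, fstab, mount) sees the
// nvme name instead of the assumed /dev/xvd* one. The resolver stands in for
// goat's GetActualBlockDeviceName, whose exact signature is assumed here.
func normalizeAttachedNames(vols []ebsVol, resolve func(string) (string, error)) []ebsVol {
	for i := range vols {
		if actual, err := resolve(vols[i].AttachedName); err == nil {
			vols[i].AttachedName = actual
		}
	}
	return vols
}

func main() {
	vols := []ebsVol{{VolumeID: "vol-000000000000000", AttachedName: "/dev/xvdf"}}
	// Fake resolver for illustration; goat would call its real nvme lookup here.
	vols = normalizeAttachedNames(vols, func(string) (string, error) { return "/dev/nvme1n1", nil })
	fmt.Println(vols[0].AttachedName) // /dev/nvme1n1
}
```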
Made some progress, it's now failing to mount, but using the correct device name in the log message:
FATA[0000] Couldn't mount: no such device vol_name=media vols="[{vol-000000000 media -1 1 /dev/nvme4n1 /var/www/html ext4}]"
/dev/nvme4n1 does exist, so presumably it's still trying to mount the old assumed /dev/xvd* device behind the scenes, despite what the log message shows.
If you run ls -l /dev/disk/by-label, perhaps the old LABEL still points to the bad name.
Nope, they all point to the correct nvme devices. Bear in mind the labels were created some time ago
I might have found a leftover place where I don't correct the AttachedName for nvme. Hold tight for rc2...
https://github.com/sevagh/goat/releases/tag/1.0.3-rc2
I'm cutting rcs frequently since it's a safe way for us to share trusted binaries (vs me emailing you a binary).
Yeah, no worries.
Same error, although interestingly, it has now managed to mount two volumes before bombing out.
It's only put one of the above volumes into /etc/fstab as well
Also, as a side note, it bombs out if something is already mounted, even though it was potentially Goat that mounted it in the first place (i.e. on a second run).
Feels like it should just break out of the loop for that particular volume, rather than exiting completely.
Admittedly, I guess being run multiple times (as opposed to just at boot) wasn't an originally intended use case.
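Something along these lines, purely as a sketch of the idea (not goat's actual mount code), where an already-mounted path is skipped instead of being fatal:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// alreadyMounted reports whether anything is mounted at mountPath, by scanning
// the mount points listed in /proc/self/mounts.
func alreadyMounted(mountPath string) (bool, error) {
	f, err := os.Open("/proc/self/mounts")
	if err != nil {
		return false, err
	}
	defer f.Close()
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) >= 2 && fields[1] == mountPath {
			return true, nil
		}
	}
	return false, scanner.Err()
}

func main() {
	for _, mountPath := range []string{"/var/www/html", "/export"} {
		mounted, err := alreadyMounted(mountPath)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		if mounted {
			// Instead of bombing out, continue with the next volume so reruns are idempotent.
			fmt.Printf("%s already mounted, skipping\n", mountPath)
			continue
		}
		fmt.Printf("would mount %s here\n", mountPath)
	}
}
```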
Idempotence is a good goal to have - it's strange for a tool to panic because of something it did a few minutes ago.
Unfortunately I don't use AWS (and haven't, since writing this tool) so I can't easily iterate over code to solve this issue.
I'll have a play locally and see if I can get it working, then submit a PR back
Submitted a PR for this. Not ideal, but it works.
FWIW, we're fairly heavy users of Goat, and an AWS consultancy. We'd be happy to take over maintenance if you're not using AWS at your current place.
Thanks for the PR - re: taking over, that's a good idea. What's the best way to denote the change? Perhaps a note in the description that https://github.com/steamhaus/goat is official?

And I could archive this one.
Sure.
I'm pretty sure within the repo settings, there's a 'transfer' option. That then keeps a redirect in place from sevagh/goat to steamhaus/goat.
I'd need to get rid of the fork first though
I've removed the fork, so feel free to do the transfer if you'd like
I'd prefer not to do a transfer. Would you be OK with continuing main development on your fork? Like I said, I'll point to it from this repo.