
Device API resources reported with wrong network_type [caching?]

Open t0mk opened this issue 5 years ago • 16 comments

Maybe network_type caching again?

I create a device in ewr1. It provisions and reaches the active state. Then, sometimes, its network_type is reported as layer2-bonded over the API and layer3 in the GUI. For example, device d60113d4-41e1-478f-8807-dcc1a9d328ba:

curl --silent -X GET -H "X-Auth-Token: $PACKET_AUTH_TOKEN" "https://api.packet.net/devices/d60113d4-41e1-478f-8807-dcc1a9d328ba/" | jq

.. over the API it's layer2-bonded and in the GUI it's layer3

t0mk avatar Apr 05 '19 15:04 t0mk

@dkenzox @batmany13 Maybe you can help with this one? We already had an issue with network_type being reported incorrectly in #134. Maybe this has a similar cause?

t0mk avatar Apr 05 '19 15:04 t0mk

this will cause some issues for Terraform users

t0mk avatar Apr 05 '19 15:04 t0mk

Whoops, I removed the device that I linked as an example, but I hit a similar problem while debugging L2 TF for @goldenprifti with this device:

curl --silent -X GET -H "X-Auth-Token: $PACKET_AUTH_TOKEN" "https://api.packet.net/devices/d60113d4-41e1-478f-8807-dcc1a9d328ba/" | jq

It is reported as "layer2-bonded" over curl, but in the GUI it's "layer3".

t0mk avatar Apr 08 '19 10:04 t0mk

.. and when I add ?exclude=project_lite to the URL, the proper network_type is reported.

t0mk avatar Apr 08 '19 10:04 t0mk

@t0mk We will be adding this to our priority items. Thank you!

amenowanna avatar Apr 08 '19 13:04 amenowanna

This just hit a Terraform user: https://github.com/terraform-providers/terraform-provider-packet/issues/138

t0mk avatar Apr 09 '19 19:04 t0mk

Hitting me too!

nathangoulding avatar Apr 24 '19 21:04 nathangoulding

Same here; I run into this issue intermittently.

savithrihv avatar Apr 25 '19 16:04 savithrihv

@t0mk We have deployed an update to address this issue. Can you please validate that things are working for you now? Thank you!

amenowanna avatar Apr 26 '19 21:04 amenowanna

@amenowanna I will take a look today.

t0mk avatar Apr 30 '19 09:04 t0mk

I just ran an m1.xlarge.x86 through all the network_state transitions, and it seems OK. I am closing this.

t0mk avatar Apr 30 '19 09:04 t0mk

This happened again and generated some support traffic. @enkelprifti98 might know more details.

In short, fetching the same device from the API with different GET parameters returns a different network_type.

tomk@xps ~ » curl -s -H "X-Auth-Token: $PACKET_AUTH_TOKEN" "https://api.packet.net/devices/cab17063-6ea3-4848-9084-d07f1bf66ef7?exclude=project_lite" | jq ".network_ports[0].network_type"
"layer3"

VS

tomk@xps ~ » curl -s -H "X-Auth-Token: $PACKET_AUTH_TOKEN" "https://api.packet.net/devices/cab17063-6ea3-4848-9084-d07f1bf66ef7" | jq ".network_ports[0].network_type"   
"layer2-bonded"

t0mk avatar Feb 28 '20 13:02 t0mk
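
The comparison above can also be scripted. The following is a minimal sketch, not part of packngo: it assumes only the response shape implied by the jq filter (`network_ports[0].network_type`) and the base URL and `X-Auth-Token` header from the curl commands. The helper names (`first_port_network_type`, `fetch_device`, `network_types_by_query`, `is_consistent`) are hypothetical.

```python
import json
import urllib.request

# Base URL taken from the curl commands in this thread.
API_BASE = "https://api.packet.net"


def first_port_network_type(device):
    """Return network_ports[0].network_type, mirroring the jq filter above."""
    ports = device.get("network_ports") or []
    return ports[0].get("network_type") if ports else None


def fetch_device(device_id, token, query=""):
    """GET /devices/<id>, optionally with a query string such as 'exclude=project_lite'."""
    url = f"{API_BASE}/devices/{device_id}"
    if query:
        url += "?" + query
    req = urllib.request.Request(url, headers={"X-Auth-Token": token})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def network_types_by_query(device_id, token,
                           queries=("", "exclude=project_lite"),
                           fetch=fetch_device):
    """Fetch the same device once per query string and collect the reported network_type."""
    return {q: first_port_network_type(fetch(device_id, token, q)) for q in queries}


def is_consistent(types_by_query):
    """True when every query variant reported the same network_type."""
    return len(set(types_by_query.values())) <= 1
```

Calling `network_types_by_query(device_id, os.environ["PACKET_AUTH_TOKEN"])` (after `import os`) and passing the result to `is_consistent` would flag a device whose two responses disagree. The `fetch` parameter exists only so the comparison logic can be exercised without network access.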

We discovered another scenario where this could occur with the cached data for a network bond port not updating when the network mode changed (for example, from layer2-bonded to layer3). We've implemented a fix to ensure the cache data updates for this event.

jordan0day avatar Mar 04 '20 15:03 jordan0day

I would like to call this issue closed based on the work that has gone into fixing the network_type cache bug.

After a recent flare-up of this bug, and without being aware of this issue, I started looking for alternative solutions. In #185 I have captured the portal UI logic, which corrects some behaviors of the GetNetworkType() function in packngo based on the reported network_type of the bond devices.

With both of these efforts, I would like to think the problems described here are no longer the case and we can close this issue. But, there is one state that I witnessed that seemed irregular.

@jordan0day, does it make sense that t1.small.x86 would return {"name":"bond0","network_type":"layer2-bonded"} in the POST response (on creation), and only later return "network_type": "layer3"? Is there a meaningful transition performed here, or should the initial API response include "layer3"?

displague avatar Jul 23 '20 05:07 displague

@displague There is in fact a transition that occurs during the device creation process. On a typical provision, the device is created with its network interfaces bonded, and we go through some background work of assigning IP addresses to the device (the "Management IPs" you see on the device overview page in the UI). For that period of time between when the device starts spinning up and the "Management IPs" are assigned to the bond, the "network type" will be "layer2-bonded". Once the "Management IPs" have been assigned to the bond, however, the network type will change to "layer3".

jordan0day avatar Jul 23 '20 13:07 jordan0day
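
Given that transition, a client that needs the settled value can poll until the network type reaches "layer3" instead of trusting the creation response. The following is a minimal sketch under stated assumptions: `wait_for_network_type` is a hypothetical helper, `get_network_type` is any caller-supplied function that returns the currently reported value, and the timeout and interval defaults are arbitrary, since this thread does not say how long the assignment of Management IPs takes.

```python
import time


def wait_for_network_type(get_network_type, expected="layer3",
                          timeout=600.0, interval=10.0,
                          sleep=time.sleep, clock=time.monotonic):
    """Poll get_network_type() until it returns `expected` or `timeout` seconds pass.

    Timeout and interval defaults are assumptions, not documented API behavior.
    """
    deadline = clock() + timeout
    while True:
        current = get_network_type()
        if current == expected:
            return current
        if clock() >= deadline:
            raise TimeoutError(
                f"network type is still {current!r} after {timeout} seconds"
            )
        sleep(interval)
```

The `sleep` and `clock` parameters are injectable so the loop can be tested without waiting. A device that is intentionally left in a layer 2 mode would never reach "layer3", so the expected value should match the mode the caller actually requested.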

Thanks, @jordan0day. With that clarification, I'll hold on closing this until https://github.com/packethost/packngo/pull/185 is merged.

I must say, creating devices with a create-time network_type definition (a combination of ports) would be beneficial in the use-cases that I have seen. Maybe this can't be done at the port level because the port count is dependent on availability? Maybe the API could accept network_type at creation since it has a well-known meaning regardless of the port count.

displague avatar Jul 23 '20 14:07 displague