Warning and Errors when performing Proxmox provisioning

Open markjhunsinger opened this issue 1 year ago • 27 comments

Hello!

I've been following the blog post on installing GOAD on Proxmox and have run into a few hiccups along the way, but can't quite figure out this last one. I'm on Part 4 of the walkthrough - Run the playbook.

I'm seeing a warning that there is an error when collecting bios/platform/processor facts related to "Failed to get SMBIOS buffer information (Incorrect function...)". I've included a screenshot below.

gathering-facts

The script continues, and then I'm seeing a bunch of fatal errors seemingly related to NuGet. Another screenshot below.

task-common

play-recap

I've already tried removing the VMs in Proxmox and rebuilding them using Terraform, but I get the same result. I've also tried different versions of Ansible with no luck.

Any suggestions are appreciated!

markjhunsinger avatar Jan 12 '24 21:01 markjhunsinger

This error occurs because the VMs don't have internet access. Verify that pfSense can resolve DNS and that it allows the GOAD VLAN to reach the internet.

Mayfly277 avatar Jan 12 '24 23:01 Mayfly277

It seems like pfsense can resolve DNS and the firewall rules for VLAN10 are set correctly.

DNS Lookup: dns-lookup

Firewall rules for VLAN10: fw-vlan10

But you are right - the VMs do not have internet access, and I'm not quite sure why. The WAN and LAN devices (Proxmox, pfSense, and provisioning) all have internet.

Any insight into my firewall rule or anything else I can check?

markjhunsinger avatar Jan 13 '24 18:01 markjhunsinger

After a few Proxmox reboots (or any number of changes I've made in pfsense in the last couple of days, who can say), the VMs now have internet access.

I still got the Gathering Facts warnings, but the script seems to be progressing fine so far despite them.

I will post an update shortly!

markjhunsinger avatar Jan 14 '24 20:01 markjhunsinger

Any update on this? Facing the exact same issue.

GabrielKrueger avatar Jan 16 '24 12:01 GabrielKrueger

Still having some issues with VMs connecting to the internet. Destroyed and rebuilt some of the VMs with terraform, and they will connect to the internet for a short while, but then they'll stop working again out of nowhere.

I got to a point where all VMs were connected to the internet, but as soon as I ran the Ansible provisioning script, the connection died again. I'm going to be messing with it more today to see if I can figure out what's going on.

markjhunsinger avatar Jan 16 '24 17:01 markjhunsinger

I switched all the GOAD VMs back to VirtIO network devices, and they are all connected now. I am rerunning the provisioning script at the moment and will provide an update with the results.

markjhunsinger avatar Jan 16 '24 17:01 markjhunsinger

Provisioning does not work for me when the interfaces are VirtIO, so I switched them back to Intel.

Still having issues with DC02 getting an internet connection. I believe it has to do with DHCP, since the other four servers have DHCP leases in pfSense, but DC02 does not for some reason.

markjhunsinger avatar Jan 16 '24 19:01 markjhunsinger

The problem seems to be that the machines inside VLAN10 (192.168.10.X) have their gateway set to 192.168.10.1, but it cannot be reached (ping fails)?

GabrielKrueger avatar Jan 18 '24 07:01 GabrielKrueger

After all that, I still wasn't able to get one of the VMs to connect to the internet, so I ended up starting fresh and installed Proxmox 7.4 in hopes of avoiding the Terraform provider issues (so far so good).

My issue once again is no internet access on any of the VMs. pfSense can resolve DNS and the firewall rules are in place to allow internet to the VLANs, so I'm not quite sure what the issue is. Last time it seemingly ended up correcting itself, but no luck so far.

Going to continue poking at things and see what sticks.

markjhunsinger avatar Jan 25 '24 20:01 markjhunsinger

I've been messing with this and believe I have discovered the issue with internet access on the VMs.

According to the guide, this is what the VLAN10 firewall should look like:

image

The INTERNAL alias we set up includes the following Networks:

192.168.1.1/16 (LAN + VLAN)
10.0.0.1/30 (WAN)
10.10.10.0/24 (VPN)

Although the rule does technically allow communication with the internet, there is no explicit rule allowing communication with the gateway. Please correct me if I'm wrong, but it looks like the VLAN10 firewall needs some additional rules.

If I set it up like this, the hosts can connect to the internet.

image
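
For anyone who wants to sanity-check the alias, here is a minimal Python sketch. It assumes the guide's VLAN10 rule has the form "pass when the destination is not in INTERNAL" (the screenshot is not reproduced as text, so that is an assumption), and the function names are made up for illustration. It shows that the VLAN10 gateway address falls inside the alias, so traffic addressed to the gateway itself (a ping, or a DNS query to 192.168.10.1) would not match such a rule, while traffic to a public address would.

```python
import ipaddress

# Networks in the INTERNAL alias, as listed above. strict=False lets
# host-style entries such as 192.168.1.1/16 normalise to their network.
INTERNAL = [
    ipaddress.ip_network("192.168.1.1/16", strict=False),  # LAN + VLAN
    ipaddress.ip_network("10.0.0.1/30", strict=False),     # WAN
    ipaddress.ip_network("10.10.10.0/24"),                 # VPN
]


def in_internal(address: str) -> bool:
    """Return True if the address belongs to any INTERNAL network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in INTERNAL)


def passes_not_internal_rule(destination: str) -> bool:
    """Assumed rule: pass only when the destination is NOT in INTERNAL."""
    return not in_internal(destination)


if __name__ == "__main__":
    for destination in ("8.8.8.8", "192.168.10.1"):
        verdict = "pass" if passes_not_internal_rule(destination) else "no match"
        print(destination, verdict)
```

Packets routed through the gateway to the internet carry the remote address as their destination, which would be consistent with the rule "technically" allowing internet access while the gateway itself stays unreachable.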

markjhunsinger avatar Feb 05 '24 18:02 markjhunsinger

I'm having the same problem, I've tried everything, but I can't solve it.

navees1 avatar Feb 19 '24 03:02 navees1

I think I've found the problem; at least, I have it working on my Proxmox. In the Packer config the network card is a "virtio" NIC, so that is what ends up in the template. The default Terraform recipe, however, deploys the NIC as an "e1000" card, so (at least on my machine) the NIC gets renamed to "Ethernet 2". You might think that doesn't matter, but the Ansible playbook tries to configure the NIC named "Ethernet", and as mentioned, that one is not connected, so Ansible fails.

What I recommend is to log in to one of your systems (username: vagrant, password: vagrant) and check the name of the NIC. Don't forget to set the keyboard layout to English, because the default is French. To verify that you have this issue, connect the provisioning CT to the 192.168.10.0/24 network and ping the default gateway.

The way I solved this was to change the NICs in the Terraform recipe to "virtio" and redeploy the VMs. This is, in my opinion, the quickest solution.

chuckjorrit avatar Feb 19 '24 06:02 chuckjorrit

Setting them to virtio certainly gives you Internet, but according to the blog post, you will have issues with the machines connecting to the domain when running the playbook. I had this issue myself so I ended up using the e1000 card as suggested. If it works for you, that's great!

My main issue with the internet was that the VLAN VMs were unable to communicate with the gateway (logging into the Vagrant account confirmed this was the case). I still can't quite put my finger on what fixed it for me, but messing with the VLAN firewall rules seemed to do the trick.

I was able to finish everything else up successfully and have a working GOAD now.

markjhunsinger avatar Feb 19 '24 12:02 markjhunsinger

Can you send me a screenshot of the pfSense rules for your VLAN?

navees1 avatar Feb 19 '24 12:02 navees1

image

I created these rules, and I managed to solve the problem

navees1 avatar Feb 19 '24 17:02 navees1

I suggest setting the protocol to "any"; I think that should solve it. Alternatively, make a rule for each protocol you need, but I recommend the first option.

chuckjorrit avatar Feb 19 '24 18:02 chuckjorrit

Here are my current VLAN rules:

image

But as @chuckjorrit mentioned above, you should add a rule at the top to allow all traffic until your VMs can connect to the gateway/Internet. Then you should be able to run the provisioning script without any issue (hopefully). After everything was connecting for me, I removed the "allow any" rule, and it still seems to work fine for me. Don't ask me why.

markjhunsinger avatar Feb 19 '24 18:02 markjhunsinger

Hi @markjhunsinger, I tried your method of adding an any-to-any rule in VLAN10, and now I can ping 8.8.8.8 (so there is internet). But I still can't ping google.com, and I get an error at the task "Upgrade module PowerShellGet to fix accept license issue on last windows ansible version". Do you have any idea why the domain can't be resolved?

Update: Changing dns_server=192.168.10.1 to dns_server=8.8.8.8 in ../ad/GOAD/providers/proxmox/inventory solved the problem. Is this a safe way to do it?
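
Being able to ping 8.8.8.8 but not google.com separates a routing problem from a DNS problem. A rough Python sketch of that check follows, using only the standard library; the function names are hypothetical, and the TCP probe to 8.8.8.8 on port 53 is just one convenient way to test reachability without involving name resolution.

```python
import socket


def can_reach(ip: str, port: int = 53, timeout: float = 3.0) -> bool:
    """Open a TCP connection to a literal IP address (no DNS lookup involved)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def can_resolve(hostname: str) -> bool:
    """Ask the system's configured resolver for the hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def diagnose(reach_ip: bool, resolve_name: bool) -> str:
    """Turn the two observations into a hint about where to look."""
    if reach_ip and resolve_name:
        return "routing and DNS both work"
    if reach_ip:
        return "routing works, DNS does not: check the resolver the VM points at"
    if resolve_name:
        return "names resolve but the test IP is unreachable: check firewall rules"
    return "no route out: check the gateway and firewall rules first"


# Example (performs real network calls when run):
# print(diagnose(can_reach("8.8.8.8"), can_resolve("google.com")))
```

The situation described above corresponds to the second branch: the route out works, but the resolver at 192.168.10.1 is not answering.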

aancw avatar Mar 20 '24 05:03 aancw

I also see similar behaviour when the VMs are provisioned. In my goad.tf file, the VMs are configured with dns = "192.168.10.1". In ../ad/GOAD/providers/proxmox/inventory I also have dns_server=192.168.10.1.

But when I look at the network configuration, which, as pointed out by @chuckjorrit , shows an interface name of "Ethernet 2", there is no DNS server present. image

Once I manually add the dns server to 192.168.10.1, I have Internet on the host.

EDIT: this might be my misunderstanding, as it might be Ansible's job to enforce the DNS in part 4, as opposed to it being configured in part 3 as part of the provisioning.

bdesforges avatar Apr 04 '24 21:04 bdesforges

Did you change this DNS after?

BerSecHub avatar Apr 20 '24 04:04 BerSecHub

Yes, I changed dns_server to 8.8.8.8.

aancw avatar Apr 20 '24 08:04 aancw

I had the same problem, and it was also related to DNS resolution. There is a step missing from the setup guide when configuring pfSense.

When doing an nslookup from any host to pfSense I was getting "Query Refused", so I configured an access list on the DNS Resolver in pfSense, and that fixed it.

The picture below shows how I configured it; I just made an allow rule for the 192.168.0.0/16 subnet.

Screenshot 2024-04-22 at 6 19 13 PM

hope this helps others, I was having the same issue as OP.
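
A "Query refused" answer corresponds to DNS response code 5 (REFUSED): the resolver received the query but declined to answer it for that client. Below is a minimal Python sketch for checking this from a machine on the VLAN. The helper names are hypothetical, it uses only the standard library, and it sends a single A-record query over UDP.

```python
import socket
import struct

# A few common DNS response codes (the low 4 bits of the header flags).
RCODE_NAMES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a DNS query for an A record with recursion desired."""
    # Header: id, flags (RD bit set), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def rcode_of(response: bytes) -> str:
    """Extract the response code from a raw DNS response."""
    (flags,) = struct.unpack("!H", response[2:4])
    code = flags & 0x000F
    return RCODE_NAMES.get(code, f"RCODE {code}")


def ask(resolver_ip: str, name: str, timeout: float = 3.0) -> str:
    """Send one query to the resolver and return the response code name."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (resolver_ip, 53))
        response, _ = sock.recvfrom(512)
    return rcode_of(response)


# Example (performs a real query when run):
# print(ask("192.168.10.1", "google.com"))
```

If the call returns REFUSED, the packet reached the resolver, which points at the resolver's access list rather than the firewall; a packet dropped by a firewall rule would typically end in a timeout instead.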

e-fin avatar Apr 22 '24 22:04 e-fin

I will try changing the dns_server in DC-X and SRV back to 192.168.10.1 and try your solution. I will update if I can ping the internet.

UPDATE: I can't connect to the internet when I disable the any-to-any rule for VLAN10 and enable the access list for the DNS Resolver.

Screenshots: pfSense with the any-to-any VLAN10 rule disabled; pfSense with the DNS Resolver access list enabled; ping from the machine.

aancw avatar Apr 23 '24 00:04 aancw

Try adding a firewall rule that allows the "ANY" protocol so that ICMP is permitted; it looks like your rules only allow TCP and UDP. I wasn't able to ping until I added an ICMP rule as well.

It also looks like you need a firewall rule to allow traffic to the internet; that ANY ANY rule seems to be acting as that rule.

It looks like google was able to resolve so your DNS issue appears to be resolved.
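
To illustrate why a ping can fail while TCP and UDP traffic gets through, here is a toy Python model of rule evaluation. It assumes rules on the interface are checked top-down, the first match decides, and anything unmatched is blocked. The class and function names are hypothetical, and this is only an illustration, not how pfSense is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    action: str           # "pass" or "block"
    protocols: frozenset  # e.g. frozenset({"tcp", "udp"}) or frozenset({"any"})


def evaluate(rules, protocol: str) -> str:
    """Return the action of the first rule matching the protocol."""
    for rule in rules:
        if "any" in rule.protocols or protocol in rule.protocols:
            return rule.action
    return "block"  # nothing matched: default deny


tcp_udp_only = [Rule("pass", frozenset({"tcp", "udp"}))]
with_any_rule = [Rule("pass", frozenset({"any"}))] + tcp_udp_only

if __name__ == "__main__":
    print(evaluate(tcp_udp_only, "icmp"))   # ping falls through to default deny
    print(evaluate(with_any_rule, "icmp"))  # the "any" rule matches first
```

In this model a TCP/UDP-only rule set still lets a DNS lookup through while a ping is blocked, which matches the screenshots described above.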

e-fin avatar Apr 23 '24 01:04 e-fin

Try adding a firewall rule that allows the "ANY" protocol so that ICMP is permitted; it looks like your rules only allow TCP and UDP. I wasn't able to ping until I added an ICMP rule as well.

I will try that suggestion.

It also looks like you need a firewall rule to allow traffic to the internet; that ANY ANY rule seems to be acting as that rule.

Is it necessary to add ANY ANY in VLAN10? I think ANY ANY is already allowed in WAN rules for internet traffic.

It looks like google was able to resolve so your DNS issue appears to be resolved.

You're right, the dns resolved.

aancw avatar Apr 25 '24 02:04 aancw

Similar issue. I tested with pfctl -d (packet filter disabled), so I don't think it's a firewall issue. I also had issues with the provisioning VM getting internet, which I solved by adding an iptables rule:

iptables -t nat -A POSTROUTING -o vmbr0 -s 192.168.1.0/24 -j MASQUERADE

But I'm not really sure what to do here...

Tracert shows it's stuck on the first hop (192.168.10.1)

image

h4ckd0tm3 avatar May 03 '24 14:05 h4ckd0tm3