docker-machine-driver-proxmox-ve

Rancher: Error with pre-create check: "unexpected end of JSON input"

Open jmondragon opened this issue 4 years ago • 15 comments

When trying the v2 binary from within Rancher, I get the following errors:

Error with pre-create check: "unexpected end of JSON input"

and

Timeout waiting for ssh key

This happens while trying to provision nodes for the cluster. I'm not sure what troubleshooting steps to perform next. I will try to provision via the CLI as outlined in the README.

jmondragon avatar Dec 29 '19 15:12 jmondragon

Hi,

can you try the latest prerelease version? It has improved error reporting and was tested with Rancher OS extensively and may indicate what the problem is.

lnxbil avatar Dec 29 '19 15:12 lnxbil

I did end up trying the latest, and it gave more information. It gave me the following error:

Flag provided but not defined: -proxmoxve-storage-type;

The flag named in the error seems to be chosen at random (e.g. -proxmoxve-user, -proxmoxve-storage, etc.)

I went back to v2 and changed my HOST from the IP address to a hostname, and it seemed to work better. I wonder if it has something to do with the . in IP addresses or FQDN.

I still ended up having trouble connecting via SSH, but the VM did get created. I will continue to troubleshoot.

Thanks, Josh

jmondragon avatar Dec 29 '19 16:12 jmondragon

Hello there,

We too have encountered the same problem when using v2, will try to provide a hostname instead of an IP as well.

Also, we are experiencing the other problem with v3, but I reckon that should probably go in another issue. The logs just indicate that the mentioned flag is empty whereas the others aren't (and the host doesn't change if I put in an IP; it takes the default 192...). I would have to check with a hostname as well.

Thank you for your work!

CarlosLVar avatar Jan 02 '20 20:01 CarlosLVar

I'm testing the v3pre3, and every attempt to provision a node gives me a different error, like:

Flag provided but not defined: -proxmoxve-disksize-gb; Timeout waiting for ssh key

Each run, or even each node trying to be deployed, will give an error like this. Sometimes it is -proxmoxve-disksize-gb or -proxmoxve-guest-ssh-port or -proxmoxve-storage. Seems to just pick a random defined entry to fail on.

Really looking forward to getting this working.

Edit: I originally tried the v2 and also had issues. Running Rancher v2.3.3 and Proxmox 6.1-5

ropeguru avatar Jan 07 '20 14:01 ropeguru

Adding some additional info..

It appears that when Rancher reads the template to gather defined values, something is going wrong.

For instance, I have "NFS-Datastore" defined for the storage location. In the log below, it is showing as "local". Also, I have the disk size configured for 50 GB, but the entry below is 16 GB. And the list goes on.

Debugging the Rancher container, I find the following in the log (passwords have been changed):

2020/01/07 18:35:04 [DEBUG] create cmd [create -d proxmoxve --engine-install-url https://releases.rancher.com/install-docker/19.03.sh --proxmoxve-user root --proxmoxve-driver-debug --proxmoxve-image-file NFS-Datastore:iso/rancheros.iso --proxmoxve-password password --proxmoxve-disksize-gb 50 --proxmoxve-guest-ssh-port 22 --proxmoxve-memory-gb 8 --proxmoxve-storage-type qcow2 --proxmoxve-guest-username docker --proxmoxve-host 192.168.1.171 --proxmoxve-realm pam --proxmoxve-storage NFS-Datastore --proxmoxve-guest-password 123456]
2020/01/07 18:35:04 [INFO] Provisioning node ranchnode1
2020/01/07 18:35:04 [DEBUG] stdout: Incorrect Usage.
2020/01/07 18:35:04 [INFO] [node-controller-docker-machine] Incorrect Usage.

So something in the create cmd under Rancher is not correct. The values passed in the first entry are what I have in my template.

Edited for more constructive info.

Another edit: I found the issue by manually running the create command in my Rancher container.

It seems that the v3-pre3 is still using the option form of --proxmox-

If I manually run a machine-create in the Rancher container using the v3-pre3 driver, I still get the json error, but it does connect and pull the next valid ID.
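For anyone reading along, here is a minimal sketch of how a "flag provided but not defined" error can arise. It assumes the driver's flag handling behaves like Go's standard flag package, and the set of "defined" flags below is hypothetical, chosen only for illustration; it is not taken from any driver release. Parsing stops at the first unknown flag, so if the caller assembles the arguments in a different order on each run (a guess, not something confirmed in this thread), a different flag name would show up in the error each time.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseCreateFlags mimics a binary that only knows the flags listed in
// defined. Which flags a given driver release really defines is an
// assumption of this sketch.
func parseCreateFlags(defined []string, args []string) error {
	fs := flag.NewFlagSet("create", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // suppress the usage text
	for _, name := range defined {
		fs.String(name, "", "")
	}
	// Parse returns on the first flag it does not recognise.
	return fs.Parse(args)
}

func main() {
	// Hypothetical: the binary knows only these two flags.
	defined := []string{"proxmoxve-host", "proxmoxve-user"}
	// Arguments in the style of the create command from the log.
	args := []string{
		"--proxmoxve-user", "root",
		"--proxmoxve-storage-type", "qcow2",
		"--proxmoxve-host", "192.168.1.171",
	}
	fmt.Println(parseCreateFlags(defined, args))
}
```

Comparing the flags in Rancher's create command against the flags the installed driver binary reports in its help output would show whether the two actually disagree.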

Hope this helps.

ropeguru avatar Jan 07 '20 18:01 ropeguru

New version with a lot of fixes and merged PR. Please try again and report back.

lnxbil avatar Jan 23 '20 11:01 lnxbil

Thank you, I'll give it a try. I don't see the docker-machine-driver-proxmoxve.linux-amd64 (on the v3 release page), only zip and tar.gz files. Can I use the link to the tar.gz within Rancher?

jmondragon avatar Jan 23 '20 15:01 jmondragon

> Thank you, I'll give it a try. I don't see the docker-machine-driver-proxmoxve.linux-amd64 (on the v3 release page), only zip and tar.gz files. Can I use the link to the tar.gz within Rancher?

Ah sorry, I clicked the "published" button too soon. The binaries weren't completely uploaded yet. I reuploaded them and now they should be present.

lnxbil avatar Jan 23 '20 17:01 lnxbil

This worked great! I did have to use the latest 1.5.5 rancheros-proxmoxve-autoformat.iso as outlined in the README.

jmondragon avatar Jan 23 '20 19:01 jmondragon

I am using V3 and am getting this error.

I currently have RancherOS running in one VM on a 3-node Proxmox cluster. Inside RancherOS, Docker is running Rancher, which has this driver loaded. When I try to create a new cluster with a node, I get this exact same issue.

cyrus104 avatar Apr 25 '20 14:04 cyrus104

I'm running into this issue as well on V3 using both the latest (1.5.6) and 1.5.5 image. I have no idea where to begin debugging this or which information to provide.

I've tried this from a Windows 10 laptop (inside Ubuntu 20.04 on WSL1) and from regular Ubuntu 20.04, both with the same result (on the same PVE node).

edit:

Just noticed that go-resty is returning a "596 tls_process_server_certificate: certificate verify failed", after which the proxmoxve driver returns the "unexpected end of JSON input" error.

This is weird, since I set up my CA correctly for my hostname (using Let's Encrypt). It returns the same error when using the IP address instead of the hostname.
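The link between the two messages can be sketched as follows, using only the Go standard library. This is an illustration, not the driver's actual code: `decodeBody` is a hypothetical helper, and the assumption is that the failed request leaves an empty body that then gets handed to the JSON parser. Parsing an empty body produces exactly the "unexpected end of JSON input" text, while checking the HTTP status first would surface the real failure.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeBody is a hypothetical helper. It refuses to parse JSON unless the
// HTTP status is in the 2xx range.
func decodeBody(status int, body []byte, out interface{}) error {
	if status < 200 || status > 299 {
		return fmt.Errorf("proxmox API returned HTTP %d", status)
	}
	return json.Unmarshal(body, out)
}

func main() {
	var v map[string]interface{}
	// Parsing an empty body directly yields the confusing message.
	fmt.Println(json.Unmarshal([]byte(""), &v))
	// Checking the status first reports the underlying problem instead.
	fmt.Println(decodeBody(596, []byte(""), &v))
}
```

So when this error appears, the HTTP status and TLS setup of the Proxmox API endpoint are probably worth checking before anything JSON-related.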

mjkl-gh avatar Jun 07 '20 15:06 mjkl-gh

Can you please test with v4? I already could not replicate the issue with v3.

lnxbil avatar Jul 29 '20 21:07 lnxbil

Aiaiai. I must unfortunately admit I remember fixing this issue. If I'm working from memory, it was something as stupid as using the wrong PVE node or misspelling a hostname. I will try to get back to you on this!

mjkl-gh avatar Jul 30 '20 06:07 mjkl-gh

Same problem with v4.

I tested a manual deployment with docker-machine using the same parameters, and it works! But from the Rancher UI it fails with "Timeout waiting for ssh key" and no other information to help debug :cry:

cedvan avatar Oct 22 '21 13:10 cedvan

Hmm, OK, the problem was DNS resolution: using the IP as the host works. It's sad, but it works...

Be careful: during my tests, VM creation failed but still left residual vm-disks behind. Be sure to check for them and delete them manually if necessary. Otherwise you will get a 500 error on subsequent creations because the vm-disk already exists.
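A quick way to check whether a host value depends on DNS at all is sketched below. It is a rough sketch, not part of the driver: `checkHost` is a hypothetical helper, and "localhost" is only a placeholder for the real Proxmox hostname. To be meaningful it would need to run in the same environment as Rancher (for example inside the Rancher container), since that is where the resolution has to succeed.

```go
package main

import (
	"fmt"
	"net"
)

// checkHost reports how a host value would be reached. The lookup function
// is injectable so the logic can be exercised without real DNS; pass
// net.LookupHost for a real check.
func checkHost(host string, lookup func(string) ([]string, error)) (string, error) {
	if net.ParseIP(host) != nil {
		return "IP literal, no DNS lookup needed", nil
	}
	addrs, err := lookup(host)
	if err != nil {
		return "", fmt.Errorf("cannot resolve %q: %w", host, err)
	}
	return fmt.Sprintf("resolves to %v", addrs), nil
}

func main() {
	// Replace "localhost" with the hostname configured for the Proxmox host.
	for _, h := range []string{"192.168.1.171", "localhost"} {
		msg, err := checkHost(h, net.LookupHost)
		fmt.Println(h, msg, err)
	}
}
```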

cedvan avatar Oct 22 '21 16:10 cedvan