converged-edge-experience-kits

Cannot program the FPGA

Open ravicorning opened this issue 3 years ago • 6 comments

Hi

I am using this command to program the binary into the FPGA: kubectl rsu program -f <signed_RTL_image> -n <hostname> -d <RSU_PCI_bus_function_id>

The documentation doesn't specify the arguments clearly, so I assume signed_RTL_image is the name of the .bin file. As for the hostname, for some reason the command doesn't seem to work with the node name that the K8s controller sees, so I have to use the IP address. The PCI bus ID is what I get from "lspci | grep acc". When I run this, the fpga-opae container stays in a Pending state and complains about a mismatch in the node selector. Is there a corresponding .yml I can modify to remove this filtering?

[root@corningopenness opt]# kubectl describe pods fpga-opae-10.12.87.80-0b30-xbrx2
Name:           fpga-opae-10.12.87.80-0b30-xbrx2
Namespace:      default
Priority:       0
Node:
Labels:         controller-uid=8a798e1c-5a1e-43d9-bb37-a9049458d61f
                job-name=fpga-opae-10.12.87.80-0b30
Annotations:
Status:         Pending
IP:
IPs:
Controlled By:  Job/fpga-opae-10.12.87.80-0b30
Containers:
  fpga-opae:
    Image:      fpga-opae-pacn3000:1.0
    Port:
    Host Port:
    Command:
      sudo
      -E
      /bin/bash
      -c
      --
    Args:
      ./check_if_modules_loaded.sh && fpgasupdate /root/images/20ww27.5-2x2x25G-5GLDPC-v1.6.1-3.0.0-unsigned.bin 0b30 && rsu bmcimg 0b30
    Environment:
      PYTHONIOENCODING:  utf-8
    Mounts:
      /root/images from image-dir (rw)
      /sys/devices from class (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-fzdsz (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  class:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/devices
    HostPathType:
  image-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /temp/vran_images
    HostPathType:
  default-token-fzdsz:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-fzdsz
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  kubernetes.io/hostname=10.12.87.80
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age  From               Message
  Warning  FailedScheduling  58m  default-scheduler  0/2 nodes are available: 2 node(s) didn't match node selector.
  Warning  FailedScheduling  58m  default-scheduler  0/2 nodes are available: 2 node(s) didn't match node selector

ravicorning avatar Mar 11 '21 00:03 ravicorning

The node selector values do not match because you ran the program command with the IP address. Please try running it with the node hostname. This will be the same as the one in /etc/hostname on the node.

As for the kubectl rsu command, please execute kubectl rsu discover before the program command, and copy the signed or unsigned image name and the device ID from its output. Then paste those values into the kubectl rsu program command.
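The advice above can be sketched as a small pre-flight helper. This is only an illustration: `looks_like_ipv4` and `rsu_program_cmd` are hypothetical names, not part of the kubectl rsu plugin, and the flag layout (`-f`, `-n`, `-d`) is taken from the commands quoted in this thread. The helper prints the command for review instead of running it, and refuses a node argument that looks like an IP address.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of the kubectl rsu plugin.

# Succeeds when the argument looks like a dotted-quad IPv4 address.
looks_like_ipv4() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}$'
}

# Prints the program command for review rather than executing it.
# Arguments: <image name> <node hostname> <device ID>
rsu_program_cmd() {
  if looks_like_ipv4 "$2"; then
    echo "error: '$2' looks like an IP address; use the node hostname" >&2
    return 1
  fi
  printf 'kubectl rsu program -f %s -n %s -d %s\n' "$1" "$2" "$3"
}
```

The value that the node selector compares against can be listed with `kubectl get nodes -L kubernetes.io/hostname`. With the values that appear in this thread, a call might look like `rsu_program_cmd 20ww27.5-2x2x25G-5GLDPC-v1.6.1-3.0.0-unsigned.bin opennesswkn-1 54:00.0`.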

aniket-intel avatar Mar 11 '21 09:03 aniket-intel

Ok, sure. Can you tell me which one is the device here: 8086:0b30 or 54:00.0?

[root@corningopenness ravi]# kubectl rsu discover -n 10.12.87.80

Available RTL images:
[email protected]'s password:
Mar 10 44M 20ww27.5-2x2x25G-5GLDPC-v1.6.1-3.0.0-unsigned.bin

FPGA devices installed:
[email protected]'s password:
54:00.0 Processing accelerators [1200]: Intel Corporation Device [8086:0b30]
        Subsystem: Intel Corporation Device [8086:0000]
        Kernel driver in use: intel-fpga-pci
        Kernel modules: intel_fpga_pci
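In lspci-style output like the listing above, the leading field (54:00.0) is the PCI address in bus:device.function form, while the bracketed pair (8086:0b30) is the vendor:device ID. A minimal sketch for pulling out the address of every entry classed as "Processing accelerators" follows; `fpga_pci_addresses` is a hypothetical helper name, and it assumes the input is lspci-style text on stdin.

```shell
#!/bin/sh
# Hypothetical helper: print the PCI address (bus:device.function, with an
# optional domain prefix) of each "Processing accelerators" entry read from
# lspci-style text on stdin.
fpga_pci_addresses() {
  grep -oE '([0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7] Processing accelerators' \
    | cut -d' ' -f1
}
```

On the node itself this could be fed from `lspci -nn | fpga_pci_addresses`.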

ravicorning avatar Mar 11 '21 17:03 ravicorning

I was able to download the image to the FPGA and configure the VFs in the FPGA, but I cannot see the device in the node's available resources. Any idea what the problem might be? The documentation says the resources should map to the ConfigMap.yml for the device plugin, but where is that correlated with the bb_config Helm chart provisioning?

[root@corningopenness helm-charts]# kubectl get node opennesswkn-1 -o json | jq '.status.allocatable'
{
  "cpu": "46",
  "devices.kubevirt.io/kvm": "110",
  "devices.kubevirt.io/tun": "110",
  "devices.kubevirt.io/vhost-net": "110",
  "ephemeral-storage": "96589578081",
  "hugepages-1Gi": "20Gi",
  "intel.com/intel_sriov_netdevice": "12",
  "memory": "110455600Ki",
  "pods": "110"
}

[root@corningopenness helm-charts]# kubectl logs intel-fpga-cfg-opennesswkn-1-jwlpn
ERROR: Section (FLR) or name (flr_time_out) is not valid.
FEC FPGA RTL v3.0
UL.DL Weights = 3.3
UL.DL Load Balance = 128.128
Queue-PF/VF Mapping Table = READY
Ring Descriptor Size = 256 bytes
--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
        |  PF | VF0 | VF1 | VF2 | VF3 | VF4 | VF5 | VF6 | VF7 |
--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
UL-Q'00 |     |  X  |     |     |     |     |     |     |     |
UL-Q'01 |     |  X  |     |     |     |     |     |     |     |
UL-Q'02 |     |  X  |     |     |     |     |     |     |     |
UL-Q'03 |     |  X  |     |     |     |     |     |     |     |
UL-Q'04 |     |  X  |     |     |     |     |     |     |     |
UL-Q'05 |     |  X  |     |     |     |     |     |     |     |
UL-Q'06 |     |  X  |     |     |     |     |     |     |     |
UL-Q'07 |     |  X  |     |     |     |     |     |     |     |
UL-Q'08 |     |  X  |     |     |     |     |     |     |     |
UL-Q'09 |     |  X  |     |     |     |     |     |     |     |
UL-Q'10 |     |  X  |     |     |     |     |     |     |     |
UL-Q'11 |     |  X  |     |     |     |     |     |     |     |
UL-Q'12 |     |  X  |     |     |     |     |     |     |     |
UL-Q'13 |     |  X  |     |     |     |     |     |     |     |
UL-Q'14 |     |  X  |     |     |     |     |     |     |     |
UL-Q'15 |     |  X  |     |     |     |     |     |     |     |
UL-Q'16 |     |     |  X  |     |     |     |     |     |     |
UL-Q'17 |     |     |  X  |     |     |     |     |     |     |
UL-Q'18 |     |     |  X  |     |     |

ravicorning avatar Mar 11 '21 19:03 ravicorning

I also posted this on GitHub; I am mailing as well in the hope of a faster response. I was able to download the image to the FPGA and, I believe, configure the VFs in the FPGA, but I cannot see the device in the available resources. Any idea what the problem might be? The documentation says the resources should map to the ConfigMap.yml for the device plugin, but where is that correlated with the bb_config Helm chart provisioning?

@.*** helm-charts]# kubectl get node opennesswkn-1 -o json | jq '.status.allocatable'
{
  "cpu": "46",
  "devices.kubevirt.io/kvm": "110",
  "devices.kubevirt.io/tun": "110",
  "devices.kubevirt.io/vhost-net": "110",
  "ephemeral-storage": "96589578081",
  "hugepages-1Gi": "20Gi",
  "intel.com/intel_sriov_netdevice": "12",
  "memory": "110455600Ki",
  "pods": "110"
}

@.*** helm-charts]# kubectl logs intel-fpga-cfg-opennesswkn-1-jtg5f
ERROR: Section (FLR) or name (flr_time_out) is not valid.
FEC FPGA RTL v3.0
UL.DL Weights = 3.3
UL.DL Load Balance = 128.128
Queue-PF/VF Mapping Table = READY
Ring Descriptor Size = 256 bytes
--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
        |  PF | VF0 | VF1 | VF2 | VF3 | VF4 | VF5 | VF6 | VF7 |
--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
UL-Q'00 |     |  X  |     |     |     |     |     |     |     |
UL-Q'01 |     |  X  |     |     |     |     |     |     |     |
UL-Q'02 |     |  X  |     |     |     |     |     |     |     |
UL-Q'03 |     |  X  |     |     |     |     |     |     |     |
UL-Q'04 |     |  X  |     |     |     |     |     |     |     |
UL-Q'05 |     |  X  |     |     |     |     |     |     |     |
UL-Q'06 |     |  X  |     |     |     |     |     |     |     |
UL-Q'07 |     |  X  |     |     |     |     |     |     |     |
UL-Q'08 |     |  X  |     |     |     |     |     |     |     |
UL-Q'09 |     |  X  |     |     |     |     |     |     |     |
UL-Q'10 |     |  X  |     |     |     |     |     |     |     |
UL-Q'11 |     |  X  |     |     |     |     |     |     |     |
UL-Q'12 |     |  X  |     |     |     |     |     |     |     |
UL-Q'13 |     |  X  |     |     |     |     |     |     |     |
UL-Q'14 |     |  X  |     |     |     |     |     |     |     |
UL-Q'15 |     |  X  |     |     |     |     |     |     |     |
UL-Q'16 |     |     |  X  |     |     |     |     |     |     |
UL-Q'17 |     |     |  X  |     |     |     |     |     |     |
UL-Q'18 |     |     |  X  |     |     |     |     |     |     |
UL-Q'19 |     |     |  X  |     |     |     |     |     |     |
UL-Q'20 |     |     |  X  |     |     |     |     |     |     |
UL-Q'21 |     |     |  X  |     |     |     |     |     |     |
UL-Q'22 |     |     |  X  |     |     |     |     |     |     |
UL-Q'23 |     |     |  X  |     |     |     |     |     |     |
UL-Q'24 |     |     |  X  |     |     |     |     |     |     |
UL-Q'25 |     |     |  X  |     |     |     |     |     |     |
UL-Q'26 |     |     |  X  |     |     |     |     |     |     |
UL-Q'27 |     |     |  X  |     |     |     |     |     |     |
UL-Q'28 |     |     |  X  |     |     |     |     |     |     |
UL-Q'29 |     |     |  X  |     |     |     |     |     |     |
UL-Q'30 |     |     |  X  |     |     |     |     |     |     |
UL-Q'31 |     |     |  X  |     |     |     |     |     |     |
DL-Q'32 |     |  X  |     |     |     |     |     |     |     |
DL-Q'33 |     |  X  |     |     |     |     |     |     |     |
DL-Q'34 |     |  X  |     |     |     |     |     |     |     |
DL-Q'35 |     |  X  |     |     |     |     |     |     |     |
DL-Q'36 |     |  X  |     |     |     |     |     |     |     |
DL-Q'37 |     |  X  |     |     |     |     |     |     |     |
DL-Q'38 |     |  X  |     |     |     |     |     |     |     |
DL-Q'39 |     |  X  |     |     |     |     |     |     |     |
DL-Q'40 |     |  X  |     |     |     |     |     |     |     |
DL-Q'41 |     |  X  |     |     |     |     |     |     |     |
DL-Q'42 |     |  X  |     |     |     |     |     |     |     |
DL-Q'43 |     |  X  |     |     |     |     |     |     |     |
DL-Q'44 |     |  X  |     |     |     |     |     |     |     |
DL-Q'45 |     |  X  |     |     |     |     |     |     |     |
DL-Q'46 |     |  X  |     |     |     |     |     |     |     |
DL-Q'47 |     |  X  |     |     |     |     |     |     |     |
DL-Q'48 |     |     |  X  |     |     |     |     |     |     |
DL-Q'49 |     |     |  X  |     |     |     |     |     |     |
DL-Q'50 |     |     |  X  |     |     |     |     |     |     |
DL-Q'51 |     |     |  X  |     |     |     |     |     |     |
DL-Q'52 |     |     |  X  |     |     |     |     |     |     |
DL-Q'53 |     |     |  X  |     |     |     |     |     |     |
DL-Q'54 |     |     |  X  |     |     |     |     |     |     |
DL-Q'55 |     |     |  X  |     |     |     |     |     |     |
DL-Q'56 |     |     |  X  |     |     |     |     |     |     |
DL-Q'57 |     |     |  X  |     |     |     |     |     |     |
DL-Q'58 |     |     |  X  |     |     |     |     |     |     |
DL-Q'59 |     |     |  X  |     |     |     |     |     |     |
DL-Q'60 |     |     |  X  |     |     |     |     |     |     |
DL-Q'61 |     |     |  X  |     |     |     |     |     |     |
DL-Q'62 |     |     |  X  |     |     |     |     |     |     |
DL-Q'63 |     |     |  X  |     |     |     |     |     |     |
--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
Mode of operation = VF-mode
FPGA_5GNR PF [0000:56:00.0] configuration complete!


ravicorning avatar Mar 11 '21 20:03 ravicorning

Run the kubectl rsu discover command with the hostname.

Also, the device ID here is 54:00.0. The FPGA will show up in the list of allocatable resources only once the card is configured properly.
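One way to check whether the card has appeared is to filter the node's allocatable resources for the intel.com/ prefix. The sketch below assumes the FPGA resource is advertised under that prefix, like the intel.com/intel_sriov_netdevice entry shown earlier in this thread; the exact resource name is not given here, and `list_intel_resources` is a hypothetical helper name. It uses grep and sed rather than jq so that it runs on a minimal host.

```shell
#!/bin/sh
# Hypothetical helper: read node JSON on stdin and print every
# "intel.com/..." resource as name=value, one per line.
list_intel_resources() {
  grep -oE '"intel\.com/[^"]+"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed -E 's/^"([^"]+)"[[:space:]]*:[[:space:]]*"([^"]*)"$/\1=\2/'
}
```

Typical use would be `kubectl get node opennesswkn-1 -o json | list_intel_resources`. For the allocatable output posted above, this prints only intel.com/intel_sriov_netdevice=12, which is consistent with the FPGA not yet being advertised at that point.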

aniket-intel avatar Mar 12 '21 03:03 aniket-intel

I got this working by reinstalling the worker node after programming the FPGA.

ravicorning avatar Mar 16 '21 00:03 ravicorning