FPGA_as_a_Service
FPGA
When running more than one job inside a pod, we cannot submit jobs reliably. If more than one job is submitted in succession, we get an input/output error. The problem can be mitigated by running xbutil reset on the host before a pod is spun up, but this is not a desirable solution.
Any feedback would be appreciated.
user@mlcluster-interactive-example-jfdz2:~/FPGA_test$ ./host vadd_hw.xclbin 512 0 1 64
Total Data of 512.000 Mbytes to be written to global memory from host
Kernel is invoked 1 time and repeats itself 1 times
Found Platform
Platform Name: Xilinx
DEVICE xilinx_u55c_gen3x16_xdma_base_3
INFO: Reading vadd_hw.xclbin
Loading: 'vadd_hw.xclbin'
- host loop iteration #0 of 1 total iterations
kernel_time_in_sec = 0.0421578
Duration using events profiling: 42050286 ns
match_count = 134217728 mismatch_count = 0 total_data_size = 134217728
Throughput Achieved = 12.7674 GB/s
TEST PASSED
user@mlcluster-interactive-example-jfdz2:~/FPGA_test$ ./host vadd_hw.xclbin 512 0 1 64
Total Data of 512.000 Mbytes to be written to global memory from host
Kernel is invoked 1 time and repeats itself 1 times
Found Platform
Platform Name: Xilinx
DEVICE xilinx_u55c_gen3x16_xdma_base_3
INFO: Reading vadd_hw.xclbin
Loading: 'vadd_hw.xclbin'
- host loop iteration #0 of 1 total iterations
XRT build version: 2.14.384
Build hash: 090bb050d570d2b668477c3bd0f979dc3a34b9db
Build date: 2022-12-09 00:55:08
Git branch: 2022.2
PID: 99
UID: 1006
[Mon Apr 8 15:10:45 2024 GMT]
HOST: mlcluster-interactive-example-jfdz2
EXE: /home/gregj/FPGA_test/host
[XRT] ERROR: unable to sync BO: Input/output error
terminate called after throwing an instance of 'xrt_xocl::error'
what(): event 0 never submitted
Aborted (core dumped)
Hi @iavssw, this issue may be related to the XRT container solution. Could you try running this test in a pure container environment without k8s and see what happens? If it can be reproduced there, I suggest reaching out to the XRT team for further help.
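To compare the pod and the plain container, it may help to run the same command back to back and record which run fails first. The following is only a sketch: the helper names run_in_succession and first_failure are hypothetical, and it assumes that a failed run shows up as a non-zero exit code (an aborted process is reported by Python as a negative code).

```python
import subprocess
import sys


def run_in_succession(cmd, runs=2):
    """Run cmd `runs` times back to back and return the list of exit codes."""
    codes = []
    for i in range(runs):
        # No reset or delay between runs, to mimic jobs submitted in succession.
        result = subprocess.run(cmd)
        print(f"run {i + 1}/{runs}: exit code {result.returncode}")
        codes.append(result.returncode)
    return codes


def first_failure(codes):
    """Return the 1-based index of the first non-zero exit code, or None."""
    for i, code in enumerate(codes, start=1):
        if code != 0:
            return i
    return None
```

For example, first_failure(run_in_succession(["./host", "vadd_hw.xclbin", "512", "0", "1", "64"])) uses the invocation from the log above. If it returns 2 in the pod but None in a plain container, that would point toward the k8s setup rather than XRT itself.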