raspberry-pi-pcie-devices

Hailo-8 M.2 AI Acceleration Module

Open rezasaadat1 opened this issue 3 years ago • 46 comments

There is another AI Accelerator out there, please check the compatibility in raspberry pi pcie devices https://hailo.ai/product-hailo/hailo-8-m2-module/

rezasaadat1 avatar Jul 02 '21 08:07 rezasaadat1

@RezaSaadat1 - Do you know if there are any Linux drivers for this board? Any guides for it? I didn't do too much digging, but this is the first I've heard of it.

geerlingguy avatar Jul 02 '21 16:07 geerlingguy

So I've been looking this up since seeing this and it seems most of the information is behind a 'business enquiry' wall.

I did find a link to a data sheet from March 2021 that seems to suggest the Linux driver is pre-compiled and only available for 4.15 and 5.0 kernels, which makes me think it's proprietary. It does list it as being compatible with ARM64 though.

Data Sheet Link

More11o avatar Dec 05 '21 16:12 More11o

I've sent an email to Hailo to see about getting one to test. Right now it seems they mostly sell to integrators directly; it's not so much a direct-to-consumer product available in small quantities. But I'm asking about that and future plans, since supposedly it should work with the CM4.

geerlingguy avatar Dec 05 '21 17:12 geerlingguy

I can see that the product is available from up-board.org as retail units, and they claim that it works with their boards. But again, those are Intel Atom boards that can feed it. I doubt the Pi will even be able to feed data to it fast enough, considering it has only one PCIe lane, and I think that lane is also shared with the USB3 devices.

pritamghanghas avatar Sep 28 '23 14:09 pritamghanghas

I may be able to get a device to test! Finally got in touch, we'll see.

geerlingguy avatar Sep 28 '23 16:09 geerlingguy

@geerlingguy Any update on this topic? I am also very interested in the compatibility of the Pi5 with those Hailo-8 modules. It is hard to find any references on the internet.

cyclux avatar Jan 31 '24 01:01 cyclux

I have one in hand... have not been able to do my testing on it yet...

geerlingguy avatar Jan 31 '24 04:01 geerlingguy

Nice! Looking forward to your test results 👍

cyclux avatar Jan 31 '24 14:01 cyclux

This is on the site now, at least: https://pipci.jeffgeerling.com/cards_m2/hailo-8-ai-module.html

And one other report from a user in the wild: https://twitter.com/jonlech/status/1754921750083596664

geerlingguy avatar Feb 06 '24 19:02 geerlingguy

Funny, got the notification while watching a video of yours :)

I am very surprised there is no test report from anybody yet, although those Hailo chips have been out there since 2021. Maybe that is because of the relatively late adoption of M.2 on SBCs.

Soon I'll get my "Pineberry HAT AI". If there is success in running the M.2 Hailo, will try to order one as well using your link on https://pipci.jeffgeerling.com. BTW: Are you sure it's a working referral link? Cannot see any URL params, or will they recognize it via HTTP referer?

cyclux avatar Feb 06 '24 19:02 cyclux

@cyclux - Not all of the links are referral links, usually only ones to Amazon product listings. I get a small amount of revenue from referral links but I mostly want to just share the neat things I find, referral or not!

geerlingguy avatar Feb 06 '24 20:02 geerlingguy

Ah, I see, so "If you'd like to purchase this card, it helps me out if you use the following product link:" is a template sentence, and the link is not guaranteed to be an actual referral. Too bad, I just wanted to support you. I will find another way, sooner or later!

PS: Are you aware of any other "affordable" (<$300) M.2 NPUs / TPUs / AI Accelerators that are known to be compatible with Pi / Pi OS?

cyclux avatar Feb 06 '24 20:02 cyclux

@cyclux - It looks like Arducam has their PiNSIGHT setup: https://www.arducam.com/arducam-introduces-pinsight-your-vision-ai-mate-for-raspberry-pi-5/ (I have one to test, haven't gotten to test it yet) based on Intel Movidius Myriad X (4 TOPS). There's also Sony's AITRIOS lineup, but I haven't heard anything about that recently.

geerlingguy avatar Feb 06 '24 21:02 geerlingguy

Thanks. I also stumbled upon Myriad X, but have not found a dedicated M.2 module yet (with reported support for Pi). Just some mysterious ones like https://www.aaeon.com/en/p/ai-edge-computing-board-ai-core-xm-2280 . Hopefully, with the Pi M.2 HAT releases, AI modules will be more common this year.

cyclux avatar Feb 06 '24 22:02 cyclux

The Hailo-8 M.2 works fine with the Pi5 (and pciex1_gen=3) using the Pineberry Pi HatDrive Bottom.

Hailo-8 on Pi5 using Pineberry Pi HatDrive Bottom

# dmesg | grep hailo
[    5.169204] hailo: Init module. driver version 4.16.2
[    5.186413] hailo 0000:01:00.0: Probing on: 1e60:2864...
[    5.186424] hailo 0000:01:00.0: Probing: Allocate memory for device extension, 11592
[    5.186444] hailo 0000:01:00.0: enabling device (0000 -> 0002)
[    5.186451] hailo 0000:01:00.0: Probing: Device enabled
[    5.186471] hailo 0000:01:00.0: Probing: mapped bar 0 - 00000000128eb0c1 16384
[    5.186477] hailo 0000:01:00.0: Probing: mapped bar 2 - 000000003516a586 4096
[    5.186482] hailo 0000:01:00.0: Probing: mapped bar 4 - 00000000cf4c1ecb 16384
[    5.186487] hailo 0000:01:00.0: Probing: Setting max_desc_page_size to 16384, (page_size=16384)
[    5.186494] hailo 0000:01:00.0: Probing: Enabled 64 bit dma
[    5.186497] hailo 0000:01:00.0: Probing: Using userspace allocated vdma buffers
[    5.186501] hailo 0000:01:00.0: Disabling ASPM L0s 
[    5.186505] hailo 0000:01:00.0: Successfully disabled ASPM L0s 
[    5.384172] hailo 0000:01:00.0: Firmware was loaded successfully
[    5.397205] hailo 0000:01:00.0: Probing: Added board 1e60-2864, /dev/hailo0

Before building libhailort you'll need to change MAX_DESC_PAGE_SIZE to 16384 in descriptor_list.hpp.
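
The 16384 figure matches the kernel page size reported by the driver in the dmesg output above (page_size=16384). A minimal sketch for checking the page size on the build host before patching. The helper `needs_desc_page_patch` is hypothetical and not part of HailoRT, and the rule it encodes (the library value should be at least the kernel page size) is an assumption drawn from this post, not from Hailo's documentation:

```python
import os


def kernel_page_size() -> int:
    """Return the running kernel's page size in bytes."""
    return os.sysconf("SC_PAGE_SIZE")


def needs_desc_page_patch(header_value: int, page_size: int) -> bool:
    """Hypothetical check: True when the MAX_DESC_PAGE_SIZE value you read
    from descriptor_list.hpp (passed in by hand) is below the page size."""
    return header_value < page_size


if __name__ == "__main__":
    size = kernel_page_size()
    print(f"kernel page size: {size} bytes")
    # 4096 is only an example; read the real value from your source checkout.
    print("patch needed:", needs_desc_page_patch(4096, size))
```

On a kernel with 4096-byte pages, such as the CM4 setup later in this thread, the same check would report that no change is needed.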

# hailortcli parse-hef yolov8n.hef 
Architecture HEF was compiled for: HAILO8
Network group name: yolov8n, Multi Context - Number of contexts: 2
    Network name: yolov8n/yolov8n
        VStream infos:
            Input  yolov8n/input_layer1 UINT8, NHWC(640x640x3)
            Output yolov8n/conv41 UINT8, FCR(80x80x64)
            Output yolov8n/conv42 UINT8, FCR(80x80x80)
            Output yolov8n/conv52 UINT8, FCR(40x40x64)
            Output yolov8n/conv53 UINT8, FCR(40x40x80)
            Output yolov8n/conv62 UINT8, FCR(20x20x64)
            Output yolov8n/conv63 UINT8, FCR(20x20x80)
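
As a rough sketch of how much data this model moves per frame, the tensor shapes above can be multiplied out. This assumes one byte per UINT8 element, no padding in the FCR layout, and that the tensors cross the PCIe link in exactly these shapes; none of that is confirmed by the output itself. The 114 FPS used in the example call is roughly the benchmark figure reported for this model in this post.

```python
from math import prod

# Shapes copied from the parse-hef output (all UINT8).
INPUT_SHAPES = [(640, 640, 3)]
OUTPUT_SHAPES = [
    (80, 80, 64), (80, 80, 80),
    (40, 40, 64), (40, 40, 80),
    (20, 20, 64), (20, 20, 80),
]


def total_bytes(shapes):
    """Sum of H*W*C over all tensors, at one byte per UINT8 element."""
    return sum(prod(shape) for shape in shapes)


def link_load_mb_per_s(fps: float) -> float:
    """Approximate input + output traffic in MB/s at a given frame rate."""
    per_frame = total_bytes(INPUT_SHAPES) + total_bytes(OUTPUT_SHAPES)
    return per_frame * fps / 1e6


if __name__ == "__main__":
    print("input bytes per frame: ", total_bytes(INPUT_SHAPES))
    print("output bytes per frame:", total_bytes(OUTPUT_SHAPES))
    print(f"approx. traffic at 114 FPS: {link_load_mb_per_s(114):.1f} MB/s")
```

Under these assumptions the model needs roughly 2.4 MB per frame in each round trip, which is well below what a single PCIe lane can carry at this frame rate.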

# hailortcli benchmark yolov8n.hef 
Starting Measurements...
Measuring FPS in hw_only mode
Network yolov8n/yolov8n: 100% | 1710 | FPS: 113.99 | ETA: 00:00:00
Measuring FPS and Power in streaming mode
[HailoRT] [warning] Using the overcurrent protection dvm for power measurement will disable the overcurrent protection.
If only taking one measurement, the protection will resume automatically.
If doing continuous measurement, to enable overcurrent protection again you have to stop the power measurement on this dvm.
Network yolov8n/yolov8n: 100% | 1710 | FPS: 113.95 | ETA: 00:00:00
Measuring HW Latency
Network yolov8n/yolov8n: 100% | 1707 | HW Latency: 7.97 ms | ETA: 00:00:00

=======
Summary
=======
FPS     (hw_only)                 = 113.988
        (streaming)               = 113.955
Latency (hw)                      = 7.97256 ms
Device 0000:01:00.0:
  Power in streaming mode (average) = 1.21517 W
                          (max)     = 1.22032 W
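
A small worked calculation on the summary figures above, as a sketch only. The power value is the module's own reading as reported by hailortcli, not whole-system power, and applying Little's law (items in flight = throughput x latency) to these two numbers is an interpretation, not something the tool reports.

```python
# Figures copied from the benchmark summary.
FPS_STREAMING = 113.955
HW_LATENCY_MS = 7.97256
AVG_POWER_W = 1.21517

# Average time between completed frames.
frame_period_ms = 1000.0 / FPS_STREAMING

# Energy drawn by the module per inferred frame, in millijoules.
energy_per_frame_mj = AVG_POWER_W / FPS_STREAMING * 1000.0

# Little's law estimate of how many frames are inside the device at once.
frames_in_flight = FPS_STREAMING * (HW_LATENCY_MS / 1000.0)

if __name__ == "__main__":
    print(f"frame period:     {frame_period_ms:.2f} ms")
    print(f"energy per frame: {energy_per_frame_mj:.2f} mJ")
    print(f"frames in flight: {frames_in_flight:.2f}")
```

The frame period comes out slightly longer than the hardware latency, so on average a little under one frame is in flight for this model.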

JonLech avatar Mar 14 '24 23:03 JonLech

Oh wow, awesome! Thank you very much :) Is this the 2280 B+M key version, which can be ordered at https://up-shop.org/hailo-m2-key.html ?

cyclux avatar Mar 15 '24 00:03 cyclux

I've got the Key M model. I wanted M because I may end up using the Hailo-8 on a platform where I can take advantage of 4 PCIe lanes (i.e. not the Pi5). The B+M model only supports 2 lanes.
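
To put the lane counts in perspective, here is a minimal sketch of raw PCIe link bandwidth per generation and lane count. The transfer rates and line encodings are the standard PCIe figures; the result is a line-rate ceiling before packet and protocol overhead, so real throughput is lower. The Pi 5 exposes a single lane that defaults to Gen 2, and the pciex1_gen=3 setting mentioned earlier in the thread requests Gen 3.

```python
# PCIe generation -> (transfer rate in GT/s, line-encoding efficiency)
PCIE_GEN = {
    1: (2.5, 8 / 10),      # 8b/10b encoding
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
}


def link_mb_per_s(gen: int, lanes: int) -> float:
    """Raw link bandwidth in MB/s (1 MB = 1e6 bytes), one direction."""
    gts, efficiency = PCIE_GEN[gen]
    return gts * 1e9 * efficiency / 8 / 1e6 * lanes


if __name__ == "__main__":
    for label, gen, lanes in [
        ("Gen 2 x1", 2, 1),
        ("Gen 3 x1", 3, 1),
        ("Gen 3 x2", 3, 2),
        ("Gen 3 x4", 3, 4),
    ]:
        print(f"{label}: {link_mb_per_s(gen, lanes):.0f} MB/s")
```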

# hailortcli fw-control identify
Executing on device: 0000:01:00.0
Identifying board
Control Protocol Version: 2
Firmware Version: 4.16.2 (release,app,extended context switch buffer)
Logger Version: 0
Board Name: Hailo-8
Device Architecture: HAILO8
Serial Number: XXXXXXXXXXXXXXXX
Part Number: HM218B1C2FAE
Product Name: HAILO-8 AI ACC M.2 M KEY MODULE EXT TEMP

JonLech avatar Mar 15 '24 01:03 JonLech

@JonLech - Awesome! Thanks for posting your results—I've updated the Hailo 8 module page on the site with a 'Yes' for Pi 5: https://pipci.jeffgeerling.com/cards_m2/hailo-8-ai-module.html (also updating tags on this issue to match...).

geerlingguy avatar Mar 15 '24 03:03 geerlingguy

@JonLech Great, thanks for the info! It seems acquiring a Hailo (M.2) isn't straightforward. The documentation hints that not all software is freely accessible: "[..] and each model is accompanied by a binary HEF file, fully supported within the Hailo toolchain and Application suite (accessible to registered users only)" (source). Does this mean registration / product inquiry is a must, or is there a way to purchase a unit from retail (like https://up-shop.org/hailo-m2-key.html) and still use custom models with the available software?

cyclux avatar Mar 15 '24 13:03 cyclux

@cyclux The Hailo Model Zoo repo has a lot of models already converted to HEF format that anyone can download.

However, if you have a custom model you want to run on the Hailo-8, you will first need to convert it to HEF format using the Hailo AI Software Suite (available for download on the Hailo website after creating an account). You'll need an x86_64 host with an NVIDIA GPU for this process. It's relatively straightforward.

In terms of integrating use of a HEF model into other software, that can be accomplished with the C/C++/Python APIs in their hailort repo (although docs & tutorials are only available on their website with an account).

JonLech avatar Mar 15 '24 17:03 JonLech

@JonLech Nice, Thanks a lot for your helpful info!

cyclux avatar Mar 17 '24 20:03 cyclux

In recent Raspberry Pi news, a company called Velo AI reports that they are using a CM4 and a Hailo chip, so things look positive for it working on the CM4 too.

I am trying to get hold of a Hailo; when I do, I will be able to confirm the specifics of the setup on the CM4.

swdee avatar Mar 18 '24 00:03 swdee

I received the Hailo-8 A+E Key card and can confirm it's fully functional on the CM4. It is very much plug and play, without any dramas, using the following process.

I used a Waveshare A->B key adaptor and M.2 to PCIe adaptor and mounted this in the CM4 Carrier board.

The CM4 has Raspberry Pi OS (Debian 11 - Bullseye) installed with Python v3.9.

Driver Installation

Install Linux kernel headers and dkms

apt install linux-headers dkms

Download the .deb PCIe driver and HailoRT packages from here and install the driver; when prompted during installation, select the DKMS install.

dpkg -i hailort-pcie-driver_4.16.0_all.deb

Reboot

Check dmesg for successful driver loading

$ dmesg | grep -i hailo

[    4.501251] hailo_pci: loading out-of-tree module taints kernel.
[    4.599730] hailo: Init module. driver version 4.16.0
[    4.600451] hailo 0000:01:00.0: Probing on: 1e60:2864...
[    4.600476] hailo 0000:01:00.0: Probing: Allocate memory for device extension, 11592
[    4.600565] hailo 0000:01:00.0: enabling device (0000 -> 0002)
[    4.600591] hailo 0000:01:00.0: Probing: Device enabled
[    4.600666] hailo 0000:01:00.0: Probing: mapped bar 0 - 00000000f78ea782 16384
[    4.600693] hailo 0000:01:00.0: Probing: mapped bar 2 - 0000000097da2126 4096
[    4.600716] hailo 0000:01:00.0: Probing: mapped bar 4 - 0000000071b50fb9 16384
[    4.600737] hailo 0000:01:00.0: Probing: Setting max_desc_page_size to 4096, (page_size=4096)
[    4.600775] hailo 0000:01:00.0: Probing: Enabled 64 bit dma
[    4.600789] hailo 0000:01:00.0: Probing: Using userspace allocated vdma buffers
[    4.600808] hailo 0000:01:00.0: Disabling ASPM L0s 
[    4.600829] hailo 0000:01:00.0: Successfully disabled ASPM L0s 
[    4.772490] hailo 0000:01:00.0: Firmware was loaded successfully
[    4.809954] hailo 0000:01:00.0: Probing: Added board 1e60-2864, /dev/hailo0
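
If you want to script this check, a minimal sketch of a parser for the log lines above follows. The function name `summarize_hailo_dmesg` is hypothetical, the patterns are based only on the messages shown in this thread, and other driver versions may word them differently.

```python
import re

# A few lines copied from the dmesg output above.
SAMPLE = """\
[    4.599730] hailo: Init module. driver version 4.16.0
[    4.600737] hailo 0000:01:00.0: Probing: Setting max_desc_page_size to 4096, (page_size=4096)
[    4.772490] hailo 0000:01:00.0: Firmware was loaded successfully
[    4.809954] hailo 0000:01:00.0: Probing: Added board 1e60-2864, /dev/hailo0
"""


def summarize_hailo_dmesg(text: str) -> dict:
    """Pull the driver version, page size and device node out of dmesg text."""
    version = re.search(r"driver version (\S+)", text)
    page = re.search(r"page_size=(\d+)", text)
    board = re.search(r"Added board ([0-9a-f]{4}-[0-9a-f]{4}), (/dev/\S+)", text)
    return {
        "driver_version": version.group(1) if version else None,
        "page_size": int(page.group(1)) if page else None,
        "firmware_loaded": "Firmware was loaded successfully" in text,
        "board_id": board.group(1) if board else None,
        "device_node": board.group(2) if board else None,
    }


if __name__ == "__main__":
    print(summarize_hailo_dmesg(SAMPLE))
```

On a live system the text could come from `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`, which may require root depending on kernel settings.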

Install Hailo Runtime

Install HailoRT tools.

dpkg -i hailort_4.16.0_arm64.deb 

Check that device is recognised.

$ hailortcli scan

Hailo Devices:
[-] Device: 0000:01:00.0

$ hailortcli fw-control identify

Executing on device: 0000:01:00.0
Identifying board
Control Protocol Version: 2
Firmware Version: 4.16.0 (release,app,extended context switch buffer)
Logger Version: 0
Board Name: Hailo-8
Device Architecture: HAILO8
Serial Number: ***
Part Number: HM218B1C2KA
Product Name: HAILO-8 AI ACCELERATOR M.2 A+E KEY MODULE
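
A short sketch for turning the identify output into a dictionary, for example to compare the firmware version against the driver version seen in dmesg. The helper names are hypothetical, and treating a version mismatch as worth flagging is an assumption rather than documented behaviour.

```python
# Output copied from the identify command above (serial number omitted).
SAMPLE = """\
Executing on device: 0000:01:00.0
Identifying board
Control Protocol Version: 2
Firmware Version: 4.16.0 (release,app,extended context switch buffer)
Logger Version: 0
Board Name: Hailo-8
Device Architecture: HAILO8
Part Number: HM218B1C2KA
Product Name: HAILO-8 AI ACCELERATOR M.2 A+E KEY MODULE
"""


def parse_identify(text: str) -> dict:
    """Split 'Key: Value' lines into a dict; lines without ': ' are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def firmware_version(fields: dict) -> str:
    """Return only the numeric part of the 'Firmware Version' field."""
    return fields["Firmware Version"].split()[0]


if __name__ == "__main__":
    info = parse_identify(SAMPLE)
    print(info["Product Name"])
    print("firmware matches driver 4.16.0:", firmware_version(info) == "4.16.0")
```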

Python API Installation

Download pyHailoRT libraries by getting the python wheel file from here

Install Python pip

apt install python3-pip

Install pyHailoRT wheel file.

pip install hailort-4.16.0-cp39-cp39-linux_aarch64.whl
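
The wheel filename encodes the interpreter and platform it was built for (cp39 and linux_aarch64 here), which is why the Python 3.9 that ships with Bullseye matters. A minimal sketch for checking a wheel name against the running interpreter before installing; the helper names are hypothetical, and the parser assumes the five-part filename form without an optional build tag.

```python
import platform
import sys


def wheel_tags(filename: str) -> dict:
    """Split name-version-pythontag-abitag-platformtag.whl into its parts."""
    stem = filename[: -len(".whl")]
    name, version, py_tag, abi_tag, plat_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python": py_tag,
        "abi": abi_tag,
        "platform": plat_tag,
    }


def matches_interpreter(filename, version_info=None, machine=None) -> bool:
    """True if the wheel's CPython tag and architecture fit this interpreter."""
    version_info = version_info or sys.version_info
    machine = machine or platform.machine()
    tags = wheel_tags(filename)
    expected_py = f"cp{version_info[0]}{version_info[1]}"
    return tags["python"] == expected_py and tags["platform"].endswith(machine)


if __name__ == "__main__":
    wheel = "hailort-4.16.0-cp39-cp39-linux_aarch64.whl"
    print(wheel_tags(wheel))
    print("fits this interpreter:", matches_interpreter(wheel))
```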

Run Example Inference

Get example code.

git clone https://github.com/hailo-ai/Hailo-Application-Code-Examples.git hailo-examples

Run inference code example which uses resnet50

cd hailo-examples/runtime/python/inference_run_opencv_classification/
./get_sources.sh

Install Python requirements (if needed)

pip install -r requirements.txt

Run inference example

./run_inference_with_image_opencv.py resnet_v1_50.hef zidane.jpg --labels imagenet1000_clsidx_to_labels.txt

Product Comparison

I have benchmarks applicable to my usage across a number of devices which consists of image classification using an EfficientNet-Lite0 model. The following table provides a comparison of the performance.

| Device | First Inference | Second Inference |
|---|---|---|
| Jetson Orin Nano 8GB - CUDA | 3-4 sec | 14-18ms |
| Jetson Orin Nano 8GB - CPU | N/A | 30ms |
| Raspberry Pi 4B | 150ms | 92ms |
| Raspberry Pi 5 | 67ms | 50ms |
| Khadas VIM3 Pro | 106ms | 78ms |
| Rock Pi 5B - CPU | 65-70ms | 44ms |
| Rock Pi 5B - NPU (Single Core) | 12ms | 6-7ms |
| Rock Pi 5B - NPU (3 Cores, 9 Threads) | N/A | 1.64ms |
| Raspberry Pi CM4 with Hailo-8 (Blocking API) | 11ms | 4.2ms |
| Raspberry Pi CM4 with Hailo-8 (Streaming API) | N/A | 1.2ms |

The Coral results were reported as a single figure per device:

| Device | Inference |
|---|---|
| Threadripper Workstation - USB3 Coral | 9-11ms |
| Raspberry Pi CM4 - USB2 Coral | 20-27ms |
| Raspberry Pi 5 - USB2 Coral | 20-24ms |
| Raspberry Pi 5 - USB3 Coral | 9-12ms |
| Raspberry Pi 4B - USB2 Coral | 20-27ms |
| Raspberry Pi 4B - USB3 Coral | 11-18ms |

The Hailo-8 Blocking API runs the image inference serially and provides a direct comparison to how inference is run on all the other platforms. The Hailo-8 device is also highly parallel and performs best using the Streaming API.
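
To make the gap concrete, here is a small sketch that converts some of the second-inference figures from the table into images per second and a speedup relative to the Raspberry Pi 4B. It treats the Streaming API's 1.2ms as the average time per image, which is an assumption about how that number was measured.

```python
# Second-inference times in ms, copied from the comparison table.
LATENCY_MS = {
    "Raspberry Pi 4B": 92.0,
    "Raspberry Pi 5": 50.0,
    "CM4 with Hailo-8 (Blocking API)": 4.2,
    "CM4 with Hailo-8 (Streaming API)": 1.2,
}

BASELINE = "Raspberry Pi 4B"


def images_per_second(latency_ms: float) -> float:
    """Throughput if each image takes latency_ms on average."""
    return 1000.0 / latency_ms


def speedup(device: str, baseline: str = BASELINE) -> float:
    """How many times faster a device is than the baseline."""
    return LATENCY_MS[baseline] / LATENCY_MS[device]


if __name__ == "__main__":
    for device, ms in LATENCY_MS.items():
        print(f"{device}: {images_per_second(ms):.0f} img/s, "
              f"{speedup(device):.1f}x vs {BASELINE}")
```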

The Google Coral is a USB device and performs differently depending on whether it's connected to a USB2 or USB3 port.

swdee avatar Apr 03 '24 21:04 swdee