
OAK-D Pro

Open Luxonis-Brandon opened this issue 3 years ago • 46 comments

Preorders available: OAK-D-Pro

Start with the why:

1. Mechanical design.

The mechanical design of OAK-D is limiting, with the following drawbacks:

  • The only mounting option is a single 1/4-20 tripod mount
  • Lack of 2-screw solution for secure panel mounting (which can result in unintended rotation if additional mechanical design isn't done to prevent this)
  • Bigger than necessary
  • Heavier than necessary

2. Active Illumination

OAK-D was architected for applications where passive depth performs well (and can often outperform active depth, since IR from the sun can be blocked by the optics when purely-passive disparity depth is used).

There are many applications however where active depth is absolutely necessary (operation in no light, with big/blank surfaces, etc.). And there are many OAK-D customers who would like to use the power of OAK-D in these scenarios where purely-passive depth is prohibitive.

Move to the how:

  • Add a VESA-spec (7.5cm, M4) set of mounting holes to the back of the enclosure, moving the tripod-mount to the bottom.
  • Retool the image sensors and layout to minimize size and weight while improving thermal performance.
  • Add IR laser dot projection (for no-light depth) and IR LED blanket illumination (for no-light computer vision/perception).

Architect the IR laser and IR LED such that all of the following modes of operation are supported.

The idea is that they'd be used in one of these permutations:

  1. IR laser on for all frames
  2. IR LED on for all frames
  3. IR laser on for even frames, odd frames no illumination (ambient light)
  4. IR LED on for even frames, odd frames no illumination (ambient light)
  5. IR laser on for even frames, IR LED on for odd frames
  6. IR laser and IR LED on for all frames
  7. IR laser and IR LED both off for all frames.

It is likely that modes 1 and 5 will be used the most, but enabling all the permutations above allows maximum flexibility for adapting illumination to an application (including dynamically). For example, mode 6 will likely rarely be used, but there are certain cases where having both on may be beneficial.
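For illustration, here is a minimal sketch of selecting the steady-state modes (1, 2, 6, 7), assuming the IR driver calls that later shipped in the DepthAI Python API (setIrLaserDotProjectorBrightness / setIrFloodLightBrightness). The per-frame alternating modes (3-5) need firmware-level strobe control not shown here, and the milliamp values are illustrative:

```python
import depthai as dai

pipeline = dai.Pipeline()
# ... add MonoCamera / StereoDepth nodes here as usual ...

with dai.Device(pipeline) as device:
    # Mode 1: IR laser dot projector on for all frames (brightness in mA)
    device.setIrLaserDotProjectorBrightness(800)
    device.setIrFloodLightBrightness(0)

    # Mode 2: IR LED (flood) on for all frames
    # device.setIrLaserDotProjectorBrightness(0)
    # device.setIrFloodLightBrightness(1000)

    # Mode 7: both emitters off (behaves like a passive OAK-D)
    # device.setIrLaserDotProjectorBrightness(0)
    # device.setIrFloodLightBrightness(0)
```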

Move to the what:

Same image sensors and FOV as OAK-D:

  • 2x OV9282 (but IR capable)
  • 1x IMX378 fixed-focus
  • IR laser dot projector
  • IR LED
  • Small, easier-to-integrate form-factor, as pictured below: [images]

Luxonis-Brandon avatar Aug 21 '21 21:08 Luxonis-Brandon

It's here! And everything works! [image]

Luxonis-Brandon avatar Sep 22 '21 18:09 Luxonis-Brandon

What "IR laser dot projector" will you be using?

0ut5ider avatar Sep 23 '21 18:09 0ut5ider

BELAGO 1.1: https://ams.com/belago1.1

Luxonis-Brandon avatar Sep 23 '21 18:09 Luxonis-Brandon

Good choice, I tested it a few days ago. Looks good.

ghost avatar Sep 26 '21 13:09 ghost

A large set of my customers need active depth. My customers have had active depth since 2013 with Kinect v1, v2, and now v3. I have started work to update my solutions to use OAK. The gaps are in two areas (active depth, good multi-human pose). The PRO model will close the first gap; the latter remains outstanding to robustly support multiple people at 30fps.

My customers often create things/solutions in low light->dark environments for music/art/museum/tradeshow/interactive installations, research, coursework, and innovative flashy-blinky-shiny creations that crave the dark. Historically, Kinect was the go-to, but Microsoft's 3rd revision has had no code progress and no hardware purchasable for ~1.5 years. RealSense was an option for a few years, but Intel has recently deprioritized/exited the depth market. There are a few smaller players, but they lack progress or are outside cost budgets.

I am very interested in OAK-D-PRO.

diablodale avatar Oct 07 '21 12:10 diablodale

Thanks @diablodale ! We're quite excited to get these out.

My customers often create things/solutions in low light->dark environments for music/art/museum/tradeshow/interactive installations, research, coursework, and innovative flashy-blinky-shiny creations that crave the dark.

Yes ^. We were just discussing this offline, and we figured that in a lot of artistic installations it is desirable and/or common for there to be low light or almost no light, to make the exhibit captivating.

Good multi-human pose... remains outstanding to robustly support multiple people at 30fps.

CC: @tersekmatija on this. In case he's seen anything.

Thanks again, Brandon

Luxonis-Brandon avatar Oct 07 '21 15:10 Luxonis-Brandon

Hey, so right now I think we have a few single pose models, not sure how we stand with multi-human pose models, but it's definitely a possibility. We actually have a few community members that are active in this field. If you want, you can check out our Discord (we have a dedicated #human-pose-estimation channel), as there are quite a few good resources there I think :)

Link: https://discord.gg/zN5CkquJtD

tersekmatija avatar Oct 08 '21 06:10 tersekmatija

Are you designing OAK-D-PRO hardware to deal with peer sensor interference? With both emitted dot patterns and TOF methods...there is interference. A minority of my customers will use multiple sensors with their FOV overlapping. They do this to greatly increase FOV by merging, to fill in occlusions, or to surround objects (like with 3 sensors) to merge depth/pointclouds to create 360° views.

And in all of these...there will be interference between sensors. There have been hacks with dot-pattern sensors...attaching tiny vibrating motors directly to the sensors, which somehow leads to a sensor more successfully "seeing" only its own dots. With TOF, I haven't seen hacks; instead, sync signals are sent between sensors on a wire. Sync signals are a widely used/known thing...just pulse it and synchro-bingo-bango, everyone has a shared clock, and now an API can control offsets so no TOF conflicts with another.

diablodale avatar Oct 09 '21 23:10 diablodale

Are you designing OAK-D-PRO hardware to deal with peer sensor interference?

Not explicitly. That said our multi-camera sync is likely accurate enough that an inherited solution already exists and is likely to be sufficient. At least for active disparity depth - so for OAK-D-Pro. (Not sure on ToF. Will need investigation.)

Luxonis-Brandon avatar Oct 11 '21 16:10 Luxonis-Brandon

I recommend you try it now in hardware prototyping to see/know the behavior (and perhaps choose not to fix it), rather than be surprised after manufacturing and have to react. Get three sensors and set up at least these two scenarios: [image]

In all dot cameras of which I have experience and/or read, the interference is substantial.

What multi-camera sync? The only thing I've seen is an attempt to match frames by timestamp. https://github.com/luxonis/depthai-experiments/blob/master/gen2-deeplabv3_depth/main.py#L66-L84 Such a solution does nothing for interference. The data is corrupted in the depth emitter/camera. It is exactly like the dot emitter failing to draw dots correctly. Without correct dots, everything breaks down.

This isn't a showstopper for me. Rather, it's a hardware feature to consider when 2+ active sensors have overlapping FOV.

diablodale avatar Oct 11 '21 17:10 diablodale

Like https://www.cs.unc.edu/~maimone/media/kinect_VR_2012.pdf

diablodale avatar Oct 11 '21 17:10 diablodale

@diablodale I'm interested in how that might occur. With structured light or Lidar, yes of course, but overlapping illumination patterns in stereo systems are expected to improve the resulting depth map, not degrade it. Interested, Michael

michaelkr avatar Oct 11 '21 17:10 michaelkr

I'm interested also. In the left scenario I drew above, there will be a field of 3x the number of dots, all in the same FOV, and none of the three OAKs knows there is another OAK...or two other OAKs.

This field of dots is not readily consistent. The 3 emitters are not at equal distances from surfaces, and not at equal angles. Therefore, the dots in a single set change their relative distances within their own set...and relative to the other two sets. All from the perspective of each of the three (or six) cameras. Creating complex Moiré patterns.

If an OAK knew there were 2 other cameras...perhaps it could somehow identify and isolate its own dots, and then somehow isolate the other dots. And then ignore those other dots. That seems like a lot of work and code to me. It's nothing I've seen any depth sensor ever do to date.

diablodale avatar Oct 11 '21 18:10 diablodale

I don't believe those things should matter - stereo-based systems don't need to associate projected patterns with an individual sensor (or any sensor at all, for that matter). The dots are there just to provide texture in the scene in situations where there is none (uniformly-painted walls, for example), so that stereo matching can occur - but it does not matter from what source. For example, the Intel RealSense stereo cameras (not their structured light cameras) are of a similar design and can overlap without issue.

michaelkr avatar Oct 11 '21 18:10 michaelkr

@diablodale I'm interested in how that might occur. With structured light or Lidar, yes of course, but overlapping illumination patterns in stereo systems are expected to improve the resulting depth map, not degrade it. Interested, Michael

Yes. That is what I've observed as well. A key premise of active stereo depth that's necessary to understand in this conversation is that laser dot projectors simply add information to the scene. So when you have multiple OAK-D-Pros, and thus overlapping projectors, there is simply more information added to the scene, and the stereo depth performance from all cameras actually improves. And because of the realistic placement of devices, it's (practically speaking) impossible for information to be removed from the scene. We have tested this and confirmed it in experimental settings as well.

So @diablodale - the IR laser dot projectors can be thought of like a can of spray paint. But instead of having to physically spray-paint texture onto all your blank walls, the IR laser dot projectors do this with IR - adding visual texture, visual interest, to the scene. Making blank walls and blank surfaces have visual interest - and features for feature matching.

Below is an example that shows how/why this is needed for stereo depth: [image]

Notice that you can't even tell that there are two images when looking at the back white wall there.

The laser dot projector literally just adds texture to the wall. So that then you would be able to see that there are two images. You'd be able to match them.
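To make the spray-paint analogy concrete, here is a toy illustration - plain OpenCV block matching on synthetic images, not DepthAI code - where random bright "dots" stand in for the projector. On the blank "wall" almost no pixels match; with dots, matching succeeds:

```python
import cv2
import numpy as np

rng = np.random.default_rng(0)
flat = np.full((120, 160), 128, np.uint8)   # blank "wall": no texture at all
dots = flat.copy()
dots[rng.integers(0, 120, 300), rng.integers(0, 160, 300)] = 255  # fake dots

shift = 4                                    # true disparity, in pixels
matcher = cv2.StereoBM_create(numDisparities=16, blockSize=15)
for name, left in (("blank", flat), ("dotted", dots)):
    right = np.roll(left, -shift, axis=1)    # fake second viewpoint
    disp = matcher.compute(left, right)      # fixed-point disparity map
    print(name, "valid pixels:", int((disp > 0).sum()))  # blank ~0, dotted many
```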

And here is syncing multiple cameras: https://github.com/luxonis/depthai-experiments/tree/master/gen2-seq-num-sync#sync-multiple-devices

Notice that the grayscale sync is what is necessary here. And at least in that example, the grayscale sensors (and their triggering of IR laser projection) are in sync at least as far as the granularity of the timer on the screen, which is milliseconds.

So the multi-camera sync in this case is able to get the two cameras (4 grayscale sensors) to within 1 millisecond of each other. And note that the color camera does not have hardware sync, and is rolling shutter, so it is within 10 milliseconds.

[image]

When I mentioned syncing multiple active-stereo depth cameras, I meant syncing the emitters so they are active at the same time. The above may be good enough, but we have not explicitly tested it. As @michaelkr mentioned, the worst case here is that the depth quality remains the same as if only a single camera were used. But most likely, when multiple cameras are used, the overlapping of the textures they are superimposing will improve the depth quality for both.
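For reference, a rough sketch of the sequence-number grouping idea from the linked example - hedged, since the exact queue setup depends on your pipeline; sequence numbers pair frames from the sensors of one device, and aligning across devices additionally relies on starting the devices together:

```python
from collections import defaultdict

def synced_frames(queues):
    """Yield dicts of frames sharing a sequence number across output queues.

    `queues` maps a name (e.g. "left", "right") to a dai.DataOutputQueue.
    """
    pending = defaultdict(dict)  # sequence number -> {name: frame}
    while True:
        for name, q in queues.items():
            frame = q.tryGet()  # non-blocking; returns None if queue is empty
            if frame is not None:
                seq = frame.getSequenceNum()
                pending[seq][name] = frame
                if len(pending[seq]) == len(queues):
                    yield pending.pop(seq)
```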

Thanks, Brandon

Luxonis-Brandon avatar Oct 11 '21 18:10 Luxonis-Brandon

I don't believe those things should matter - stereo-based systems don't need to associate projected patterns with an individual sensor (or any sensor at all, for that matter). The dots are there just to provide texture in the scene in situations where there is none (uniformly-painted walls, for example), so that stereo matching can occur - but it does not matter from what source. For example, the Intel RealSense stereo cameras (not their structured light cameras) are of a similar design and can overlap without issue.

Well said. Agreed. Same observations here.

Luxonis-Brandon avatar Oct 11 '21 18:10 Luxonis-Brandon

Thanks, I now get the dot-interference thinking 👍 This was a key distinction for me to grasp about how depth data will be derived with OAK-D-Pro. I had not considered that the stereo disparity approach would continue to be used with active emission, rather than the single-camera dot approach of PrimeSense or earlier Kinect models.

The emitter API lists only 6 values in the OP. I request a 7th, which is Laser=off/LED=off. ❌❌ In that off/off mode, I would hope OAK-D-Pro generates results the same as OAK-D in "normal" lighting situations.

Sync across multiple OAKs

From what I can see from sync and timestamps in DepthAI, the sync method used in the multicam sync at https://github.com/luxonis/depthai-experiments/tree/master/gen2-seq-num-sync#sync-multiple-devices uses a host CPU timestamp placed in the ImgFrame via https://github.com/luxonis/depthai-core/blob/7d76a830ffc51512adae455ec28b1150eabec513/src/pipeline/datatype/ImgFrame.cpp#L11-L13

Variations/delays that occur within the OAK sensor hardware to collect enough photons via exposure time, calculate disparity or debayer color, USB wire latency, the USB controller, PCIe, the OS, the OAK driver, and finally the DepthAI SDK will cause jitter and/or packets arriving at slightly different times. The latency between the photons hitting the sensor -> DepthAI SDK on line 11 above will vary slightly on every packet.

I see there is getTimestampDevice() which likely has access to a monotonic clock/timestamp value applied by an OAK within the OAK hardware to the original data itself. Hopefully just after the photons are collected. True? https://github.com/luxonis/depthai-core/blob/7d76a830ffc51512adae455ec28b1150eabec513/src/pipeline/datatype/ImgFrame.cpp#L24 and https://github.com/luxonis/depthai-core/pull/174 That PR work suggests that the timestamps between the three cameras on a single OAK are being coordinated. But not that timestamps across different OAKs are being coordinated.

If that device-side monotonic clock were hardware-synchronized across OAKs, then tight frame sync could be achieved across sensors...even sensors on different computers (by using PTP or very high-precision NTP). In the absence of hardware sync, the monotonic clocks on different sensors will not themselves be the same...and hardware clocks always drift...resulting in the same variation challenge.

What's possible? 🤔 If the hardware design of the first OAK-D-PRO will not have a cable/hardware clock sync, then what can the DepthAI SDK do to calculate latency and/or assist in clock sync? Perhaps consider how prosumer analog camera flashes work. One flash is the "master" and sends out an early burst of light which all the other flashes see. Then all the flashes have a starting point, know how long to wait, and all flash together with very high precision.

Could there be something emitted by the laser on one OAK which acts as a sync seen by the other OAKs? This is most likely a hardware/firmware feature. Perhaps that is done at startup, and then all device monoclocks can be set to zero. Then, with zero established in getTimestampDevice(), an offset to host UTC time can be established, and thereafter timestamps can be coordinated and drift soft-corrected.
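In the meantime, frames from two devices can at least be soft-paired on the host by their host-side timestamps. A hypothetical sketch - the 5 ms tolerance is illustrative, and it assumes depthai's ImgFrame.getTimestamp(), which returns a host-clock-referenced timedelta:

```python
from datetime import timedelta

TOLERANCE = timedelta(milliseconds=5)  # illustrative pairing window

def pair_by_timestamp(queue_a, queue_b):
    """Yield (frame_a, frame_b) pairs whose host timestamps roughly align."""
    buf_a, buf_b = [], []
    while True:
        buf_a.append(queue_a.get())  # blocking get from each device's queue
        buf_b.append(queue_b.get())
        while buf_a and buf_b:
            delta = buf_a[0].getTimestamp() - buf_b[0].getTimestamp()
            if abs(delta) <= TOLERANCE:
                yield buf_a.pop(0), buf_b.pop(0)
            elif delta < timedelta(0):
                buf_a.pop(0)  # A's oldest frame has no partner; drop it
            else:
                buf_b.pop(0)  # B's oldest frame has no partner; drop it
```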

diablodale avatar Oct 11 '21 23:10 diablodale

Is there any chance you can center the laser dot projector on the device? Maybe it's just me, but the asymmetric design doesn't seem super elegant. It would be awesome if you could move the projector right below the color camera!

nalzok avatar Oct 12 '21 05:10 nalzok

I don't believe those things should matter - stereo-based systems don't need to associate projected patterns with an individual sensor (or any sensor at all, for that matter). The dots are there just to provide texture in the scene in situations where there is none (uniformly-painted walls, for example), so that stereo matching can occur - but it does not matter from what source. For example, the Intel RealSense stereo cameras (not their structured light cameras) are of a similar design and can overlap without issue.

Same observation here, we have been using D435s with overlapping FoV for a while with no issues. To my understanding, as long as the patterns are sharp enough, they are adding more texture for the stereo matching. The only way I see it could have an adverse effect is if you have enough patterns projected to saturate the camera (or, to re-use the spray analogy, if you have painted your wall completely and uniformly).

doisyg avatar Oct 12 '21 06:10 doisyg

Is there any chance you can center the laser dot projector on the device? Maybe it's just me, but the asymmetric design doesn't seem super elegant. It would be awesome if you could move the projector right below the color camera!

Unfortunately no. There's no room in the center: [image]

But more importantly, the design is thermally symmetric, which is the most important part. The heat generators are intentionally located and implemented in specific locations to maximize thermal dissipation efficiency. Moving the IR laser and IR LED would imbalance this, reducing overall performance because of higher ambient operating temperatures as a result of higher leakage current in the main IC.

Luxonis-Brandon avatar Oct 12 '21 22:10 Luxonis-Brandon

The emitter API lists only 6 values in the OP. I request a 7th which is Laser=off/LED=off. ❌❌In that off/off I would hope OAK-D-Pro to generate results the same as OAK-D in "normal" lighting situations.

Yes. I should have had that on there. It is indeed already a planned mode.

Luxonis-Brandon avatar Oct 13 '21 04:10 Luxonis-Brandon

What stereo matching algorithm do you want to use? What limitations does it have? Resolution/disparity range/FPS?

ghost avatar Oct 15 '21 12:10 ghost

So it's a long answer with a LOT of details/options - but they are all here: https://docs.luxonis.com/projects/api/en/latest/components/nodes/stereo_depth/. And feel free to let us know if anything you are looking for is not there.

Luxonis-Brandon avatar Oct 15 '21 13:10 Luxonis-Brandon

Got some initial synced testing going. The census transform and disparity depth parameters need some tuning to make the match work well w/ the projector in the scene. But you can see the projector is EXTREMELY visible, which is the main purpose of the test.
[images]

Luxonis-Brandon avatar Oct 19 '21 17:10 Luxonis-Brandon

Thanks, I can infer a lot from these. Some things that come to mind...

  • A super small set of people might want an API to control the power of the flood and laser as you have in your test app. It adds some flexibility to manage the balance of photons/exposure time/ISO/aperture. We don't have aperture, so we are currently limited to ISO (grainy or not) and exposure time (blur or not). Naturally, there are power/heat considerations, ?regional regulations?,...and maybe over-tweakability.
  • I can see artifacts in the disparity data which relate to the laser pattern. This will be a new consideration to manage.
  • I understand this is early code and you caveat it. 👍 I'm hoping that later iterations of the laser have the same/more success as flood-only.
  • When your team is ready, I'm interested to see 2-3 overlapping lasers. In these pics, a single laser dot is quite large. I'm curious if overlapping lasers will merge dots to create fields of light, resulting in unmatchable sameness like seen on the flood-only right wall.
  • Yikes 😬 1.1A for the laser

diablodale avatar Oct 19 '21 18:10 diablodale

Thanks.

A super small set of people might want an API to control the power of the flood and laser as you have in your test app. It adds some flexibility to manage the balance of photons/exposure time/ISO/aperture. We don't have aperture, so we are currently limited to ISO (grainy or not) and exposure time (blur or not). Naturally, there are power/heat considerations, ?regional regulations?,...and maybe over-tweakability.

Yes. Planned. It will be available in the API. We are intrinsically eye-safe, so no matter what parameter is tweaked, the device cannot be made unsafe. We determined this today. Another way to put this: the hardware is incapable of driving the laser at a high enough power that it would become unsafe to eyes.

I can see artifacts in the disparity data which relate to the laser pattern. This will be a new consideration to manage.

Agreed. I meant to mention those specifically. We are adding the capability for anyone to fine-tune (at run-time) all the internals of the depth pipeline. https://github.com/luxonis/depthai-python/pull/377

And we will do this ourselves and provide defaults that work well for OAK-D Pro.
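As a hedged illustration of the kind of run-time tuning meant here - the exact knobs exposed depend on the depthai version per the PR above, and the preset and values below are illustrative, not the OAK-D Pro defaults:

```python
import depthai as dai

pipeline = dai.Pipeline()
stereo = pipeline.create(dai.node.StereoDepth)
stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
stereo.setLeftRightCheck(True)  # reject half-occluded/mismatched dot regions
stereo.setSubpixel(True)        # finer disparity steps on dot-textured surfaces
stereo.initialConfig.setConfidenceThreshold(200)  # discard weak matches
```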

I understand this is early code and you caveat it. 👍 I'm hoping that later iterations of the laser have the same/more success as flood-only.

Agreed. It will. Also, these cameras are currently hot-glued into the prototype so they are easy to swap in/out. So the tests should be taken with a grain of salt - the calibration fades really, really easily as the device heats up and the hot glue becomes malleable. So the possibility exists that the artifacts are just because of that (as the laser dot projector and IR LED produce more heat).

This of course will not be an issue for production units.

When your team is ready, I'm interested to see 2-3 overlapping lasers. In these pics, a single laser dot is quite large. I'm curious if overlapping lasers will merge dots to create fields of light, resulting in unmatchable sameness like seen on the flood-only right wall.

Yes. And also, reducing the laser power will actually result in smaller dots, which can help when tuning with multiple overlapping cameras.

Luxonis-Brandon avatar Oct 20 '21 16:10 Luxonis-Brandon

IR LED & Laser Pulsing https://youtu.be/SWDQekolM8o

This shows the exposure time of the OV9282 IR-capable global shutter grayscale cameras (yellow) and the IR LED/IR laser output (blue).

Luxonis-Brandon avatar Oct 20 '21 16:10 Luxonis-Brandon

@Luxonis-Brandon - This looks fantastic! Super excited for the release. Any estimates on the targeted ship date for the hardware?

KySmith1 avatar Oct 25 '21 03:10 KySmith1

Sorry @KySmith1 I missed this. Likely February 2022 but just a guess right now.

Luxonis-Brandon avatar Nov 10 '21 22:11 Luxonis-Brandon

It's here! And everything works! [image]

What is the width dimension of this (with and without the enclosure)?

gluxkind-k avatar Nov 23 '21 20:11 gluxkind-k