
MiDaS depth range?

Open phongnhhn92 opened this issue 3 years ago • 9 comments

Hi, I am new to MiDaS. Can I ask what the depth range of the predicted depth map is? Is it [0, 1]?

phongnhhn92 avatar Jan 28 '21 19:01 phongnhhn92

No, you should take the min and max of the output and normalize the data based on those, for instance into a grayscale range.
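
A minimal sketch of what I mean, assuming `prediction` is the raw 2-D NumPy array you get from MiDaS (not an official snippet, just an illustration):

import numpy as np

# prediction: raw 2-D MiDaS output (relative inverse depth, arbitrary per-image range)
pred_min, pred_max = prediction.min(), prediction.max()
if pred_max - pred_min > np.finfo("float").eps:
    normalized = (prediction - pred_min) / (pred_max - pred_min)  # values now in [0, 1]
else:
    normalized = np.zeros_like(prediction)
grayscale = (normalized * 255).astype(np.uint8)  # 8-bit grayscale image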

riteshpakala avatar Mar 17 '21 22:03 riteshpakala

@riteshpakala So what is the unit of the raw depth prediction? Is it inverse depth or just depth?

elenacliu avatar Aug 01 '23 12:08 elenacliu

@elenacliu it should be inverted, 1 = near and 0 = far

riteshpakala avatar Aug 01 '23 17:08 riteshpakala

@riteshpakala I have printed out the values of the predicted depth and found they are not in the range [0, 1], though they do satisfy the property that smaller values are farther and larger values are nearer.

elenacliu avatar Aug 02 '23 02:08 elenacliu

@elenacliu oh I see, I may be confusing it with another depth model. Is it possible that it is in the range 0-255 and needs to be normalized in post, value/255?

Edit: I found some old code, and yes, I was taking the min and max of the output to normalize to [0, 1]. This is pretty costly for realtime, though.

riteshpakala avatar Aug 02 '23 02:08 riteshpakala

The output is like this:

depth_map: [[ 2320.3528   2317.7908   2311.3635  ...   987.1105    834.85095   765.7877 ]
 [ 2317.6309   2315.899    2311.3477  ...  1015.4411    889.8536    833.15   ]
 [ 2310.6614   2310.7517   2310.259   ...  1078.0278   1009.96857   980.0071 ]
 ...
 [10098.441   10137.251   10221.653   ...  9858.545    9855.353    9854.753  ]
 [ 9902.26     9975.874   10136.235   ...  9838.973    9833.239    9830.733  ]
 [ 9814.838    9903.92    10097.963   ...  9830.934    9822.333    9818.274  ]]

@riteshpakala

elenacliu avatar Aug 02 '23 02:08 elenacliu

The range just confuses me. I have also found a code snippet which processes the normal map: https://github.com/graemeniedermayer/stable-diffusion-webui-normalmap-script/blob/main/scripts/normalmap.py#L285

# output
normal = prediction
numbytes=2
normal_min = normal.min()
normal_max = normal.max()
max_val = (2**(8*numbytes))-1

# check output before normalizing and mapping to 16 bit
if normal_max - normal_min > np.finfo("float").eps:
	out = max_val * (normal - normal_min) / (normal_max - normal_min)
else:
	out = np.zeros(normal.shape)

# single channel, 16 bit image
img_output = out.astype("uint16")

# invert normal map
if not (invert_normal ^ model_type == 0):
	img_output = cv2.bitwise_not(img_output)

img_output = (scale_depth * img_output).astype("uint16")

# three channel, 8 bits per channel image
img_output2 = np.zeros_like(processed.images[count])
img_output2[:,:,0] = img_output / 256.0
img_output2[:,:,1] = img_output / 256.0
img_output2[:,:,2] = img_output / 256.0

#pre blur (only blurs z-axis)
if pre_gaussian_blur:
	img_output = cv2.GaussianBlur(img_output, (pre_gaussian_blur_kernel, pre_gaussian_blur_kernel), pre_gaussian_blur_kernel)

# take gradients 
if sobel_gradient:
	zx = cv2.Sobel(np.float64(img_output), cv2.CV_64F, 1, 0, ksize=sobel_kernel)     
	zy = cv2.Sobel(np.float64(img_output), cv2.CV_64F, 0, 1, ksize=sobel_kernel) 
else:
	zy, zx = np.gradient(img_output)

# combine and normalize gradients.
normal = np.dstack((zx, -zy, np.ones_like(img_output)))
n = np.linalg.norm(normal, axis=2)
normal[:, :, 0] /= n
normal[:, :, 1] /= n
normal[:, :, 2] /= n

# post blur (will break normal maps unitary values)
if post_gaussian_blur:
	normal = cv2.GaussianBlur(normal, (post_gaussian_blur_kernel, post_gaussian_blur_kernel), post_gaussian_blur_kernel)

# offset and rescale values to be in 0-255
normal += 1
normal /= 2
normal *= 255	
normal = normal.astype(np.uint8)

It seems that the depth doesn't have a constant range.
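As I understand it, the raw output is relative inverse depth with an unknown per-image scale and shift. A rough sketch of how one could turn it into depth if a scale `a` and shift `b` were known (the values below are just placeholders; in practice they would have to be estimated by aligning to ground-truth measurements):

import numpy as np

# prediction: raw MiDaS output (relative inverse depth, unknown scale/shift)
a, b = 1.0, 0.0  # placeholder scale and shift
inverse_depth = a * prediction + b
depth = 1.0 / np.clip(inverse_depth, 1e-6, None)  # larger prediction -> smaller depth (nearer)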

elenacliu avatar Aug 02 '23 02:08 elenacliu

@elenacliu oh wait, I was referring to another thread actually. Ignore that (now deleted) comment.

What is the colorspace of your input image? Is it BGR or RGB? I am just wondering whether that was a possible edge case I hit when I was seeing larger numbers.
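
For example, if the image comes from cv2.imread it is BGR, and the MiDaS transforms expect RGB, so you would normally convert first (path below is a placeholder):

import cv2

img = cv2.imread("input.jpg")               # OpenCV loads images as BGR
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # MiDaS transforms expect RGB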

riteshpakala avatar Aug 02 '23 02:08 riteshpakala

You mean the image that I gave MiDaS to predict the depth map from? I just ran

python run.py --model_type dpt_beit_large_512 

as the README.md instructs.
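
To inspect the raw values directly instead of the saved images from run.py, I used something like the torch.hub usage from the README (the model name and image path below are just examples):

import cv2
import torch

midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large")  # example model type
midas.eval()

midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = midas_transforms.dpt_transform

img = cv2.cvtColor(cv2.imread("input.jpg"), cv2.COLOR_BGR2RGB)  # placeholder path
with torch.no_grad():
    prediction = midas(transform(img))
depth_map = prediction.squeeze().cpu().numpy()
print(depth_map.min(), depth_map.max())  # per-image range, not a fixed [0, 1]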

elenacliu avatar Aug 02 '23 02:08 elenacliu