TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10
Extracting coordinates instead of drawing a box
EDIT: Nevermind, got it to work!
Hey, first off, great tutorial, thank you so much.
I got it to run on Ubuntu 16.04 as well with ease, but I have a problem: I'm running a CLI Ubuntu server, so instead of using an image as output, I'd just like to have the coordinates of the boxes.
I looked into Object_detection_image.py and found where the boxes are being drawn, but it uses a function named visualize_boxes_and_labels_on_image_array to draw them. If I try to output np.squeeze(boxes), it returns this:
```
[[0.5897823  0.35585764 0.87036747 0.5124078 ]
 [0.6508235  0.13419046 0.85757935 0.2114587 ]
 [0.64070517 0.14992228 0.8580698  0.23488007]
 ...
 [0.         0.         0.         0.        ]
 [0.         0.         0.         0.        ]
 [0.         0.         0.         0.        ]]
```
Is there a way to just get the coordinates from that?
Thank you for your time!
EDIT:
Okay, I added a new function to visualization_utils.py that returns the ymin, ymax, xmin, xmax variables used in other functions of that file to draw the boxes.
The problem is, they look like this:
[[0.5897822976112366, 0.8703674674034119, 0.35585764050483704, 0.5124077796936035], [0.6508234739303589, 0.8575793504714966, 0.13419045507907867, 0.2114586979150772]]
I was expecting coordinates. These seem like percentages.
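(A note for anyone reading along: those values are not percentages but normalized [ymin, xmin, ymax, xmax] coordinates in the 0-1 range, which is how the Object Detection API returns boxes. A quick sketch of the conversion to pixel coordinates, with an assumed example image size:

```python
import numpy as np

# Boxes from the detector are normalized [ymin, xmin, ymax, xmax] in 0-1.
boxes = np.array([[0.5897823, 0.35585764, 0.87036747, 0.5124078]])

height, width = 480, 640  # example image size (rows, cols)
ymin, xmin, ymax, xmax = boxes[0]

# Scale to pixel coordinates by multiplying with the image dimensions.
ymin_px, ymax_px = int(ymin * height), int(ymax * height)
xmin_px, xmax_px = int(xmin * width), int(xmax * width)
print(ymin_px, xmin_px, ymax_px, xmax_px)
```

This is exactly what the return_coordinates function further down in this thread does internally.)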
EDIT: Okay, I got it to work.
I'm facing the same problem. Do you have a solution for this? Mind sharing?
I found a solution. I'll share it on here tomorrow, when I'm at work (don't have the solution at home).
Add this to utils/visualization_utils.py:
```python
def return_coordinates(
    image,
    boxes,
    classes,
    scores,
    category_index,
    instance_masks=None,
    instance_boundaries=None,
    keypoints=None,
    use_normalized_coordinates=False,
    max_boxes_to_draw=20,
    min_score_thresh=.5,
    agnostic_mode=False,
    line_thickness=4,
    groundtruth_box_visualization_color='black',
    skip_scores=False,
    skip_labels=False):
  # Create a display string (and color) for every box location, group any boxes
  # that correspond to the same location.
  box_to_display_str_map = collections.defaultdict(list)
  box_to_color_map = collections.defaultdict(str)
  box_to_instance_masks_map = {}
  box_to_instance_boundaries_map = {}
  box_to_score_map = {}
  box_to_keypoints_map = collections.defaultdict(list)
  if not max_boxes_to_draw:
    max_boxes_to_draw = boxes.shape[0]
  for i in range(min(max_boxes_to_draw, boxes.shape[0])):
    if scores is None or scores[i] > min_score_thresh:
      box = tuple(boxes[i].tolist())
      if instance_masks is not None:
        box_to_instance_masks_map[box] = instance_masks[i]
      if instance_boundaries is not None:
        box_to_instance_boundaries_map[box] = instance_boundaries[i]
      if keypoints is not None:
        box_to_keypoints_map[box].extend(keypoints[i])
      if scores is None:
        box_to_color_map[box] = groundtruth_box_visualization_color
      else:
        display_str = ''
        if not skip_labels:
          if not agnostic_mode:
            if classes[i] in category_index.keys():
              class_name = category_index[classes[i]]['name']
            else:
              class_name = 'N/A'
            display_str = str(class_name)
        if not skip_scores:
          if not display_str:
            display_str = '{}%'.format(int(100*scores[i]))
          else:
            display_str = '{}: {}%'.format(display_str, int(100*scores[i]))
        box_to_display_str_map[box].append(display_str)
        box_to_score_map[box] = scores[i]
        if agnostic_mode:
          box_to_color_map[box] = 'DarkOrange'
        else:
          box_to_color_map[box] = STANDARD_COLORS[
              classes[i] % len(STANDARD_COLORS)]

  # Draw all boxes onto image.
  coordinates_list = []
  counter_for = 0
  for box, color in box_to_color_map.items():
    ymin, xmin, ymax, xmax = box
    height, width, channels = image.shape
    ymin = int(ymin*height)
    ymax = int(ymax*height)
    xmin = int(xmin*width)
    xmax = int(xmax*width)
    coordinates_list.append([ymin, ymax, xmin, xmax, (box_to_score_map[box]*100)])
    counter_for = counter_for + 1
  return coordinates_list
```
Add this to Object_detection_dir.py:
```python
coordinates = vis_util.return_coordinates(
    image,
    np.squeeze(boxes),
    np.squeeze(classes).astype(np.int32),
    np.squeeze(scores),
    category_index,
    use_normalized_coordinates=True,
    line_thickness=8,
    min_score_thresh=0.80)
```
as well as this:
```python
import json  # at the top of the file

with open("json/" + filename_string + ".json", "a") as textfile:
    textfile.write(json.dumps(coordinates))
    textfile.write("\n")
```
I think this should be all.
This was very helpful, thank you so much. If anyone needs to access each coordinate separately, change the third-to-last line of the code newly added to utils/visualization_utils.py, which is

coordinates_list.append([ymin, ymax, xmin, xmax, (box_to_score_map[box]*100)])

to

coordinates_list = [ymin, ymax, xmin, xmax, (box_to_score_map[box]*100)]

Then you can access the ymin, ymax, xmin, xmax values separately using ymin = coordinates_list[0] etc. in your object detection file.
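One caveat with that change: assigning instead of appending keeps only the last detection. If there can be several boxes per image, an alternative sketch is to keep the append and unpack each entry in a loop (the sample values below are made up):

```python
# coordinates: the list returned by return_coordinates(), one entry per box:
# [ymin, ymax, xmin, xmax, score_percent]
coordinates = [
    [283, 417, 227, 327, 95.2],
    [312, 411, 85, 135, 88.7],
]

boxes_px = []
for ymin, ymax, xmin, xmax, score in coordinates:
    boxes_px.append((ymin, ymax, xmin, xmax))  # one tuple per detection
    print("box:", ymin, ymax, xmin, xmax, "score:", score)
```

This way every detection above the threshold is preserved, not just the final one.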
@PraveenNellihela I have been getting an IndexError: list index out of range. Below is my code; maybe I missed one of your points. Kindly assist.

```python
coordinates = vis_util.return_coordinates(
    frame,
    np.squeeze(boxes),
    np.squeeze(classes).astype(np.int32),
    np.squeeze(scores),
    category_index,
    use_normalized_coordinates=True,
    line_thickness=8,
    min_score_thresh=0.85)
ymin = int(coordinates[0])
ymax = int(coordinates[1])
xmin = int(coordinates[2])
xmax = int(coordinates[3])
```
Try using some form of error handling such as "try and except". You are most likely getting this error when there are no detections. I think I got the same issue when I was doing this, when the object I was trying to detect went out of the video frame. I used "try and except" to ignore the values when there weren't any, so it didn't produce an error. Hope this helps.
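As an alternative to a bare try/except, you can also check the returned list before indexing; an empty list simply means nothing scored above min_score_thresh. A small sketch (the helper name is just illustrative):

```python
def first_box_or_none(coordinates):
    """Return (ymin, ymax, xmin, xmax) of the first detection, or None."""
    if not coordinates:  # no detections above the threshold this frame
        return None
    ymin, ymax, xmin, xmax = coordinates[0][:4]
    return int(ymin), int(ymax), int(xmin), int(xmax)

print(first_box_or_none([]))                            # no detections
print(first_box_or_none([[283, 417, 227, 327, 95.2]]))  # one detection
```

This avoids silently swallowing unrelated errors the way a bare except would.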
@PraveenNellihela Thank you for the suggestion,it worked flawlessly. Best of luck in life.
@iqrammm Can you share the code that you wrote? Like how to use "try and except" in this situation? Thanks a lot :)
```python
try:
    coordinates = vis_util.return_coordinates(
        frame,
        np.squeeze(boxes),
        np.squeeze(classes).astype(np.int32),
        np.squeeze(scores),
        category_index,
        use_normalized_coordinates=True,
        line_thickness=10,
        min_score_thresh=0.85)
    ymin = int(coordinates[0])
    ymax = int(coordinates[1])
    xmin = int(coordinates[2])
    xmax = int(coordinates[3])
except IndexError:
    pass
```
Thanks a lot :) I couldn't solve this problem until yesterday. You made my day, sir! Have a good day. May I email you when I get in trouble?
Sure, but I may not be able to answer, as I am also pretty new to TensorFlow and DNNs.
@PraveenNellihela Do you by any chance know how to return the percentage scores as well? I added a return for scores but can't seem to get it to work.
Hey, the code I originally posted should already return the class that was detected and the accuracy in percent. You can get all 6 values from the return like this:
```python
coordinates = vis_util.return_coordinates(
    image,
    np.squeeze(boxes),
    np.squeeze(classes).astype(np.int32),
    np.squeeze(scores),
    category_index,
    use_normalized_coordinates=True,
    line_thickness=8,
    min_score_thresh=0.80)

for coordinate in coordinates:
    print(coordinate)
    (y1, y2, x1, x2, accuracy, classification) = coordinate
```
With "accuracy" being the value you are looking for and "classification" the ID you associated with your object's class.
EDIT: I forgot that I edited my code to include the classification after I posted the initial code here. If you want the classification ID to be returned with the coordinates, change the last few lines of your return_coordinates function to the following:
```python
  # Draw all boxes onto image.
  coordinates_list = []
  counter_for = 0
  for box, color in box_to_color_map.items():
    ymin, xmin, ymax, xmax = box
    height, width, channels = image.shape
    ymin = int(ymin*height)
    ymax = int(ymax*height)
    xmin = int(xmin*width)
    xmax = int(xmax*width)
    coordinates_list.append([ymin, ymax, xmin, xmax,
                             (box_to_score_map[box]*100), int(class_name)])
    counter_for = counter_for + 1
  return coordinates_list
```
int(class_name) should contain the ID of the detected object.
EDIT 2: By the way, if anyone wants to not just extract the coordinates, but also crop the image, I recently implemented that into my program and it's just a few lines of code:
```python
for coordinate in coordinates:
    (y1, y2, x1, x2, acc, classification) = coordinate
    height = y2 - y1
    width = x2 - x1
    crop = image[y1:y1+height, x1:x1+width]
    cv2.imwrite("[PATH TO WHERE THE CROP SHOULD BE SAVED]", crop)
```
@iqrammm in your case, with the changes you made to the return_coordinates function, it would be something like this:
```python
coordinates = vis_util.return_coordinates(
    frame,
    np.squeeze(boxes),
    np.squeeze(classes).astype(np.int32),
    np.squeeze(scores),
    category_index,
    use_normalized_coordinates=True,
    line_thickness=10,
    min_score_thresh=0.85)

ymin = int(coordinates[0])
ymax = int(coordinates[1])
xmin = int(coordinates[2])
xmax = int(coordinates[3])
accuracy = float(coordinates[4])
```
I haven't tested the code above, but theoretically it should work like that, since the return_coordinates function returns a list with ymin, ymax, xmin, xmax, accuracy. The (box_to_score_map[box]*100) is the accuracy in percentage.
> for coordinate in coordinates:
>     print(coordinate)
>     (y1, y2, x1, x2, accuracy, classification) = coordinate

I believe this is not complete, since class_name will return the last class that was assigned to the variable.
UPDATE: I figured out how to write the values to a text file by making some modifications in the last few lines of the object_detection.py file! Thank you
@psi43 I'm facing issues with these lines (writing the coordinates into the text file):

```python
textfile = open("json/"+filename_string+".json", "a")
textfile.write(json.dumps(coordinates))
textfile.write("\n")
```

Does the filename_string simply refer to an empty text file that I should create to save the outputs?
Thank you in advance, I really appreciate it!
> I believe this is not complete, since class_name will return the last class that was assigned to the variable.
I'll try to look into this. I use this code daily and have heavily modified it since (my example does work for me, but I might have missed a change). So me looking it up a few weeks ago might be very different to what I posted months ago.
> Does the filename_string simply refer to an empty text file that I should create to save the outputs?
filename_string is a variable I used to write the coordinates to a json file. I was very new to Python and was frustrated that building the filename out of strings and integers didn't work, so I had to build it and then cast the variable to a string. Hence the name. If you don't use a mix of ints and strings for your filename and just want to save to one file, just define it as filename_string = "test" or something, and the json should be saved as "test.json" in the json/ directory.
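For reference, building a filename out of ints and strings works once the int is cast with str() (or with an f-string); a minimal sketch with a made-up frame counter:

```python
frame_number = 42  # hypothetical per-image counter

# str() cast, as described above
filename_string = "frame_" + str(frame_number)

# equivalent f-string
filename_f = f"frame_{frame_number}"

json_path = "json/" + filename_string + ".json"
print(json_path)
```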
I got it working. Is the piece that you have modified for your use case open source?
Sadly no. Even if I planned on making it open source, I'd probably have to make a lot of changes, due to dumb variable names like "filename_string" and general bad code. It works, but I would feel bad having others look at it, thinking "what an idiot". Modifying this project was one of my first few encounters with Python :/
Glad you got yours to work though! If you (or anyone else) has any more questions, I'll definitely try and answer them as best I can.
@SKY24 Can you please advise on how to fix the issue you pointed out earlier? 'class_name will return the last class that was assigned to the variable'
Make the following changes and it should work:
```python
        # Both branches of the original "if agnostic_mode" were identical, so
        # the assignment can be unconditional. box_to_class_map = {} needs to
        # be defined next to the other maps at the top of the function.
        box_to_class_map[box] = classes[i]

  # Draw all boxes onto image.
  coordinates_list = []
  counter_for = 0
  for box, class_name in box_to_class_map.items():
    ymin, xmin, ymax, xmax = box
    height, width, channels = image.shape
    ymin = int(ymin*height)
    ymax = int(ymax*height)
    xmin = int(xmin*width)
    xmax = int(xmax*width)
    data = {}
    data['ymin'] = ymin
    data['ymax'] = ymax
    data['xmin'] = xmin
    data['xmax'] = xmax
    data['confidence'] = (box_to_score_map[box]*100)
    data['className'] = int(class_name)
    coordinates_list.append(data)
    counter_for = counter_for + 1
  return coordinates_list
```
It's an amazing thread for my work as well.
I have another question regarding object_detection_image.py.
After finishing the training session, I want to read a folder of image files and run detection on them, but instead of showing the results, I want to save the image filename and detection score to a file (.csv maybe).
Could anyone help me with my problem?
Best,
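One way to sketch this is with Python's csv module: collect one (filename, score) pair per image while looping over the folder, then write them out. The save_scores_csv helper below is hypothetical; the detection loop that fills results would come from the existing script:

```python
import csv

def save_scores_csv(results, csv_path):
    """Write (filename, score) pairs to a CSV file.

    results: list of (image_filename, detection_score) tuples, e.g. collected
    while looping over a directory of images with return_coordinates().
    """
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "score"])  # header row
        writer.writerows(results)

# example usage with made-up scores
save_scores_csv([("img_001.jpg", 97.5), ("img_002.jpg", 88.1)], "scores.csv")
```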
Hello @psi43,
There is no Object_detection_dir.py, so where do I have to add this code?

```python
coordinates = vis_util.return_coordinates(
    image,
    np.squeeze(boxes),
    np.squeeze(classes).astype(np.int32),
    np.squeeze(scores),
    category_index,
    use_normalized_coordinates=True,
    line_thickness=8,
    min_score_thresh=0.80)
```

as well as this:

```python
textfile = open("json/"+filename_string+".json", "a")
textfile.write(json.dumps(coordinates))
textfile.write("\n")
```

Kindly awaiting your reply.
Hello @psi43, I used the below code to crop the image. It works very well:

```python
for coordinate in coordinates:
    (y1, y2, x1, x2, acc, classification) = coordinate
    height = y2 - y1
    width = x2 - x1
    crop = image[y1:y1+height, x1:x1+width]
    cv2.imwrite("[PATH TO WHERE THE CROP SHOULD BE SAVED]", crop)
```

But there are multiple images of the same label that I want to crop. The above code works for cropping only a single image. Could you please help with cropping and saving multiple images from a single image?

Thanks and regards. :)
@SinaMojtahedi In the example I gave for cropping, where I loop through the coordinates, you could replace the crop with writing the coordinates and other info into a .csv file. That was actually the natural progression of my project as well: instead of cropping one image, I now go through a directory of images, extract all the coordinates and save them in a .json file with the same name as the image. Just google for something like "python3 how to make csv files". Since you already have the data, all you need now is the saving part.
@hiteshreddy95 Sorry, I made an Object_detection_dir.py by basically putting a huge for-loop around the Object_detection_image.py so it would cycle through an entire directory of images. You can just add it to the Object_detection_image.py.
@akshay-bahulikar With that for-loop, it should crop out every object. Keep in mind that you would have to implement a counter so the filename changes every time, like crop_1.jpg, crop_2.jpg, etc. If you just name it crop.jpg, only the last object recognized will be saved, because every iteration replaces crop.jpg.
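A counter via enumerate keeps each crop's filename unique. A sketch with a placeholder image and made-up coordinates (in the real script these come from the frame and return_coordinates; the cv2.imwrite call is commented out so the snippet stands alone):

```python
import numpy as np

# placeholder image and detections
image = np.zeros((480, 640, 3), dtype=np.uint8)
coordinates = [
    [283, 417, 227, 327, 95.2, 1],
    [312, 411, 85, 135, 88.7, 2],
]

crop_names = []
for i, (y1, y2, x1, x2, acc, classification) in enumerate(coordinates):
    crop = image[y1:y2, x1:x2]          # slice out the detected region
    name = "crop_{}.jpg".format(i)      # unique filename per detection
    crop_names.append(name)
    # cv2.imwrite(name, crop)  # uncomment in the real script to save the crop
    print(name, crop.shape)
```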
Hi, does the first solution for extracting the coordinates delete the boxes drawn around the objects?
I'm new to the Python language and the TF tool, so sorry for my dumb question 🙋
@MousaAlnajjar I believe it does, I can't remember for sure. But I seem to remember needing to remove them because they were saved as part of the image, not entirely sure though.
If you do want the boxes, just look at the lines below:
# Draw all boxes onto image.
in the utils/visualization_utils.py
If you added my method into that file, you should have that "Draw all boxes onto image" line twice in the file: one for the coordinate extraction and one for drawing the boxes.
Play around with that and see if you can get it to do both.
I'll try it, but I want to ask another question: I can't find the file Object_detection_dir.py in the models to add the code above to, and I noticed it isn't used in the detection code or in utils/visualization_utils.py either.
@psi43 There is a weird contradiction I came across with multi-object detection. I'm doing rock, paper, scissors detection, so if I use vis_util.return_coordinates to return classes, it returns 2 different coordinates but prints the same class (which is wrong). But when it comes to using the drawing functionality in vis_util, it draws the 2 boxes with different labels (which is correct). It looks like the function doesn't take more than one class per frame.
Note: ignore the false detection, as I've trained on only a few images.

[detection screenshot omitted]

This is the classification output:
[231, 404, 352, 616, 99.99584555625916, 'scissor']
[159, 424, 33, 216, 92.08916425704956, 'scissor']
representing y1, y2, x1, x2, accuracy, classification respectively
I just solved the issue. It looks like the label is not updated in the for loop, so if there are multiple labels in the same frame, it returns only the latest one. I've edited the last few lines of the vis_util.return_coordinates function to look like this:
```python
  # Draw all boxes onto image.
  coordinates_list = []
  counter_for = 0
  for box, color in box_to_color_map.items():
    ymin, xmin, ymax, xmax = box
    height, width, channels = image.shape
    ymin = int(ymin*height)
    ymax = int(ymax*height)
    xmin = int(xmin*width)
    xmax = int(xmax*width)
    # display_strs has to be built earlier in the function, e.g. as a list of
    # the per-box labels from box_to_display_str_map, so each box keeps its
    # own class instead of the last value assigned to class_name.
    coordinates_list.append([ymin, ymax, xmin, xmax,
                             (box_to_score_map[box]*100),
                             display_strs[counter_for]])
    counter_for = counter_for + 1
  return coordinates_list
```