android-yolo-v2
About measures against misjudgment
@szaza
Hi, the training results with darknet are very good (loss = 0.15, avg loss = 0.13, precision = 0.87, recall = 0.97, F1-score = 0.92, mAP = 0.96). However, after converting the weights file to a pb file and loading that pb file into android-yolo-v2, object detection reacts too quickly and classifies the photographed object instantly, so it appears to make erroneous judgments.
Can I change the code so that object detection only starts after the same object has been visible for one second or more?
Alternatively, I would like to change the code so that a detection is only accepted once the same result has occurred three consecutive times. Where should I modify the code?
Thank you in advance,
keides2
Hi, "Can I change the code so that object detection starts after the same object appears for 1 second or more?" - yes, you can modify the code accordingly to this proposal, however it sounds like a hack. I think this is not the proper way to solve this issue. I don't think either that the darknet implementation contains such a mechanism.
"Or, I would like to make a change so that the same detection result will be the detection result for the first time three consecutive times, where should I fix the code?" - this solution also seems like a hack for me, only difference is that maybe it is easier to implement.
Please have a look at YOLOClassifier.java, put breakpoints into the classifyImage() method, and investigate the results you get. It is probably easier to debug YOLOClassifier.java with this implementation: https://github.com/szaza/tensorflow-example-java; that application also uses the YOLOClassifier, but you can feed in the input images one by one. Best wishes for your work, and please let me know about your results!
Best regards, Zoltan
@szaza
Thank you for your reply. It will probably take some time, but I would like to investigate and try the modification.
Keides2
@szaza
Using Gradle in Android Studio, I finally have an environment where I can debug.
At the breakpoint, when the debugger exits the for loop, the titles in the priorityQueue are cow and bird.
However, I cannot interpret this result, so I do not know what to change.
Hi, the priority queue contains the detected objects whose confidence was higher than the specified confidence limit. As far as I can see, you have four elements in the priority queue, and the detected objects are cow and bird. Each element has a confidence value that tells you how likely it is that the detected object belongs to the given class; for the first element, for example, the detector says there is an 88% chance that the object is a cow. The location attribute gives you the position of the bounding box inside the image.
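If you want to inspect these values without the debugger, a small sketch like the one below can be added next to the classification call. The getTitle(), getConfidence() and getLocation() accessor names are assumptions on my side, so please verify them against the Recognition class in this repository.

    // Sketch: log every recognition that survived the confidence filter.
    // The accessor names below are assumed; check the actual Recognition class.
    private void dumpRecognitions(final List<Recognition> results) {
        for (final Recognition recognition : results) {
            Log.d(LOGGING_TAG, String.format("title=%s confidence=%.2f location=%s",
                    recognition.getTitle(),
                    recognition.getConfidence(),
                    recognition.getLocation()));
        }
    }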
Hi @szaza,
Long time no see. This year I am working on this code again.
Since the object-detection judgment reacts too quickly, I have made the following improvement.
[Specification] If the same object is detected four times in a row, it is judged that a new object has been detected, the name of the object is shown in the display field, and a description of the object is spoken.
The modified code is as follows:
public class ClassifierActivity extends TextToSpeechActivity implements OnImageAvailableListener {
    ...

    // keides2 added
    private String lastRecognizedClass = "";
    private String nowRecognizedClass = "";
    private String tts = "";
    private String msgInf1 = "";
    private String msgInf2 = "";
    private int matchCount = 0;
    private String lastResult = "";
    private String nowResult = "";

    @Override
    public void onPreviewSizeChosen(final Size size, final int rotation) { ... }

    @Override
    public void onImageAvailable(final ImageReader reader) {
        Image image = null;
        try {
            ...
        } catch (final Exception ex) {
            ...
        }
        runInBackground(this::run);
    }
    // Builds the text-to-speech message for the recognized class.
    // Also sets msgInf1 and msgInf2 as a side effect so they can be drawn on screen.
    private String makeTts(String nowRecognizedClass) {
        switch (nowRecognizedClass) {
            case "C型リモコン": // Object-0 (C-type remote control)
                msgInf1 = "C型リモコンのケースは、13番、"; // Speak message 0-1: "The C-type remote control case is No. 13,"
                msgInf2 = "プリント基板は19番です。"; // Speak message 0-2: "the printed circuit board is No. 19."
                break;
            case "軍手・革手": // Object-1 (work gloves / leather gloves)
                msgInf1 = "軍手、耐カット軍手、皮手は、"; // Speak message 1-1: "Work gloves, cut-resistant gloves and leather gloves are"
                msgInf2 = "7番です。"; // Speak message 1-2: "No. 7."
                break;
            ...
            default:
                msgInf1 = "";
                msgInf2 = "";
                break;
        }
        return msgInf1 + msgInf2;
    }
    private void fillCroppedBitmap(final Image image) { ... }

    @Override
    protected void onDestroy() { ... }

    private void renderAdditionalInformation(final Canvas canvas) {
        ...
        // keides2 added: makeTts() refreshes msgInf1/msgInf2, which are drawn below
        String msgInf = makeTts(nowRecognizedClass);
        lines.add(msgInf1);
        lines.add(msgInf2);
        borderedText.drawLines(canvas, 10, 10, lines);
    }
    private void run() {
        final long startTime = SystemClock.uptimeMillis();
        final List<Recognition> results = recognizer.recognizeImage(croppedBitmap);
        lastProcessingTimeMs = SystemClock.uptimeMillis() - startTime;
        overlayView.setResults(results);

        // keides2
        Log.d(LOGGING_TAG, "Debug: runInBackground()");
        if (!results.isEmpty()) {
            nowResult = results.get(0).getTitle();
            Log.d(LOGGING_TAG, String.format("Find: %s", nowResult));
            if (lastResult.equals(nowResult)) {
                matchCount++;
                if (matchCount > 3) {
                    // match 4 times
                    matchCount = 0;
                    if (!lastRecognizedClass.equals(nowResult)) {
                        nowRecognizedClass = nowResult;
                        Log.d(LOGGING_TAG, "Match 4 times: " + nowRecognizedClass);
                        lastRecognizedClass = nowRecognizedClass;
                        tts = makeTts(nowRecognizedClass);
                        Log.d(LOGGING_TAG, "makeTts(): " + tts);
                        if (!tts.equals("")) {
                            speak2(results, tts);
                            tts = "";
                        }
                    }
                } else {
                    // nothing to do
                }
            } else {
                matchCount = 0;
            }
            lastResult = nowResult;
            Log.d(LOGGING_TAG, String.format("matchCount: %d", matchCount));
        }
        requestRender();
        computing = false;
    }
}
Thank you, keides2
Hi, I'm very glad that you improved the existing code. Could you please create a new branch with the changes and create a pull request? Thanks very much for your suggestions!