RTR Text reflow
Problem: Recognized text is not grouped into paragraphs, so the output does not read well as Markdown.
Solution: A heuristic algorithm is needed to group the bounding boxes (see IRecognitionElement) and their text into reflowed paragraphs.
How you can help: Please upload simple single-page RTR .note files to this issue, along with a copy of the text formatted the way you would like it, so I can generate some test cases.
For example:
Note: rtr.note.zip
Screenshot (optional):
Should output this:
Real time recognition paragraph test
With enough space a new paragraph should be created. If lines are close then the text should reflow.
This should be a new paragraph.
As well as this.
But thin is the last paragraph and should reflow together.
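The spacing rule in the expected output above (a large vertical gap starts a new paragraph, close lines reflow together) could be sketched roughly like this. This is only a minimal illustration: `LineBox`, `reflow`, and the `gapFactor` default are hypothetical names and values chosen for the example, not the real IRecognitionElement shape or anything the plugin currently does.

```typescript
// Hypothetical shape: NOT the real IRecognitionElement, only the fields this sketch needs.
interface LineBox {
  text: string;
  y: number;      // top edge of the line's bounding box
  height: number; // height of the line's bounding box
}

// Median of a non-empty list of numbers.
function median(values: number[]): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Join lines into paragraphs: a vertical gap larger than gapFactor times the
// typical line height starts a new paragraph, otherwise the line is reflowed
// into the current one. gapFactor = 0.8 is an assumed starting point.
function reflow(lines: LineBox[], gapFactor = 0.8): string {
  if (lines.length === 0) return "";
  const sorted = lines.slice().sort((a, b) => a.y - b.y);
  const typicalHeight = median(sorted.map((l) => l.height));
  const paragraphs: string[][] = [[sorted[0].text]];
  for (let i = 1; i < sorted.length; i++) {
    const prev = sorted[i - 1];
    const gap = sorted[i].y - (prev.y + prev.height);
    if (gap > gapFactor * typicalHeight) {
      paragraphs.push([sorted[i].text]);
    } else {
      paragraphs[paragraphs.length - 1].push(sorted[i].text);
    }
  }
  return paragraphs.map((p) => p.join(" ")).join("\n\n");
}
```

Using the median line height rather than a fixed pixel value is meant to keep the threshold independent of handwriting size, but the right factor would need tuning against real .note samples.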
Sorry for the delay, thank you for working on this. Here are a few notes that are already published for you to take a look at. Let me know if I can help.
I have had a couple of false starts at an algorithm here. The observations so far:
- The label field at the top has all of the text in the right order but no bounding boxes.
- The words field has all of the bounding boxes but the words may be out of order and are hard to reliably sort.
I probably need to make a third attempt, one that trusts the label field for text and ordering and then builds a bounding box around each line by fuzzy matching that line's words against the words field.
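A rough sketch of what that third attempt might look like, under stated assumptions: the label is treated as newline-separated lines of text, and the words field as a list of `{ text, box }` entries. The `Box` and `Word` shapes, `lineBoxes`, and the distance threshold are all hypothetical and are not taken from the actual .note format. Fuzzy matching here is plain Levenshtein edit distance on normalized tokens.

```typescript
// Hypothetical shapes for illustration only.
interface Box { x: number; y: number; width: number; height: number; }
interface Word { text: string; box: Box; }

// Lowercase and drop everything except ASCII letters and digits.
function normalize(s: string): string {
  return s.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Classic Levenshtein edit distance, keeping one row at a time.
function editDistance(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Smallest box that contains both a and b.
function union(a: Box, b: Box): Box {
  const x = Math.min(a.x, b.x);
  const y = Math.min(a.y, b.y);
  const right = Math.max(a.x + a.width, b.x + b.width);
  const bottom = Math.max(a.y + a.height, b.y + b.height);
  return { x, y, width: right - x, height: bottom - y };
}

// For each label line, claim the closest unused word for every token and
// merge the claimed boxes. Tokens of 3 characters or fewer must match exactly.
function lineBoxes(label: string, words: Word[], maxDistance = 1): { text: string; box: Box | null }[] {
  const used: boolean[] = words.map(() => false);
  return label.split("\n").filter((l) => l.trim() !== "").map((line) => {
    let box: Box | null = null;
    for (const token of line.split(/\s+/)) {
      const target = normalize(token);
      if (target === "") continue;
      const allowed = target.length <= 3 ? 0 : maxDistance;
      let best = -1;
      let bestDistance = allowed + 1;
      for (let i = 0; i < words.length; i++) {
        if (used[i]) continue;
        const d = editDistance(target, normalize(words[i].text));
        if (d < bestDistance) { best = i; bestDistance = d; }
      }
      if (best >= 0) {
        used[best] = true;
        box = box === null ? words[best].box : union(box, words[best].box);
      }
    }
    return { text: line, box };
  });
}
```

One known weakness of this sketch: when the same word appears on several lines, the first unused match in array order wins, which may belong to a different line. Preferring the candidate closest to the previously matched box would likely be needed in practice.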
Thank you for your work on this. The new PDF attach is awesome; I've been working through my current notes.
I have a similar issue in PySN, perhaps best illustrated by a user's example: https://gitlab.com/mmujynya/pysn-digest/-/issues/17
Intuitively, I think the proper algorithm should include corrections on the individual boxes when they contain letters such as "f", "g", "p", "q", etc. Such letters expand the boxes downwards. Letters such as "d", "t", etc. expand them upwards.
When I convert text to .note format, I have to take these vertical adjustments into account.
If you find a solution, I'll gladly copy it.
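To make the vertical-adjustment idea concrete, here is a minimal sketch that shrinks a word's box to an estimated x-height band based on which letters it contains. Everything here is an assumption for illustration: the `VBox` shape, the letter sets (with "f" counted as extending both up and down, as it often does in handwriting), and the two ratios, which are guesses rather than values measured from Supernote output.

```typescript
// Hypothetical vertical extent of a word box (y grows downwards).
interface VBox { top: number; bottom: number; }

// Assumed letter classes: tall letters, capitals and digits rise above the
// x-height; "f", "g", "j", "p", "q", "y" hang below the baseline.
const ASCENDERS = /[bdfhklt0-9A-Z]/;
const DESCENDERS = /[fgjpqy]/;

// Assumed sizes of the ascender and descender zones, as fractions of x-height.
const ASCENDER_RATIO = 0.4;
const DESCENDER_RATIO = 0.3;

// Estimate the band between the x-height line and the baseline, so that
// words with and without tall or hanging letters become comparable.
function coreBand(text: string, box: VBox): VBox {
  const up = ASCENDERS.test(text) ? ASCENDER_RATIO : 0;
  const down = DESCENDERS.test(text) ? DESCENDER_RATIO : 0;
  // Full height = xHeight * (1 + up + down), solved for xHeight.
  const xHeight = (box.bottom - box.top) / (1 + up + down);
  return { top: box.top + up * xHeight, bottom: box.bottom - down * xHeight };
}
```

Comparing the corrected bands instead of the raw boxes should make line grouping less sensitive to a single "g" or "d" in a word, although the ratios would have to be calibrated against real samples.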
One other update: as I've been working with my notes lately, some of the RTR notes (which are all I use anymore) are not converting. Any thoughts? It seems very random. I appreciate all the help.
That's likely a bug on the Supernote that has been around for over a year. The SN keeps updating the recognized text in the background. But instead of replacing the previously recognized text once the new recognition pass is complete, it first erases whatever was previously recognized. If the new recognition pass is interrupted for some reason, or if you export the file while it is running, you end up with holes in your text. Ratta is aware of this.
Max,
Thanks for the update, explains a bit.
Ed