remarks Draft: V6 with rmscene


This pull request should not be merged.

I post this here as a pull request to use Github's diff interface.

I think it's best to merge this step-by-step. My proposal is to create a branch along with a respective issue for each (top-level) point below. Some points are good to merge immediately, so they will take only minutes, others will benefit strongly from further discussion, cleanup and testing so they might take a bit longer.

[x] Depend on rmscene
- [x] Ideally, we'd depend on it via PyPI and not locally. rmscene is available on PyPI: https://pypi.org/project/rmscene/
[x] Poetry.toml should be merged by hand. I believe the only change I made was to add rmscene as a dependency; my merge uses older package versions than upstream master (lucasrla/remarks)
[x] The commit in remarks/conversion/drawing.py should definitely be turned into a pull request. Sometimes a line has only one point, whereas a segment for PyMuPDF requires at least two points. I currently ignore such lines, but rendering a point might be a better alternative?
- [x] Exact same scenario occurs in get_ann_max_bound in parsing.py
[ ] Parsing and rendering .rm v6 documents in parsing.py, remarks.py and utils.py. This is very much a work in progress. It renders lines correctly in a couple of scenarios, and close to correctly in some others. Very poorly in others though.
[x] Tests are probably not a good idea to directly copy, because they tend to contain private or copyrighted material.
[x] There are a couple of tiny refactors included in the code such as the changes to the process_ocr function in remarks.py, it had an unused parameter. I cleaned this up without affecting any behavior whatsoever.
[x] I added an @cache annotation to read_meta_file in utils.py. This reduces filesystem reads for the .metadata file to 1. If you use remarks in bulk this does save some time. I like how Python allows you to optimize unoptimally written code this simply :)
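
The single-point case from the drawing.py item above can be sketched as follows. This is only an illustration, not the code in the commit: `split_drawable` and the tuple-based point format are hypothetical, and the only thing taken from the discussion is that a segment needs at least two points.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def split_drawable(strokes: List[List[Point]]) -> Tuple[List[List[Point]], List[Point]]:
    """Separate strokes into polylines (>= 2 points) and lone points.

    Hypothetical helper: the real code works on parsed .rm data,
    not on plain tuples.
    """
    polylines: List[List[Point]] = []
    lone_points: List[Point] = []
    for stroke in strokes:
        if len(stroke) >= 2:
            polylines.append(stroke)
        elif len(stroke) == 1:
            # Too short for a segment; keep it so the caller can decide
            # whether to skip it or render a dot instead.
            lone_points.append(stroke[0])
        # Empty strokes are dropped.
    return polylines, lone_points
```

Returning the lone points separately keeps both options open: ignoring them reproduces the current behavior, while drawing them as small dots would be the alternative raised above.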

If you agree with the approach @lucasrla, I think it's best if I make the issues and branches myself since I know what code belongs together.

Mar 06 '23 20:03 Azeirah

Thanks, Laura!

Let me divide your commits into four buckets:

Immediate merges

[x] If you submit "Add: Reduce number of file reads" (https://github.com/lucasrla/remarks/pull/59/commits/258337804d2a35af29ece76733f59075832d131d) as a PR, I'll merge it right away

More info and testing needed

[x] Are you sure "Refactor: Remove unnecessary arg from ocr" (https://github.com/lucasrla/remarks/pull/59/commits/bd94728bdaaeda56f19a537771a5c22f633e061f) doesn't break anything? I think the reference had to be updated (but my memory could be wrong):

https://github.com/lucasrla/remarks/blob/12e1bde4265cc5254a14e814f9a4af4269e6ea3d/remarks/remarks.py#L472-L473

Not necessary anymore

[x] "Fix error where NoneType has no len()" (https://github.com/lucasrla/remarks/pull/59/commits/61d5fa191ebbc36fd431491f88f2a30543cd3f31) has already been merged upstream via https://github.com/lucasrla/remarks/commit/f3550d2985455ea807db284244be2696e7afaebe

Work towards supporting v6 .rm files (reMarkable >= 3.0)

[x] I'll first update one of my devices to v3.2. Then I'll have a look at your commits and start a separate branch to receive PRs related to v6 .rm files.

Sounds good?

Thanks again!

Mar 08 '23 14:03 lucasrla

Voilà: https://github.com/lucasrla/remarks/tree/dev-lines-v6

Mar 08 '23 17:03 lucasrla

If you submit "Add: Reduce number of file reads" (https://github.com/lucasrla/remarks/commit/258337804d2a35af29ece76733f59075832d131d) as a PR, I'll merge it right away

I made a PR for the cache annotation.
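
For readers unfamiliar with the annotation, here is a minimal sketch of what caching a metadata reader looks like. The signature is a simplified stand-in (an assumption, not the actual `read_meta_file` in utils.py), and `functools.cache` requires Python 3.9 or newer.

```python
import json
from functools import cache
from pathlib import Path


@cache
def read_meta_file(path: Path) -> dict:
    """Read and parse a .metadata file.

    Simplified stand-in: repeated calls with the same path return the
    cached result instead of touching the filesystem again.
    """
    return json.loads(Path(path).read_text())
```

Two caveats apply to this pattern in general: every caller receives the same dict object, so mutating it affects later callers, and a file that changes on disk during the run will not be re-read.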

Are you sure "Refactor: Remove unnecessary arg from ocr" (https://github.com/lucasrla/remarks/commit/bd94728bdaaeda56f19a537771a5c22f633e061f) doesn't break anything? I think the reference had to be updated (but my memory could be wrong):

I'm not 100% sure, no. I relied on PyCharm's inspections.

[image]

Although I do think it's correct in this case. ann_page gets passed to process_ocr. It is not used in the body. Then, ann_page is redefined to be work_doc[0] and lastly gets returned. In the caller-site, the original ann_page reference is overwritten by this line:

work_doc, ann_page = process_ocr(work_doc, ann_page)

There are no other callers.
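
The argument can be illustrated with a reduced sketch. The function bodies below are hypothetical stand-ins (the real OCR step is replaced by a plain list copy); only the shape of the data flow follows the description above.

```python
def process_ocr_before(work_doc, ann_page):
    # The incoming ann_page is never read in the body.
    ocr_doc = list(work_doc)  # stand-in for the real OCR step
    ann_page = ocr_doc[0]     # the parameter is overwritten here
    return ocr_doc, ann_page


def process_ocr_after(work_doc):
    # Same body without the unused parameter.
    ocr_doc = list(work_doc)  # stand-in for the real OCR step
    ann_page = ocr_doc[0]
    return ocr_doc, ann_page
```

Because the parameter is reassigned before any read, both versions return the same values for the same `work_doc`, whatever was passed as `ann_page`.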

Mar 08 '23 18:03 Azeirah

Hi all, really cool and interesting work. I just bought a RM2 last week and I am eager to use it without any cloud subscription. Hence, I would like to:

(1) Automatically back up / sync the notes on the RM2 with my home server
(2) Automatically convert the notes and PDF annotations and import the resulting PDFs into my Obsidian vault

For (1) I was planning to use rsync but have not tried yet to install it on the RM2. I am on version 3.5.1.1798. Anyone tried yet to install rsync on this version and can give me some directions (I know this has nothing to do with this project here, but maybe you guys can point me into the right direction)? So far I was simply using scp to copy the notes from the device.

For (2) I successfully used rmscene, and more specifically rmc, to convert the notes to SVGs and PDFs. For the annotated PDFs I was planning to use remarks. Hence, I just gave Laura's branch a go and tried it on a PDF where I inserted a note page and added some notes. This led to a crash since there is no longer a redirectionPageMap in content and my additional note page messed up the page list (https://github.com/Azeirah/remarks/blob/603ca190c4cff7e8a80b168667d619cc255c265d/remarks/utils.py#L66).

I made a quick and dirty fix on my local copy that I can investigate further and maybe provide a fix for that. However, the output PDF was created successfully after my fix but the rendering was not perfect (pages after my note page were much smaller than the rest).
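
The fix itself is not shown in this thread. Purely as an illustration of guarding against the missing key, a defensive lookup could look like the sketch below. `get_page_map` is a hypothetical helper, and the `pages` key and the identity fallback are assumptions rather than the documented file format.

```python
from typing import Any, Dict, List


def get_page_map(content: Dict[str, Any]) -> List[int]:
    """Return the page redirection map, or a placeholder if the key is absent.

    Hypothetical helper: the fallback assumes a "pages" list exists and
    maps every entry to itself.
    """
    page_map = content.get("redirectionPageMap")
    if page_map is not None:
        return list(page_map)
    # Assumption: no redirection info available, so use an identity map.
    return list(range(len(content.get("pages", []))))
```

An identity map avoids the crash but cannot tell inserted note pages apart from original PDF pages, so it is a placeholder rather than a solution to the page-list problem described above.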

Since I am really liking the RM2 but do not want to use the cloud service, I can maybe help to get remarks working on the v6 RM2 files. But how shall we proceed? I can maybe dive deeper into the issues of my current PDF:

* highlights were not rendered on the text
* the aforementioned problem with the added note page

And provide that in a separate branch that we can merge into Laura's? What do you think?

Jul 18 '23 19:07 wittmeis

For (1) I was planning to use rsync but have not tried yet to install it on the RM2. I am on version 3.5.1.1798. Anyone tried yet to install rsync on this version and can give me some directions (I know this has nothing to do with this project here, but maybe you guys can point me into the right direction)? So far I was simply using scp to copy the notes from the device.

You should check out https://github.com/toltec-dev/toltec

I made a quick and dirty fix on my local copy that I can investigate further and maybe provide a fix for that. However, the output PDF was created successfully after my fix but the rendering was not perfect (pages after my note page were much smaller than the rest).

Can you post the branch? I wasn't aware of an issue with inserted pages, makes sense though. I'm on holiday the following couple days so it'll be next week until I can look at it though.

pages after my note page were much smaller than the rest

It's not exactly a bug, but not a feature either. I had a lot of trouble getting strokes and highlights to display at the correct coordinates, and scaling was the only way I could get it to work for now. I think scaling the note page back down again should not be too difficult though. I'm targeting a very different resolution for rendering: the original pre-v6 code targeted something close to A4, whereas mine is closer to A3.

highlights were not rendered on the text

Does your notebook folder contain .rm-highlights files? I looked into missing highlights and misplaced highlights just the last week and noticed that in many of my documents sometimes the .rm-highlights files were missing. On my documents highlights are rendering as expected.

Also, if you can share your pdf/notebook as a whole (the ReMarkable filesystem files, not the .pdf file itself) that would be helpful for debugging.

Jul 18 '23 21:07 Azeirah

Hi Laura, thank you for your prompt response.

I just created a pull request for my changes. However, please bear in mind that this was just a quick hack yesterday. I have not yet tried whether the code still works for PDFs without note pages.

As for toltec: I stumbled on it before, but since it states that it only works for the 2.x versions of the RM, I did not dare to try it for fear of bricking the device. Are you using it with the 3.x version? Is it still working? As I said, I only need rsync - maybe there is even the executable somewhere to download? How do you copy the notes and files from your RM2?

Jul 19 '23 06:07 wittmeis

Are you using it with the 3.x version?

Not sure, might have run it on 3.x. I don't use any hacks on my device, it was just for tinkering a while ago.

As I said, I only need rsync - maybe there is even the executable somewhere to download?

I'm sure there's an executable somewhere. Best to ask on the remarkable discord, I think the discord is linked on the remarkabletablet community on reddit. There are a lot of tinkerers there.

How do you copy the notes and files from your RM2?

I just used a recursive scp command. It works fine although it's a bit slow. I don't think it has any fancy features like computing differences the way rsync does.

Still, I used it to create a 6GB+ backup of all my files and I had to restore from backup later that day. It worked perfectly.

Again, probably best to ask around on the discord server.

Jul 19 '23 07:07 Azeirah

I had some pretty gnarly tablet behavior (boot loops, freezing... though luckily all seemed to settle after a while) when I experimentally installed toltec on 3.x — would not recommend without major caution!

It may be possible to get better scp performance by tarring pre-transfer and untarring post-transfer, though I can't remember if the right libs are available on the "vanilla" install of the rM tablet...

Jul 19 '23 18:07 j6k4m8

I had some pretty gnarly tablet behavior (boot loops, freezing... though luckily all seemed to settle after a while) when I experimentally installed toltec on 3.x — would not recommend without major caution!

Thanks for that feedback. I really would like to settle on a solution that does not require installing additional packages on the RM. Maybe I simply stick to the scp approach for now and see how it performs. Since scp overwrites the target files even if they already exist, I assume I would need to do some additional checks to find out which files need to be re-processed by remarks and rmc. But that is feasible. I guess I can simply copy the metadata files or some other sidecar file to the processed files to be able to check for modification dates.
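
A minimal sketch of that sidecar idea, under stated assumptions: `find_changed` and `mark_processed` are hypothetical helpers, the flat directory layout is made up for illustration, and the sketch compares file contents rather than modification dates because a plain scp copy may reset timestamps. It also assumes the .metadata file changes whenever the document does.

```python
import shutil
from pathlib import Path
from typing import List


def find_changed(backup_dir: Path, processed_dir: Path) -> List[Path]:
    """List .metadata files whose sidecar copy is missing or differs."""
    changed = []
    for meta in sorted(Path(backup_dir).glob("*.metadata")):
        sidecar = Path(processed_dir) / meta.name
        if not sidecar.exists() or sidecar.read_bytes() != meta.read_bytes():
            changed.append(meta)
    return changed


def mark_processed(meta: Path, processed_dir: Path) -> None:
    """Store a copy of the metadata file next to the processed output."""
    Path(processed_dir).mkdir(parents=True, exist_ok=True)
    shutil.copyfile(meta, Path(processed_dir) / Path(meta).name)
```

The intended loop would be: call `find_changed`, run the conversion for each returned document, then call `mark_processed` once the conversion succeeds.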

But this is all off-topic for this PR, so sorry for hijacking it. ;-)

Jul 19 '23 18:07 wittmeis

I just used a recursive scp command. It works fine although it's a bit slow. I don't think it has any fancy features like computing differences the way rsync does.

Hi, I just browsed through the file system on my RM2 and found that there is already an rsync executable in /usr/bin. Just tested it and it works like a charm - in case you want to speed up your scp copy.

Jul 24 '23 19:07 wittmeis

I just used a recursive scp command. It works fine although it's a bit slow. I don't think it has any fancy features like computing differences the way rsync does.

Hi, I just browsed through the file system on my RM2 and found that there is already an rsync executable in /usr/bin. Just tested it and it works like a charm - in case you want to speed up your scp copy.

Oh haha, good to know

Jul 25 '23 06:07 Azeirah

Hey Laura, I noticed that the conversion of single-page notebooks works really well already, but for multi-page notebooks only the 1st page is rendered correctly. The 2nd page then already has a weird dimension. I debugged the code and there are negative y coordinates for the pen positions of the 2nd page, and hence the page dimensions in determine_document_dimensions seem to be messed up.

Have you encountered this already yourself? Do you have any idea why the pen positions of the 2nd page are different from the ones on the 1st page?
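
To make the symptom concrete, here is a sketch of how negative coordinates can be accounted for when sizing a page. `bounds_with_offset` is a hypothetical helper and is not the logic of determine_document_dimensions; it only shows that tracking the minimum as well as the maximum keeps the computed size consistent.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def bounds_with_offset(points: Iterable[Point]) -> Tuple[float, float, float, float]:
    """Return (width, height, dx, dy).

    Shifting every point by (dx, dy) makes all coordinates non-negative;
    the shift is zero on an axis that has no negative values.
    """
    pts = list(points)
    if not pts:
        return 0.0, 0.0, 0.0, 0.0
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    dx = max(0.0, -min(xs))
    dy = max(0.0, -min(ys))
    return max(xs) + dx, max(ys) + dy, dx, dy
```

A size computed from maxima alone would ignore the part of the drawing above y = 0, which is one way a second page could end up with unexpected dimensions.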

Jul 30 '23 08:07 wittmeis

Hey Laura, I noticed that the conversion of single-page notebooks works really well already, but for multi-page notebooks only the 1st page is rendered correctly. The 2nd page then already has a weird dimension. I debugged the code and there are negative y coordinates for the pen positions of the 2nd page, and hence the page dimensions in determine_document_dimensions seem to be messed up.

Have you encountered this already yourself? Do you have any idea why the pen positions of the 2nd page are different from the ones on the 1st page?

Are you using the text tool/type folio on that page? Text changes coords of everything else on the page.

Otherwise do you have the notebook so I can reproduce the issue for myself?

I recently added a reliable way to test each individual feature, so I feel more confident making changes now without breaking anything else.

Jul 30 '23 10:07 Azeirah

Hi Laura, thanks for your prompt reply.

Frankly, I am not using the type folio but it could be that I once hit the type button in the menu. Maybe this is enough to change the coordinate system?

I will check myself first with a simple and clean test notebook. In case it does not work, I can also provide the test as a PR.

Jul 30 '23 15:07 wittmeis

Sorry but I have another question and I do not know where to post this....

Have you considered using the USB web API for converting annotated PDFs and notebooks to PDFs?

I have found this library here but have not tried it yet. The advantage would be that RM2 format changes are not important and I would assume that the USB web API is more stable. Moreover, in case of the notebooks the used template would be also part of the PDF. Of course it requires the device to be turned on to do the conversion on the device.

For my personal use-case a possible setup could be:

* rsync for creating a backup of the device
* a Python tooling that checks for updated files and triggers the re-conversion of these files using the USB API

Jul 31 '23 19:07 wittmeis

Sorry but I have another question and I do not know where to post this....

Have you considered using the USB web API for converting annotated PDFs and notebooks to PDFs?

I have found this library here but have not tried it yet. The advantage would be that RM2 format changes are not important and I would assume that the USB web API is more stable. Moreover, in case of the notebooks the used template would be also part of the PDF. Of course it requires the device to be turned on to do the conversion on the device.

For my personal use-case a possible setup could be:
* rsync for creating a backup of the device

* a Python tooling that checks for updated files and triggers the re-conversion of these files using the USB API

If it fits your use-case better, then I suppose the web interface is the way to go. It's not the use-case I'm looking for though. I need something that works with the API.

Jul 31 '23 20:07 Azeirah

Hey @Azeirah, awesome work! I really appreciate that! How is the state of this branch? Could I use it without fear?

Oct 15 '23 17:10 torbenkeller

Hey @Azeirah, awesome work! I really appreciate that! How is the state of this branch? Could I use it without fear?

It mostly works pretty well. These two are the largest limitations:

* Does not output OCRed text very well, or in some cases at all (working on this)
* Annotated pages are a lot larger than PDF pages, so the file looks inconsistent

There might also still be undiscovered bugs.

Overall it's pretty stable, over 80 users of https://scrybble.ink are using this branch to sync their documents to Obsidian.md

Oct 15 '23 17:10 Azeirah
