FsPdf icon indicating copy to clipboard operation
FsPdf copied to clipboard

[WIP]First take on improved Word Wrapping

Open Angr1st opened this issue 5 years ago • 16 comments

This is my first try on implementing the suggested improvement to the Word Wrapping.

Angr1st avatar Apr 02 '19 18:04 Angr1st

@Angr1st thank you for the contribution! I'm trying to work through what's going on here, as the test for wrapped text doesn't seem to break on spaces even with this PR in place. This test uses the wrapString function on a paragraph of text so it's the best test to verify this works. It will create a PDF called "Wrapped Test PDF.pdf" in the tests/FsPdf.Tests/bin/Debug/netcoreapp2.2 directory.

I'm going to research a bit to try to figure out what's happening here.

ninjarobot avatar Apr 03 '19 03:04 ninjarobot

@Angr1st looking through this, it didn't seem to really keep track of the characters read since the last space, so it would continue to write as usual even though it now had some awareness of where the last space was. I ended up implementing it a bit differently using a second StringBuilder as a buffer within the seq builder. This seems to work. Any thoughts on improvements for it?

ninjarobot avatar Apr 03 '19 04:04 ninjarobot

One thing would be why the generated PDFs cannot be opened using Edge or Word. I dont have time today to look into it in depth but if your implementation is working then that is great. From what I saw skimming over it I personally think that when writing F# one should not use the if-then-else expression a lot but rather opt for match-expressions as they are just more beautiful and fp but that is just my personal opinion. So good work building this feature 👍

Angr1st avatar Apr 03 '19 17:04 Angr1st

I haven't tried opening them with Edge or Word, so I'm not sure. I'll have to give them a try and see what the errors are. Do any of the generated PDF's open?

ninjarobot avatar Apr 03 '19 17:04 ninjarobot

@Angr1st Edge doesn't work because the tests use PString instead of PName in two spots, have a PR for it: https://github.com/ninjarobot/FsPdf/pull/5

EBrown8534 avatar Apr 03 '19 19:04 EBrown8534

@EBrown8534 thanks for fixing that. Also regarding #6 In the paket tutorial didnt they recommend renaming it to paket.exe? @ninjarobot none of the PDFs generated by UnitTests work with Edge.

Angr1st avatar Apr 03 '19 20:04 Angr1st

@Angr1st I'd prefer to discuss #6 over there (only because when this gets closed or merged I don't want to have to check a dead PR) if you wouldn't mind. 😃

EBrown8534 avatar Apr 03 '19 21:04 EBrown8534

One thing would be why the generated PDFs cannot be opened using Edge or Word.

After some much appreciated help from @EBrown8534 and his PDF validation recommendations, an issue with the generated PDF content is fixed and everything works on Edge now. Various PDF readers are tolerant of different things. Thank you for bringing that up, and please let me know if you hit any other issues.

ninjarobot avatar Apr 03 '19 22:04 ninjarobot

I have revised my initial implementation it seems to work now but I feel it to be a little more elegant using a DU for discrimination between the initial state and the encounter of a space. I also switched to IsLetterOrDigit to check if it is a space or not. This could potentially become a minor issue as wrapping could occur on all punctuation marks. I had a look at the docs an there is a isWhitespace method that should suite our needs here better I think?

Angr1st avatar Apr 03 '19 22:04 Angr1st

I had a look at the docs an there is a isWhitespace method that should suite our needs here better I think?

@Angr1st I originally was thinking that, but then if we have practically any punctuation, we'd probably want to break after that and not necessarily wait until some whitespace. But I'm not married to the approach either.

ninjarobot avatar Apr 03 '19 23:04 ninjarobot

But wouldnt it now just wrap between punctuation and last Digit or letter as punctuation is neither letter nor Digit?

Dave Curylo [email protected] schrieb am Do., 4. Apr. 2019, 01:09:

I had a look at the docs an there is a isWhitespace method that should suite our needs here better I think?

@Angr1st https://github.com/Angr1st I originally was thinking that, but then if we have practically any punctuation, we'd probably want to break after that and not necessarily wait until some whitespace. But I'm not married to the approach either.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ninjarobot/FsPdf/pull/3#issuecomment-479690550, or mute the thread https://github.com/notifications/unsubscribe-auth/ARjQ_TPoVLWmStSgPYtU-Pmbk8_5hQN0ks5vdTSbgaJpZM4cY2S7 .

Angr1st avatar Apr 03 '19 23:04 Angr1st

@ninjarobot Breaking on punctuation gets tricky -- for example, in many web browsers periods, commas, etc. are not given line-breaks unless absolutely needed. Most of the time you notice the breakage on punctuations is due to there being a space afterwards.

But, if you have a long word that has no spaces around punctuation, the punctuation is never broken on excepting hyphens / dashes.

I.e.:

afgas.dfafh.asdg.asd.ga.sdga.sdg.asdgasdg.asdg.as.dga.sdg.asdgasd.gas.dg.asd.ga.sdg.asd.ga.sdg.asd.gasdgasdgasgsd

I've attached a screenshot of how this looks in the editor, but it's effectively broken on a letter, not on a period.

image

This shouldn't be a major issue, though, as most punctuation we would see is either a hyphen (almost always treated as a word-wrap breaking character) or periods / commas, which are followed by whitespace.

I only mention this, because I recently hit it in a project where I had to add a zero-width space around all punctuation to force a break in web-browsers, so this is definitely an aspect we want to consider.

EBrown8534 avatar Apr 03 '19 23:04 EBrown8534

Right now as it is, it will absolutely wrap when it hits the maximum width based on measuring the width of characters with standard character spacing from the font (haven't dealt with modified spacing either). But if there has been a character that is not alphanumeric when it hits the end of the line, it will break on that instead.

I can see the point about breaking after any punctuation, generally that's probably not what we want, and IsWhitespace would end up with the expected results in more cases.

ninjarobot avatar Apr 04 '19 00:04 ninjarobot

@Angr1st this still doesn't seem to wrap. The file "Wrapped Text PDF.pdf" still looks like this and the end of lines: image

ninjarobot avatar Apr 05 '19 01:04 ninjarobot

@Angr1st I adjusted the code I have now to use match expressions similar to this PR. I ended up really needing to build out the same level of nesting, so it didn't really improve readability in the code. I agree match expressions are more expressive when you can break into a nice set of cases, though.

ninjarobot avatar Apr 05 '19 03:04 ninjarobot

After the Peek() you can drop the Guard and directly match -1. Yes you are correct the identation level wouldn't change much but I really like the more explicit cases that you can see when using match. To keep the indentation Level Low i mostly Break it up into small funcs. If they are aptly named then everything is nice otherwise you can have function overload when there are Just too many tiny functions.

I renamed the PR to start with WIP to show that it is still broken.

Dave Curylo [email protected] schrieb am Fr., 5. Apr. 2019, 05:55:

@Angr1st https://github.com/Angr1st I adjusted the code I have now to use match expressions similar to this PR https://github.com/ninjarobot/FsPdf/commit/447e1793eee0b541c8b30102284aab00f811e50b. I ended up really needing to build out the same level of nesting, so it didn't really improve readability in the code. I agree match expressions are more expressive when you can break into a nice set of cases, though.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ninjarobot/FsPdf/pull/3#issuecomment-480140219, or mute the thread https://github.com/notifications/unsubscribe-auth/ARjQ_da87fA4zTvFaMxhz8Lx9oiV8iMFks5vdskogaJpZM4cY2S7 .

Angr1st avatar Apr 05 '19 15:04 Angr1st