striprtf
NAs are not allowed in subscripted assignments
Hi
I am using strip_rtf() to convert RTF to plain text. I encountered the following error: strip_rtf(x[8]) Error in out[table_flg] <- paste(row_start, out[table_flg], row_end, sep = "") : NAs are not allowed in subscripted assignments
I checked x[8] content, it is not NA.
Could you please help with that?
@quzhouxiachuan I would love to fix it. Can you give me a reproducible example of error?
I'm having the same problem, though the debugger is more illuminating about the cause: it is not related to the input being NA, but rather to an NA value in the logical vector used to index out.
Unfortunately I'm having a really hard time getting a reprex that doesn't violate medical ethics (trying to manually edit raw RTF was a lesson in patience, but mainly a lesson in humility and pain). RTF is the ugliest markup I've ever seen.
I will say, however, that the easy fix is probably just to convert those NAs in tbl_flg to FALSE and accept that it might slightly break some table formatting.
I'll put in a PR for the fix when I'm home
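A minimal sketch of the failure and the suggested workaround, using made-up values rather than striprtf's real internals. The names `out` and `flg` and the `<`/`>` markers are illustrative assumptions; only the shape of the assignment mirrors the error message above.

```r
# Illustrative data only; not taken from striprtf itself.
out <- c("a", "b", "c")
flg <- c(TRUE, NA, FALSE)   # logical index containing an NA

# Assigning more than one value through an index that contains NA fails.
res <- tryCatch({
  out[flg] <- paste("<", out[flg], ">", sep = "")
  "no error"
}, error = function(e) conditionMessage(e))
print(res)

# Workaround: treat NA flags as FALSE before indexing.
flg[is.na(flg)] <- FALSE
out[flg] <- paste("<", out[flg], ">", sep = "")
print(out)
```

The trade-off is the one noted above: an element whose flag was NA is simply left unformatted rather than raising an error.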
@mstr3336 Thanks, I look forward to seeing that. Do you think you can share a test case too?
I'll try to share a test case, but unfortunately because I don't really understand the control characters/tokens well enough to narrow it down to a minimal (fully anonymised) valid rtf document, let alone produce my own, I can't make any promises
If it doesn't cause regression bugs, that'll be nice though.
@mstr3336 I understand that. No problem.
I was able to identify the character sequence that caused the NA:
\'e6 2 NQ\'d9\'81\'84\'c8m4\'cd\'cd1!p\'82
gives
[46] " 2 NQ"
[47] NA
[48] "m4"
[49] "췍"
[50] "1!p"
[51] "\u0082"
Suggesting that the \'d9\'81\'84 section is what's giving us grief.
In the "parsed" matrix, we see:
parsed$intcode
[[47]]
[1] 55681 33992
If I perform the following:
naughty_pair <- c(55681, 33992)
naughty_char <- intToUtf8(naughty_pair)
I get the following:
naughty_char
[1] NA
This takes place here:
https://github.com/kota7/striprtf/blob/649c245a2862463cce84c999b6b5f90becc5a827/R/striprtf.R#L130
This is where (at least my) NA was introduced.
The question is then:
Do I handle the NA (e.g., convert it to "") here?
Or do I handle the NA in the following block?
https://github.com/kota7/striprtf/blob/649c245a2862463cce84c999b6b5f90becc5a827/R/striprtf.R#L172-L178
My suspicion is that the following line is intended to provide similar functionality:
https://github.com/kota7/striprtf/blob/649c245a2862463cce84c999b6b5f90becc5a827/R/striprtf.R#L173-L174
However, it's probably best to handle the NA's as soon as they are introduced, so they don't propagate into the logicals, as line 173 also introduces NAs into emp_tbl
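As a rough sketch of handling the NA at the point where it is introduced: `safe_int_to_utf8` is a hypothetical helper, not a striprtf function and not necessarily what the PR does. It assumes that mapping an undecodable sequence to "" is acceptable.

```r
# Hypothetical helper (not part of striprtf): decode code points,
# falling back to an empty string when intToUtf8() yields NA.
safe_int_to_utf8 <- function(codes) {
  decoded <- intToUtf8(codes)
  if (is.na(decoded)) "" else decoded
}

safe_int_to_utf8(c(55681, 33992))  # the pair reported above
safe_int_to_utf8(c(109, 52))       # ordinary code points for "m4"
```

Because the NA never reaches the character vector, later logical tests on it should not pick up NAs from this source.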
Thanks for considering my PR @kota7 .
It's now working for the problem document
@mstr3336 Thanks for your analysis. If possible, can you share an RTF file that contains that problematic string?