
NAs are not allowed in subscripted assignments

Open quzhouxiachuan opened this issue 5 years ago • 9 comments

Hi, I am using strip_rtf() to convert RTF to plain text. I encountered the following error:

strip_rtf(x[8])
Error in out[table_flg] <- paste(row_start, out[table_flg], row_end, sep = "") :
  NAs are not allowed in subscripted assignments

I checked the content of x[8]; it is not NA.

Could you please help with that?

quzhouxiachuan avatar Mar 04 '19 17:03 quzhouxiachuan

@quzhouxiachuan I would love to fix it. Can you give me a reproducible example of the error?

kota7 avatar Mar 04 '19 18:03 kota7

I'm having the same problem, though the debugger is more illuminating about the cause: it is not related to the input being NA, but rather to an NA value in the logical vector used to index out.
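To illustrate the failure mode described above, here is a minimal sketch with toy data. The names `out` and `table_flg` are borrowed from the error message; the values are made up and this is not the package's actual code path.

```r
# Toy data; `out` and `table_flg` are named after the objects in the error message.
out <- c("row 1", NA, "plain text")
table_flg <- c(TRUE, NA, FALSE)

# Reading with an NA in the logical index is allowed and yields an NA element.
picked <- out[table_flg]

# Assigning through the same index fails when the replacement has length > 1.
msg <- tryCatch({
  out[table_flg] <- paste("<", picked, ">", sep = "")
  NA_character_
}, error = function(e) conditionMessage(e))
print(msg)
```

So the input string itself does not need to be NA; a single NA inside the index vector is enough to trigger the error.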

Unfortunately, I'm having a really hard time getting a reprex that doesn't violate medical ethics (trying to manually edit raw RTF was a lesson in patience, but mainly a lesson in humility and pain); RTF is the ugliest markup I've ever seen.

I will say, however, that the easy fix is probably just to convert those NAs in table_flg to FALSE and accept that it might slightly break some table formatting.
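A rough sketch of that workaround, assuming `table_flg` is a plain logical vector aligned with `out`. The values of `row_start` and `row_end` here are hypothetical placeholders, not the strings the package uses.

```r
# Toy stand-ins; only the variable names come from the error message.
out <- c("cell A", NA, "plain text")
table_flg <- c(TRUE, NA, FALSE)
row_start <- "*| "  # hypothetical placeholder
row_end <- " |"     # hypothetical placeholder

# Treat an unknown flag as "not a table row" so the index contains no NA.
table_flg[is.na(table_flg)] <- FALSE

# The subscripted assignment from the error message now succeeds.
out[table_flg] <- paste(row_start, out[table_flg], row_end, sep = "")
print(out)
```

Indexing with `which(table_flg)` would have a similar effect, since `which()` drops NA values. Either way, elements flagged NA are left untouched, which is where the table formatting might degrade.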

strazto avatar Jan 09 '20 09:01 strazto

I'll put in a PR for the fix when I'm home.

strazto avatar Jan 09 '20 09:01 strazto

@mstr3336 Thanks, I look forward to seeing that. Do you think you can share a test case too?

kota7 avatar Jan 09 '20 09:01 kota7

I'll try to share a test case, but unfortunately, because I don't really understand the control characters/tokens well enough to narrow it down to a minimal (fully anonymised) valid RTF document, let alone produce my own, I can't make any promises.

If it doesn't cause regression bugs, that'll be nice though.


strazto avatar Jan 09 '20 12:01 strazto

@mstr3336 I understand that. No problem.

kota7 avatar Jan 09 '20 14:01 kota7

I was able to identify the character sequence that caused the NA:

\'e6 2 NQ\'d9\'81\'84\'c8m4\'cd\'cd1!p\'82 

gives

[46] " 2 NQ"
[47] NA
[48] "m4"
[49] "췍"
                                                                                                                                                                  
[50] "1!p"
[51] "\u0082"

Suggesting that the \'d9\'81\'84 section is what's giving us grief.

In the "parsed" matrix, we see:

parsed$intcode
[[47]]
[1] 55681 33992

If I perform the following:

naughty_pair <- c(55681, 33992)

naughty_char <- intToUtf8(naughty_pair)

I get the following:

naughty_char
[1] NA

This takes place here:

https://github.com/kota7/striprtf/blob/649c245a2862463cce84c999b6b5f90becc5a827/R/striprtf.R#L130

This is where (at least my) NA was introduced.
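As a possible explanation for why intToUtf8() gives up on this pair: 55681 is 0xD981, which falls in the UTF-16 surrogate range (0xD800 to 0xDFFF), and as far as I know recent versions of R map such code points to NA. A minimal sketch to check this, where is_surrogate is just a name I made up for illustration:

```r
# The pair taken from parsed$intcode[[47]] above
naughty_pair <- c(55681, 33992)

# Show the values in hex to compare against the \'d9\'81 bytes in the RTF
format(as.hexmode(naughty_pair))

# Flag code points inside the UTF-16 surrogate range (0xD800..0xDFFF).
# `is_surrogate` is an illustrative name, not something from striprtf.
is_surrogate <- naughty_pair >= 0xD800 & naughty_pair <= 0xDFFF
is_surrogate

# With the default multiple = FALSE, one bad code point makes the whole result NA
is.na(intToUtf8(naughty_pair))
```

If that reading is right, the first element is the one poisoning the conversion, which would fit the \'d9\'81 bytes being decoded with the wrong code page.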

The question is then:

Do I handle the NA (e.g., convert to "") here?

Or do I handle the NA in the following block?

https://github.com/kota7/striprtf/blob/649c245a2862463cce84c999b6b5f90becc5a827/R/striprtf.R#L172-L178

My suspicion is that the following line is intended to provide similar functionality:

https://github.com/kota7/striprtf/blob/649c245a2862463cce84c999b6b5f90becc5a827/R/striprtf.R#L173-L174

However, it's probably best to handle the NAs as soon as they are introduced, so they don't propagate into the logicals, since line 173 also introduces NAs into emp_tbl.
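A rough sketch of what handling it at the source could look like. safe_int_to_utf8 and its replacement argument are hypothetical names for illustration only; this is not the code in striprtf or in the PR:

```r
# Hypothetical helper: convert code points to a string, falling back to a
# replacement when intToUtf8() cannot represent them and returns NA.
safe_int_to_utf8 <- function(codes, replacement = "") {
  out <- intToUtf8(codes)
  if (is.na(out)) replacement else out
}

safe_int_to_utf8(c(72, 105))        # ordinary code points convert as usual
safe_int_to_utf8(c(55681, 33992))   # the problem pair falls back to ""

# Why it matters downstream: comparisons on an NA string yield NA logicals,
# which are exactly what subscripted assignment later refuses to accept.
c("a", NA) == ""
```

The trade-off is that the unrepresentable characters are silently dropped, but nothing after this point has to guard against NA.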

strazto avatar Jan 10 '20 02:01 strazto

Thanks for considering my PR, @kota7.

It's now working for the problem document.

strazto avatar Jan 10 '20 03:01 strazto

@mstr3336 Thanks for your analysis. If possible, can you share an RTF file that contains that problematic string?

kota7 avatar Jan 10 '20 05:01 kota7