striprtf

Stripping rtf to plain old text

8 striprtf issues

I'm having issues decoding some characters. I get the following error when decoding some RTF text: "'charmap' codec can't encode character '\x96' in position 0: character maps to <undefined>",...

I have been using the striprtf library and it has worked great! But for some of the texts that I am decoding I get the following error: 'charmap' codec...
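One plausible reading of the 'charmap' error, offered as an assumption rather than a confirmed diagnosis: an RTF escape such as \'96 (an en dash in cp1252) ends up in the output as the C1 control character U+0096, which cp1252, the default console or file encoding on many Windows setups, cannot encode. The sketch below reproduces the error and shows a hypothetical helper, `remap_c1_controls`, that reinterprets U+0080..U+009F as cp1252 bytes:

```python
def remap_c1_controls(text: str) -> str:
    """Reinterpret U+0080..U+009F as cp1252 bytes (e.g. U+0096 -> U+2013).

    Hypothetical helper. Assumes these characters came from RTF \\'hh
    escapes that were decoded as Latin-1 although the code page is cp1252.
    """
    out = []
    for ch in text:
        if 0x80 <= ord(ch) <= 0x9F:
            try:
                ch = bytes([ord(ch)]).decode("cp1252")
            except UnicodeDecodeError:
                pass  # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in cp1252
        out.append(ch)
    return "".join(out)


text = "2010\x962012"  # U+0096 where an en dash was intended
try:
    text.encode("cp1252")  # roughly what a cp1252 console or file does
except UnicodeEncodeError as exc:
    print(exc)  # 'charmap' codec can't encode character '\x96' in position 4: ...

print(remap_c1_controls(text))  # 2010–2012
```

Independently of the remapping, writing the result with an explicit encoding, e.g. `open(path, "w", encoding="utf-8")`, avoids the error, because UTF-8 can encode every character.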

Adding part of the content here. ffffffffffffffffffffffffffffffff52006f006f007400200045006e00740072007900000000000000000000000000000000000000000000000000000000000000000000000000000000000000000016000500ffffffffffffffffffffffff0c6ad98892f1d411a65f0040963251e50000000000000000000000007034 108e75dcd701feffffff00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000ffffffffffffffffffffffff00000000000000000000000000000000000000000000000000000000 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000ffffffffffffffffffffffff0000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000ffffffffffffffffffffffff000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000105000000000000
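The posted dump does not look like RTF text. The run 52006f006f007400200045006e00740072007900 decodes as UTF-16LE to "Root Entry", the name of the root directory entry in an OLE compound file, which suggests (as an assumption) that this is binary embedded-object data, such as the payload of an \objdata group, rather than text for striprtf to extract. A minimal check, using a hypothetical helper name:

```python
ROOT_ENTRY = "Root Entry".encode("utf-16-le")  # OLE root directory entry name


def contains_ole_root_entry(hex_dump: str) -> bool:
    """Return True if the hex dump contains the UTF-16LE name "Root Entry"."""
    # Remove the line-wrap spaces before converting hex digits to bytes.
    data = bytes.fromhex("".join(hex_dump.split()))
    return ROOT_ENTRY in data


sample = "ffffffff52006f006f007400200045006e00740072007900"
print(bytes.fromhex("52006f006f007400200045006e00740072007900").decode("utf-16-le"))  # Root Entry
print(contains_ole_root_entry(sample))  # True
```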

Hello Joshy, I was using this library to parse an RTF file, but somehow it fails to load the content. \\par \\pard\\li180\\sb50\\f1 wow at -8,99 \\'85 so it can be profitable\\f0 \n
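In RTF, \'hh inserts the byte hh in the document's code page; with cp1252, \'85 is byte 0x85, the horizontal ellipsis (…). The snippet above seems to be shown as an escaped string literal (\\par, \n); if the file itself contains doubled backslashes, the control words would not be recognized, though that is a guess. A minimal sketch of decoding such escapes, assuming cp1252 and a hypothetical helper name:

```python
import re

# Matches RTF hex escapes of the form \'hh (exactly two hex digits).
HEX_ESCAPE = re.compile(r"\\'([0-9a-fA-F]{2})")


def decode_hex_escapes(rtf_fragment: str, codepage: str = "cp1252") -> str:
    """Replace each \\'hh escape with the character byte hh denotes in `codepage`."""

    def _sub(match: re.Match) -> str:
        return bytes.fromhex(match.group(1)).decode(codepage, errors="replace")

    return HEX_ESCAPE.sub(_sub, rtf_fragment)


fragment = r"wow at -8,99 \'85 so it can be profitable"
print(decode_hex_escapes(fragment))  # wow at -8,99 … so it can be profitable
```

Decoding each escape on its own is only correct for single-byte code pages such as cp1252 or cp1251; double-byte code pages need consecutive escapes decoded together.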

striprtf 0.0.26: `rtf_to_text()` converts cp1251 RTFs (Russian text) correctly when the header is `{\rtf1\ansi\ansicpg1251` or `{\rtf1\adeflang1025\ansi\ansicpg1251`, but not when it is `{\rtf1\adeflang1025\ansi\ansicpg1252`: абвгдеёжзийклмнопрст comes out as àáâãäå¸æçèéêëìíîïðñò. Passing `encoding=...` does not help. What does help is https://ru.stackoverflow.com/questions/1145225/Ошибка-обработки-файлов-rtf-на-python?ysclid=lqagyqz7x5798462943 (Russian Stack Overflow: "Error processing RTF files in Python") or `rtf_to_text(rtf.read()).encode('cp1252').decode('ansi')`. [test-rus.zip](https://github.com/joshy/striprtf/files/13700580/test-rus.zip)
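The garbling is consistent with cp1251 bytes being decoded as cp1252: а is 0xE0 in cp1251, and 0xE0 is à in cp1252. The workaround above re-encodes with cp1252 to recover the bytes and decodes them again. To my knowledge, `'ansi'` is a Windows-only alias of Python's `mbcs` codec (the system ANSI code page), so it only works where that code page is cp1251. A portable sketch that names the code page explicitly (the helper name is hypothetical):

```python
def repair_cp1251_mojibake(text: str) -> str:
    """Undo Cyrillic text that was decoded as cp1252 instead of cp1251.

    Re-encoding with the wrong code page recovers the original bytes;
    decoding them with the right one restores the text. Raises
    UnicodeEncodeError if the text has characters cp1252 cannot encode.
    """
    return text.encode("cp1252").decode("cp1251")


garbled = "àáâãäå¸æçèéêëìíîïðñò"
print(repair_cp1251_mojibake(garbled))  # абвгдеёжзийклмнопрст
```

This only holds when the whole string went through the wrong code page; text that mixes correctly and incorrectly decoded runs would need per-run handling.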

@joshy, is it possible to extract the page numbers or the paragraph numbers from the given RTF file?
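As far as I know, RTF does not store rendered page numbers; pagination is computed by whatever application lays the document out. The source does contain explicit \page breaks and \par paragraph marks. If "paragraph numbers" means a running count of paragraphs, a rough sketch could tag each \par with the page implied by the preceding \page breaks. This is a hypothetical helper that ignores groups, destinations and automatic page breaks:

```python
import re

# An RTF control word: backslash, ASCII letters, optional signed integer
# parameter, optional single-space delimiter. The second alternative consumes
# control symbols such as \\, \{ or \' so escaped text is not misread.
CONTROL = re.compile(r"\\([a-zA-Z]+)(-?\d+)? ?|\\[^a-zA-Z]")


def paragraph_positions(rtf: str) -> list[tuple[int, int]]:
    """Return (page, paragraph) pairs, one per \\par, counting \\page breaks."""
    page, paragraph = 1, 0
    positions = []
    for match in CONTROL.finditer(rtf):
        word = match.group(1)  # None for control symbols
        if word == "page":
            page += 1
        elif word == "par":  # \pard is a different control word and is skipped
            paragraph += 1
            positions.append((page, paragraph))
    return positions


print(paragraph_positions(r"{\rtf1 a\par b\page c\pard\par}"))  # [(1, 1), (2, 2)]
```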

This is the RTF I have:
```
{\rtf1\fbidis\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Microsoft Sans Serif;}{\f1\fswiss\fcharset134 Microsoft YaHei;}{\f2\fnil Microsoft Sans Serif;}}
{\*\generator Riched20 10.0.19041}\viewkind4\uc1
\pard\ltrpar\f0\fs17 [HEADER]\par
[/HEADER]\par
[BODY]\par
[1]\par
00:00:44:22\par
00:00:48:05\par
\f1\'b8\'f9\'be\'dd\'d5\'e6\'ca\'b5\'b9\'ca\'ca\'c2\'b4\'b4\'d7\'f7\f0\par
```
fonttbl is...
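In this sample the header declares \ansicpg1252, but font \f1 is declared with \fcharset134, which corresponds to the GB2312/GBK code page (cp936 in Python). The \'hh run after \f1 is a sequence of double-byte GBK characters, so it only decodes correctly when consecutive bytes are combined and decoded with that code page. A simplified sketch, assuming the font table maps as shown and ignoring group scoping (a real parser would restore the previous font at each closing brace); the helper names and the partial charset table are hypothetical:

```python
import re

# Partial mapping from RTF \fcharsetN values to Python codec names.
FCHARSET_CODEPAGES = {
    0: "cp1252", 128: "cp932", 129: "cp949", 134: "cp936", 136: "cp950",
    161: "cp1253", 162: "cp1254", 177: "cp1255", 178: "cp1256",
    186: "cp1257", 204: "cp1251", 238: "cp1250",
}

# Font table entries such as {\f1\fswiss\fcharset134 Microsoft YaHei;}
FONT_DEF = re.compile(r"\\f(\d+)[^;{}]*?\\fcharset(\d+)")
# Either a font switch (\f1) or a run of consecutive \'hh escapes.
TOKEN = re.compile(r"\\f(\d+)(?![0-9a-zA-Z])|((?:\\'[0-9a-fA-F]{2})+)")


def font_codepages(rtf: str) -> dict[int, str]:
    """Map each font number to a codec name using its \\fcharset value."""
    return {
        int(font): FCHARSET_CODEPAGES.get(int(charset), "cp1252")
        for font, charset in FONT_DEF.findall(rtf)
    }


def decode_hex_runs(rtf: str, default: str = "cp1252") -> str:
    """Decode each run of \\'hh escapes with the code page of the current font."""
    codepages = font_codepages(rtf)
    current = default
    out, last = [], 0
    for match in TOKEN.finditer(rtf):
        out.append(rtf[last:match.start()])
        if match.group(1) is not None:
            # Font switch: remember its code page, keep the control word.
            current = codepages.get(int(match.group(1)), default)
            out.append(match.group(0))
        else:
            # Decode the whole run at once so double-byte characters stay intact.
            raw = bytes.fromhex(match.group(2).replace("\\'", ""))
            out.append(raw.decode(current, errors="replace"))
        last = match.end()
    out.append(rtf[last:])
    return "".join(out)


sample = (
    r"{\rtf1\ansi\ansicpg1252\deff0{\fonttbl{\f0\fnil\fcharset0 Microsoft Sans Serif;}"
    r"{\f1\fswiss\fcharset134 Microsoft YaHei;}}"
    r"\pard\f0\fs17 00:00:48:05\par "
    r"\f1\'b8\'f9\'be\'dd\'d5\'e6\'ca\'b5\'b9\'ca\'ca\'c2\'b4\'b4\'d7\'f7\f0\par}"
)
print(decode_hex_runs(sample))
```

The output still contains the control words; it only shows that the bytes become readable Chinese text once they are decoded with the font's code page instead of \ansicpg1252.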