pdfannots
Feature: CSV output
Would love CSV output like this:
page,type,author,created,text
1,Highlight,John,2023-05-17T11:38:17,Text
Sounds like that should be possible but not sure how. Great tool, thanks!
You can certainly write a printer to do that -- take a look at the JSON output for an example: https://github.com/0xabu/pdfannots/blob/658984edebb6bb8409e9ce8bb49ac85ded8f8675/pdfannots/printer/json.py
If you don't want to do that, perhaps take the JSON output from pdfannots and convert it to CSV: https://stackoverflow.com/questions/32960857/how-to-convert-arbitrary-simple-json-to-csv-using-jq
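For anyone who prefers Python over jq, the conversion could look roughly like the sketch below. It assumes the JSON output is a list of objects whose keys match the columns requested above (page, type, author, created, text); those key names are an assumption here, so check them against your own output. The function name annots_json_to_csv is made up for this example.

```python
import csv
import io
import json

# Columns from the CSV requested above. That the JSON objects use exactly
# these key names is an assumption; adjust to match your actual output.
COLUMNS = ["page", "type", "author", "created", "text"]


def annots_json_to_csv(json_text: str) -> str:
    """Convert a JSON list of annotation objects into CSV text."""
    annots = json.loads(json_text)
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=COLUMNS,
        extrasaction="ignore",  # silently drop keys that are not in COLUMNS
        restval="",             # missing keys become empty cells
        lineterminator="\n",
    )
    writer.writeheader()
    for annot in annots:
        writer.writerow(annot)
    return out.getvalue()


sample = (
    '[{"page": 1, "type": "Highlight", "author": "John", '
    '"created": "2023-05-17T11:38:17", "text": "Text", "contents": null}]'
)
print(annots_json_to_csv(sample))
```

When reading the JSON from a file and writing the CSV to a file, open both with encoding="utf-8" (and newline="" for the CSV) so that non-ASCII text survives the round trip.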
Thanks -- yeah took a quick look, seems possible. Might look at it indeed, thanks for the pointer.
[Beginner's level question]
I would like to ask whether there is an option (or rather, how to set it) to use an encoding that supports Polish and German special characters. I want to use your tool for learning German. The problem is that the output .txt (JSON) file does not show any Polish or German special characters.
Correct version text: Lösung contents: rozwiązanie
I tried to modify the JSON file but I got stuck. :/
Console line:
pdfannots "path" -f json > directories\json_to_csv.txt
Some additional information:
- The PDF file is set in the Goethe FF Clan font. When I copy a word from the file and paste it into e.g. Notepad++/WordPad/a browser, the special characters are copied, too.
- Currently I can create the .csv file from the .json output, but it still contains no German or Polish characters.
- The same thing happens when I try to create a markdown (.md) file.
Best regards
@Proeliorr this has nothing to do with CSV. Why are you commenting on this issue?
In any case, pdfannots always outputs UTF-8, and indeed 00f6 is the Unicode codepoint for ö (https://codepoints.net/U+00F6) -- I think perhaps you need to tell your text editor to use the UTF-8 encoding.
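A quick way to check this from Python, as a sketch: it assumes what showed up in the JSON was a \u00f6-style escape, which is just another spelling of ö that any JSON parser decodes. The helper name load_annotations is made up for this example.

```python
import json

# With Python's default settings, json.dumps escapes non-ASCII characters,
# so "ö" is written as the six characters \u00f6.
escaped = json.dumps("Lösung")
print(escaped)

# Parsing the JSON again restores the original character.
decoded = json.loads(escaped)
print(decoded == "Lösung")


def load_annotations(path: str):
    """Read a JSON file explicitly as UTF-8 instead of the platform default."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Passing encoding="utf-8" explicitly matters mostly on Windows, where the default text encoding is often not UTF-8.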
@0xabu After some consideration I agree.
The file was re-saved in UTF-8 encoding, and Notepad++ sees it as a UTF-8 encoded file. That is where the problem lies.
Nevertheless, I will not derail this topic any further. I don't think this is a pdfannots issue. Cheers
@Proeliorr I took another look at this, and there is something fishy going on with output redirection on Windows. I've created #84 to track it. Luckily it has a pretty simple workaround -- use -o to write output to a file, rather than redirection.