Usage documentation
Hi! We're trying to find tools to check the correctness of a parser. I've been trying to run the poppler example, but is it normal that the TDAG is over 21 GB? If this repo is still supported, I have some additional questions about the best ways to use polytracker.
Thanks!
Hi @lwli11, the TDAG is currently of a fixed size because of the way we allocate it. Feel free to ask any questions you like :)
So, using one of the examples: after building poppler-demo (I think the file path to the Dockerfile has changed), were these the correct steps?
- docker run -it /bin/bash
- Copy in a small pdf (I used the one here: https://blog.idrsolutions.com/make-your-own-pdf-file-part-4-hello-world-pdf/)
- ./pdftotext.instrumented hello.pdf (which generated polytracker.tdag)
- Then "polytracker mapping polytracker.tdag" gave me a file like: "defaultdict(<class 'set'>, {(PosixPath('hello.pdf'), 372): {(PosixPath('hello.txt'), 3), (PosixPath('hello.txt'), 6), (PosixPath('hello.txt'), 9), (PosixPath('hello.txt'), 2),"
- "polytracker forest polytracker.tdag output.dot" created a .dot file which seems too big to convert? I'm surprised since the hello.pdf file is quite small.
- And "polytracker cavities polytracker.tdag" didn't give me anything so I assume poppler had no cavities.
Does this seem correct? It seems I didn't need the section "Running and Analyzing an Instrumented Program" and run_trace?
Thanks!
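For reading the `polytracker mapping` output quoted above, here is a minimal Python sketch that inverts a dictionary of that shape, so that each output location lists the input locations it was derived from. It assumes each key is an (input file, byte offset) pair and each value is a set of (output file, byte offset) pairs, which is how the printed `defaultdict` looks but is not confirmed here. The `sample` data is partly hypothetical (the entry for offset 373 is invented for illustration), and `invert_mapping` is a helper name of my own, not part of polytracker.

```python
from collections import defaultdict
from pathlib import Path


def invert_mapping(mapping):
    """Turn {input_location: {output_locations}} into {output_location: {input_locations}}."""
    inverted = defaultdict(set)
    for source, sinks in mapping.items():
        for sink in sinks:
            inverted[sink].add(source)
    return inverted


# Sample shaped like the quoted output. The first entry mirrors the
# excerpt from the thread; the second entry is hypothetical.
sample = defaultdict(set, {
    (Path("hello.pdf"), 372): {
        (Path("hello.txt"), 2),
        (Path("hello.txt"), 3),
        (Path("hello.txt"), 6),
        (Path("hello.txt"), 9),
    },
    (Path("hello.pdf"), 373): {(Path("hello.txt"), 3)},  # hypothetical
})

if __name__ == "__main__":
    inverted = invert_mapping(sample)
    # Print each output location with the input offsets that reach it.
    for (out_path, out_offset), sources in sorted(inverted.items()):
        offsets = sorted(offset for _, offset in sources)
        print(f"{out_path}@{out_offset} <- input offsets {offsets}")
```

Grouping by output location can make it easier to spot which parts of the output depend on many input bytes and which depend on few.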
We'll get back to you about building the demo, but one note about the TDAG file: it is incredibly sparse, so if you store it on a filesystem that supports sparse files, it will consume only a tiny fraction of the 21 GB on disk. We have a script somewhere to compress TDAG files (e.g., for transfer); we'll try to find it and point you to it. However, the TDAG format has changed a bit over the years, so that script may have bitrotted.
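To illustrate the point about sparseness, here is a small Python sketch that creates a sparse file and compares its apparent size with the space actually allocated on disk. It does not touch a real TDAG; the helper names `make_sparse_file` and `apparent_and_allocated` are made up for this example. It assumes a Unix-like system where `os.stat` reports `st_blocks` in 512-byte units. On a filesystem without sparse-file support, the two numbers will be roughly equal.

```python
import os
import tempfile


def make_sparse_file(path, apparent_size, payload=b"x" * 4096):
    """Create a file of `apparent_size` bytes with only `payload` actually written."""
    with open(path, "wb") as f:
        f.truncate(apparent_size)      # extend the file without writing data
        f.seek(apparent_size // 2)     # write a small chunk in the middle
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())           # make sure blocks are allocated before stat


def apparent_and_allocated(path):
    """Return (apparent size in bytes, bytes allocated on disk)."""
    st = os.stat(path)
    # st_blocks is assumed to be in 512-byte units (Unix-like systems only).
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sparse_demo.bin")
        make_sparse_file(path, 64 * 1024 * 1024)  # 64 MiB apparent size
        apparent, allocated = apparent_and_allocated(path)
        print(f"apparent size: {apparent} bytes")
        print(f"allocated on disk: {allocated} bytes")
```

The same comparison can be made on an existing file by calling `apparent_and_allocated("polytracker.tdag")`. Note that copying or transferring a sparse file with tools that are not sparse-aware may expand it to its full apparent size.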