Results 538 comments of Amit Dovev

> For PDF generation, the problem is `snprintf` in `generateContentStringPdf` in `l_generatePdf` in `pixConvertToPdfData` and `cidConvertToPdfData`. https://github.com/DanBloomberg/leptonica/blob/a49f60aa26a1a5f9fbdaa2920bf6cf81e0fbd7b6/src/pdfio2.c#L1919-L1920
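
To illustrate why `snprintf` is a problem here: `%f` uses the decimal separator of the current `LC_NUMERIC` locale, so under a locale such as German a number can be written as `1,50`, which is not a valid PDF number. The following is a minimal sketch of one possible workaround, not Leptonica's actual fix. The helper name `format_pdf_number` is hypothetical.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of Leptonica): format `value` with a fixed
 * number of decimals, then force '.' as the decimal separator regardless of
 * the current LC_NUMERIC setting. Returns 0 on success, -1 on error. */
int format_pdf_number(char *buf, size_t size, double value, int decimals)
{
    int n = snprintf(buf, size, "%.*f", decimals, value);
    if (n < 0 || (size_t)n >= size)
        return -1;  /* encoding error or output truncated */

    /* Ask the C library which separator the active locale uses. */
    const char *dp = localeconv()->decimal_point;
    size_t dplen = strlen(dp);
    if (dplen == 0 || strcmp(dp, ".") == 0)
        return 0;   /* already locale-neutral */

    /* Replace the (possibly multi-byte) separator with a single '.'. */
    char *pos = strstr(buf, dp);
    if (pos != NULL) {
        *pos = '.';
        memmove(pos + 1, pos + dplen, strlen(pos + dplen) + 1);
    }
    return 0;
}
```

This avoids calling `setlocale()` around the formatting code, which would change process-wide state and is not safe in multi-threaded programs.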

[`setlocale()` is defined in the C standard](https://en.cppreference.com/w/c/locale/setlocale)

* [Linux](https://man7.org/linux/man-pages/man3/setlocale.3.html)
* [FreeBSD](https://www.freebsd.org/cgi/man.cgi?query=setlocale)
* [Cygwin](https://cygwin.com/cygwin-ug-net/setup-locale.html)
* [macOS](https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/setlocale.3.html)
* [Windows](https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale?view=msvc-170)
* [Mingw-w64](https://github.com/Alexpux/mingw-w64/blob/d0d7f784833bbb0b2d279310ddc6afb52fe47a46/mingw-w64-headers/crt/locale.h#L80)

If there is a test program for the PDF writing functionality, it won't detect the issue unless it tries to validate the created PDF file, like @bertsky did with...

https://stackoverflow.com/questions/4057319/is-setlocale-thread-safe-function

Here's a comment about the license(s) from @terrelln https://news.ycombinator.com/item?id=16719001 He is one of the main developers: https://github.com/facebook/zstd/graphs/contributors

> Can someone point me to some high-level references on this issue?

SEI CERT C Coding Standard, ENV33-C. Do not call system(): https://wiki.sei.cmu.edu/confluence/pages/viewpage.action?pageId=87152177
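
As a rough illustration of the alternative to `system()`, here is a minimal sketch for POSIX systems that starts a program directly with `posix_spawn`, so no shell ever interprets the arguments. The function name `run_program` is hypothetical, and passing the caller's `environ` unchanged is a simplification; a hardened version would probably pass a sanitized environment.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run the executable at `path` (no PATH search, no
 * shell) with the given argv and wait for it to finish.
 * Returns the child's exit status, or -1 on failure or abnormal exit. */
int run_program(const char *path, char *const argv[])
{
    pid_t pid;
    int status;

    /* posix_spawn returns an error number directly; it does not set errno. */
    if (posix_spawn(&pid, path, NULL, NULL, argv, environ) != 0)
        return -1;

    /* Retry waitpid if it is interrupted by a signal. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because each argument is passed as a separate `argv` entry, a file name containing spaces, quotes, or `;` cannot be turned into an extra shell command.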

Great summary of the security work done for Leptonica, Dan.

Another reference from SEI CERT C Coding Standard related to this thread: Input Output (FIO) FIO02-C. Canonicalize path names originating from tainted sources https://wiki.sei.cmu.edu/confluence/display/c/FIO02-C.+Canonicalize+path+names+originating+from+tainted+sources Dan, you already changed the code...
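
For readers unfamiliar with FIO02-C, the idea can be sketched as follows for POSIX systems: resolve the untrusted path with `realpath()` before checking it against an allowed directory. This is only an illustration, not the change made in Leptonica, and the function name `path_is_inside` is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return 1 if `path`, after resolving symlinks and
 * "." / ".." components, lies inside `base_dir`; otherwise return 0.
 * Both paths must exist, because realpath() fails on missing files. */
int path_is_inside(const char *base_dir, const char *path)
{
    /* With a NULL buffer, POSIX.1-2008 realpath() allocates the result. */
    char *base = realpath(base_dir, NULL);
    char *full = realpath(path, NULL);
    int ok = 0;

    if (base != NULL && full != NULL) {
        size_t n = strlen(base);
        if (strcmp(base, "/") == 0) {
            ok = 1;  /* every absolute path is inside the root */
        } else {
            /* Require a full component match so that "/tmpfoo" is not
             * treated as being inside "/tmp". */
            ok = strncmp(full, base, n) == 0 &&
                 (full[n] == '/' || full[n] == '\0');
        }
    }
    free(base);
    free(full);
    return ok;
}
```

Note that a check like this is still subject to a race if the file system changes between the check and the later open.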

> Are there any plans to add it?

The best/fast models were uploaded 5 years ago. AFAIK, no one is working on updating them.

The word lists and training text were generated using a web crawler. Some filtering was done as a post-processing step. So the undesirable effects you mentioned are to...