t3ext-extractor

Remove control characters

Open seirerman opened this issue 4 years ago • 0 comments

I have some files with control characters (e.g. BEL, VT, NAK, DC3) in their metadata (see attachment: 70-2021-290-5-23-12-2021.pdf). See https://en.wikipedia.org/wiki/ASCII#Control_code_chart for a list of control characters.

The extractor extension doesn't remove these control characters when a new file is imported into TYPO3. As a result, Solr gets stuck while indexing the affected files, which blocks every subsequent file in the queue from being indexed.
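As a rough illustration of the kind of cleanup I mean, here is a minimal Python sketch. It is not the extension's code (the extension is written in PHP), and the names `strip_control_characters` and `sanitize_metadata` are hypothetical. It assumes the metadata is available as a flat mapping of field names to values, and that tab, line feed and carriage return should be kept.

```python
import re

# C0 control characters (U+0000 to U+001F) plus DEL (U+007F),
# excluding tab (\x09), line feed (\x0a) and carriage return (\x0d).
# Keeping those three is an assumption; adjust if they should go too.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def strip_control_characters(value: str) -> str:
    """Return value with ASCII control characters removed."""
    return _CONTROL_CHARS.sub("", value)


def sanitize_metadata(metadata: dict) -> dict:
    """Clean every string value in a flat metadata mapping (hypothetical shape)."""
    return {
        key: strip_control_characters(val) if isinstance(val, str) else val
        for key, val in metadata.items()
    }


if __name__ == "__main__":
    # Title containing BEL, VT, NAK and DC3, as in the affected files.
    raw = {"title": "Annual\x07 Report\x0b 2021\x15\x13", "pages": 12}
    print(sanitize_metadata(raw))
```

Something equivalent applied to the extracted metadata before it is stored would keep the problematic characters from ever reaching the index queue.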

seirerman, Dec 07 '21 10:12