Report invalid elements or attributes
The Cleaner class today uses the protected methods isSafeTag and isSafeAttribute to determine whether a tag or attribute is valid according to the provided whitelist. Those methods are not directly available to the application using Jsoup. To give better feedback to whoever supplied a document, it would be useful to be able to determine easily which tags and attributes invalidate a specific document. Perhaps, in addition to keeping a count of removed elements and attributes, the cleaner could also keep a list of them and provide it on request.
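To make the idea above more concrete, here is a rough, standalone sketch of what such a report could look like. It does not use or describe Jsoup's actual API: `CleanReport`, `ReportingCleanerSketch`, and the map-based allow list are all hypothetical names made up for illustration, and the two private `isSafeTag` / `isSafeAttribute` methods are simplified stand-ins for the real checks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical report object: collects what was rejected instead of only counting it.
class CleanReport {
    private final List<String> rejectedTags = new ArrayList<>();
    private final List<String> rejectedAttributes = new ArrayList<>();

    void tagRejected(String tag) {
        rejectedTags.add(tag);
    }

    void attributeRejected(String tag, String attribute) {
        // Recorded as tag[attribute] so the caller can see where it occurred.
        rejectedAttributes.add(tag + "[" + attribute + "]");
    }

    List<String> getRejectedTags() {
        return Collections.unmodifiableList(rejectedTags);
    }

    List<String> getRejectedAttributes() {
        return Collections.unmodifiableList(rejectedAttributes);
    }

    boolean isValid() {
        return rejectedTags.isEmpty() && rejectedAttributes.isEmpty();
    }
}

// Hypothetical stand-in for a cleaner; the allow list maps a tag name
// to the attribute names permitted on that tag.
class ReportingCleanerSketch {
    private final Map<String, Set<String>> allowed;

    ReportingCleanerSketch(Map<String, Set<String>> allowed) {
        this.allowed = allowed;
    }

    // Simplified stand-ins for the protected checks mentioned above.
    private boolean isSafeTag(String tag) {
        return allowed.containsKey(tag);
    }

    private boolean isSafeAttribute(String tag, String attribute) {
        Set<String> attributes = allowed.get(tag);
        return attributes != null && attributes.contains(attribute);
    }

    // Checks one element and records every rejection in the report.
    void check(String tag, List<String> attributes, CleanReport report) {
        if (!isSafeTag(tag)) {
            report.tagRejected(tag);
            return; // attributes of a rejected tag are not reported separately
        }
        for (String attribute : attributes) {
            if (!isSafeAttribute(tag, attribute)) {
                report.attributeRejected(tag, attribute);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Set<String>> allowed = new HashMap<>();
        allowed.put("a", new HashSet<>(Arrays.asList("href")));
        allowed.put("p", new HashSet<>());

        ReportingCleanerSketch cleaner = new ReportingCleanerSketch(allowed);
        CleanReport report = new CleanReport();
        cleaner.check("a", Arrays.asList("href", "onclick"), report);
        cleaner.check("script", Collections.<String>emptyList(), report);

        System.out.println("valid: " + report.isValid());
        System.out.println("rejected tags: " + report.getRejectedTags());
        System.out.println("rejected attributes: " + report.getRejectedAttributes());
    }
}
```

In a real implementation the report would presumably be filled in while the cleaner walks the document, and only when the caller asks for it, so that the default path stays as cheap as it is now.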
I like this idea, and it could work similarly to the error tracking option in the HTML parser.
Hi, we are a student group and we would like to take a crack at this. We can't guarantee that we'll be able to complete it to a high enough standard, but we'd like to try.