colly
Ignoring certain MIME types in fixCharset is not enough
The set of MIME types ignored in fixCharset is too narrow. It currently ignores only image, video, audio, and font types, but other MIME types should be ignored as well, for example doc, pdf, xlsx ... (application/msword, application/vnd.openxmlformats-officedocument.wordprocessingml.document).
In func fixCharset():
    ...
    if strings.Contains(contentType, "image/") ||
        strings.Contains(contentType, "video/") ||
        strings.Contains(contentType, "audio/") ||
        strings.Contains(contentType, "font/") {
        // These MIME types should not have textual data.
        return nil
    }
This should be changed to:
    if !strings.Contains(contentType, "text/") {
        return nil
    }
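
To show the difference between the two conditions, here is a minimal standalone sketch. It is not colly's actual code: the helper names skipCurrent and skipProposed are hypothetical, and the sketch assumes contentType is simply the Content-Type header value as a string.

```go
package main

import (
	"fmt"
	"strings"
)

// skipCurrent mirrors the condition quoted above from fixCharset.
// The function name is hypothetical and used only for illustration.
func skipCurrent(contentType string) bool {
	return strings.Contains(contentType, "image/") ||
		strings.Contains(contentType, "video/") ||
		strings.Contains(contentType, "audio/") ||
		strings.Contains(contentType, "font/")
}

// skipProposed mirrors the condition proposed in this issue:
// skip everything whose content type does not contain "text/".
// The function name is hypothetical and used only for illustration.
func skipProposed(contentType string) bool {
	return !strings.Contains(contentType, "text/")
}

func main() {
	contentTypes := []string{
		"text/html; charset=utf-8",
		"image/png",
		"application/msword",
		"application/vnd.openxmlformats-officedocument.wordprocessingml.document",
	}
	// Print whether each check would skip charset fixing for the type.
	for _, ct := range contentTypes {
		fmt.Printf("%s\n  current skips: %t, proposed skips: %t\n",
			ct, skipCurrent(ct), skipProposed(ct))
	}
}
```

One thing to consider with the proposed condition: it would also skip textual types that do not contain "text/", such as application/json or application/xhtml+xml, so whether that is acceptable probably depends on how fixCharset is used.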