Auto-CORPus
Auto format detection
Is your feature request related to a problem? Please describe
I would like to put all the documents in one folder and have Auto-CORPus (AC) automatically process each one with the right parsing method. We could rely on the file extension, but files with an .xml extension sometimes actually contain HTML.
Describe the solution you'd like
I mentioned it above.
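As a rough illustration of what content-based detection could look like, here is a minimal sketch that inspects the first bytes of each file instead of trusting its extension. It uses only the standard library; the function names (`sniff_markup`, `detect_format`, `group_by_format`) and the heuristics are assumptions made for illustration, not part of AC or of any existing library.

```python
from __future__ import annotations

from pathlib import Path


def sniff_markup(head: bytes) -> str:
    """Guess 'html', 'xml', or 'unknown' from the first bytes of a file.

    Hypothetical heuristic: HTML markers take priority, so a file that
    starts with an XML declaration but contains an <html> root still
    counts as HTML.
    """
    # Decode leniently, then drop a UTF-8 BOM and leading whitespace.
    text = head.decode("utf-8", errors="ignore").lstrip("\ufeff \t\r\n").lower()
    if "<!doctype html" in text or "<html" in text:
        return "html"
    # Crude fallback: anything else that starts with a tag is treated as XML.
    if text.startswith("<"):
        return "xml"
    return "unknown"


def detect_format(path: Path, n_bytes: int = 2048) -> str:
    """Read only the first n_bytes of the file and sniff its format."""
    with open(path, "rb") as fh:
        return sniff_markup(fh.read(n_bytes))


def group_by_format(folder: Path) -> dict[str, list[Path]]:
    """Group every file in a folder by its detected format."""
    groups: dict[str, list[Path]] = {}
    for path in sorted(folder.iterdir()):
        if path.is_file():
            groups.setdefault(detect_format(path), []).append(path)
    return groups
```

Each group could then be handed to the matching parser, with the `unknown` group left for extension-based or library-based detection. The heuristic only looks at the start of the file, so an XML document that embeds an `<html` tag early on would be misclassified.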
Describe alternatives you've considered
no
Additional context
no
@Thomas-Rowlands This is the library we're already using: https://pypi.org/project/filetype/
Might be worth seeing if it does what you need. If you end up using a different library instead, it probably makes sense to replace the use of filetype with that (I think it's only used in one place).