Auto format detection

Open Antoinelfr opened this issue 10 months ago • 1 comment

Is your feature request related to a problem? Please describe

I would like to put all the documents in one folder and have AC automatically process each one with the right parsing method. We could use the file extension, but sometimes files with an .xml extension actually contain HTML.
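A rough sketch of what content-based detection could look like, using only the standard library. The names `sniff_markup` and `detect_format` are hypothetical and not part of Auto-CORPus, and the heuristics are assumptions that would need testing against real inputs: an HTML doctype or `<html>` tag near the start wins even if an XML declaration comes first, and anything else that starts with `<` is treated as XML.

```python
import re
from pathlib import Path

# Hypothetical helpers for illustration; not part of Auto-CORPus.
_HTML_MARKERS = re.compile(rb"<!doctype\s+html|<html[\s>]", re.IGNORECASE)
_XML_DECL = re.compile(rb"<\?xml\s")
_UTF8_BOM = b"\xef\xbb\xbf"


def sniff_markup(head: bytes) -> str:
    """Guess 'html', 'xml', or 'unknown' from the first bytes of a file."""
    if head.startswith(_UTF8_BOM):
        head = head[len(_UTF8_BOM):]
    head = head.lstrip()
    # Check HTML first: XHTML often starts with an XML declaration,
    # which is the "XML that is actually HTML" case.
    if _HTML_MARKERS.search(head):
        return "html"
    if _XML_DECL.match(head) or head.startswith(b"<"):
        return "xml"
    return "unknown"


def detect_format(path: Path, sniff_bytes: int = 2048) -> str:
    """Read only the start of the file and ignore its extension."""
    with open(path, "rb") as handle:
        return sniff_markup(handle.read(sniff_bytes))
```

With something like this, a folder could be walked with `Path.iterdir()` and each file dispatched to a parser based on the returned label. An HTML fragment without a doctype or `<html>` tag would be mislabelled as XML here, so the extension may still be useful as a tie-breaker.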

Describe the solution you'd like

I mentioned it above.

Describe alternatives you've considered

no

Additional context

no

Antoinelfr avatar Mar 10 '25 14:03 Antoinelfr

@Thomas-Rowlands This is the library we're already using: https://pypi.org/project/filetype/

Might be worth seeing if it does what you need. If you end up using a different library instead, it probably makes sense to replace the use of filetype with that (I think it's only used in one place).

alexdewar avatar Mar 11 '25 15:03 alexdewar