pkglite
                                
Guess filetype for files without extensions
When evaluating file specifications to create file collections, we should follow this flow:
- If a file has a known extension, mark it as text or binary based on the dictionary (implemented)
- Include files that do not have a file extension, and files with extensions not covered by the dictionary
- Guess whether the file is (canonically) text; otherwise mark it as binary
- I'd prefer zlib's algorithm: https://github.com/madler/zlib/blob/master/doc/txtvsbin.txt
- If a file does not have any content, then mark it as binary
 
 
- Document this flow in the specification section
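The flow above could be sketched roughly as follows. This is a minimal illustration in Python, not pkglite's implementation: `KNOWN_TYPES`, `looks_like_text()`, and `classify()` are hypothetical names, and the dictionary entries are placeholders. The byte classes follow zlib's `txtvsbin.txt`, with the gray-list bytes excluded from the block list, which is an interpretation of that document.

```python
import os

# Byte classes described in zlib's doc/txtvsbin.txt.
ALLOW = frozenset({9, 10, 13}) | frozenset(range(32, 256))  # TAB, LF, CR, SPACE..255
GRAY = frozenset({7, 8, 11, 12, 26, 27})  # tolerated: neither confirms nor rejects text
BLOCK = (frozenset(range(0, 7)) | frozenset(range(14, 32))) - GRAY

# Hypothetical stand-in for the file extension dictionary.
KNOWN_TYPES = {"r": "text", "md": "text", "png": "binary", "rds": "binary"}


def looks_like_text(data: bytes) -> bool:
    """Text needs at least one allow-list byte and no block-list byte."""
    seen = set(data)
    return bool(seen & ALLOW) and not (seen & BLOCK)


def classify(name: str, data: bytes) -> str:
    """Return "text" or "binary" for a file name and its raw content."""
    ext = os.path.splitext(name)[1].lstrip(".").lower()
    if ext in KNOWN_TYPES:
        # Known extension: the dictionary decides.
        return KNOWN_TYPES[ext]
    # No extension or unknown extension: guess from the content.
    # Empty content has no allow-list byte, so it falls through to binary.
    return "text" if looks_like_text(data) else "binary"
```

With this sketch, an extensionless file such as `DESCRIPTION` is tagged from its bytes, while a file with a known extension never reaches the content check.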
From Yilong: or, simply classify files with unknown extensions as binary files.
The goal is to separate file capture rules and file type tagging rules and make them more universal, instead of limiting both flows with only known file extensions.
Action items:
- For file capturing: Remove the file name pattern constraint from some file specifications, e.g., file_inst(), so that they are no longer file extension-based and can capture arbitrary files.
- For file type tagging: Revise the tagging strategy to use the file extension dictionary and mark everything else as binary.
- Add file specification functions for more directories observed here: demo/, exec/, po/, build/.
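To illustrate the separation these action items aim for, here is a rough Python sketch in which capture rules decide which files are collected and a separate tagging rule decides the file type. `FileSpec`, `capture()`, `tag()`, and `KNOWN_TYPES` are hypothetical names made up for this example and do not reflect pkglite's actual API.

```python
import posixpath
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stand-in for the file extension dictionary.
KNOWN_TYPES = {"r": "text", "md": "text", "png": "binary"}


@dataclass(frozen=True)
class FileSpec:
    path: str                      # directory relative to the package root
    pattern: Optional[str] = None  # regex on the file name; None captures any file


def capture(spec: FileSpec, paths: List[str]) -> List[str]:
    """Capture rule: pick files under spec.path, optionally filtered by name."""
    prefix = spec.path.rstrip("/") + "/"
    picked = []
    for p in paths:
        if not p.startswith(prefix):
            continue
        if spec.pattern is None or re.search(spec.pattern, posixpath.basename(p)):
            picked.append(p)
    return picked


def tag(path: str) -> str:
    """Tagging rule: dictionary lookup, with everything else marked binary."""
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    return KNOWN_TYPES.get(ext, "binary")
```

For example, `capture(FileSpec("inst"), paths)` collects every file under `inst/` regardless of extension, and `tag()` then marks an extensionless file such as `inst/CITATION` as binary because it is not in the dictionary.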
Shall we close the issue?
Not yet. This hasn't been shipped.