
Roadmap: More Enhancements in Development

Open LeeThompson opened this issue 1 year ago • 6 comments

Status:

June 23rd 2023: Haven't been able to do much work this week due to some unexpected household emergencies; should be back at it next week.

202306161401:

  • Added a MIME database. Too many functions were all doing different types of lookups, so I consolidated them into a database "object". Works pretty well so far.
  • Added a content buffer for internal processing. The goal is to prevent unnecessary reloading of data.
  • Added yet more switches, mostly for end users who want to see the HTTP warnings (4xx) and errors (5xx); these can be enabled in the ini or with --showhttpwarnings and --showhttperrors. (With the options enabled, they are output as TYPE_WARNING and TYPE_ERROR.)
  • Image size detection doesn't always work, even for a valid image, so until that gets resolved a specified minimum size is more of a 'goal'.
  • Added SVG detect to our own data check
  • Added new logging levels TYPE_OBJECTS, TYPE_TIMERS (full debug logging is now 1023)
  • convertRelativeToAbsolute now has one return path making for easier debugging.
  • Tightened up domain parsing and regex code, it should be a bit better dealing with subdomains.
  • Integrating MIME Database, checkIconAcceptance, and other new things to existing code. Then I'm going to do a battery of tests, after which I'm going to simplify/optimize and continue with adding the remaining new features (check local icon, etc).
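
The "full debug logging is now 1023" note suggests the log levels are bit flags, since 1023 is 2^10 - 1 (ten flags, all set). A minimal sketch of that idea follows. Only the names TYPE_WARNING, TYPE_ERROR, TYPE_OBJECTS and TYPE_TIMERS come from the notes above; the other names, every numeric value, and the helper `shouldLog` are assumptions for illustration.

```php
<?php
// Hypothetical flag layout: ten single-bit levels. The real values used by
// get-fav.php are not stated in the notes, so these are placeholders.
$levels = [
    'TYPE_ERROR'   => 1 << 0,
    'TYPE_WARNING' => 1 << 1,
    'TYPE_NOTICE'  => 1 << 2, // assumed name
    'TYPE_VERBOSE' => 1 << 3, // assumed name
    'TYPE_DEBUG'   => 1 << 4, // assumed name
    'TYPE_TRACE'   => 1 << 5, // assumed name
    'TYPE_SPECIAL' => 1 << 6, // assumed name
    'TYPE_HTTP'    => 1 << 7, // assumed name
    'TYPE_OBJECTS' => 1 << 8,
    'TYPE_TIMERS'  => 1 << 9,
];

// True when every bit of $level is enabled in the configured $mask.
function shouldLog($mask, $level)
{
    return ($mask & $level) === $level;
}

// With ten single-bit flags, "everything on" is the sum of all of them.
$fullDebug = array_sum($levels);
```

With this layout, adding a new level only means adding one more bit, and a configured value such as `TYPE_ERROR | TYPE_WARNING` keeps everything else quiet.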

202306121848:

  • Added new 'extensions' section to the ini, it's mostly for testing but could be used if something isn't working right. They are all simple boolean values (true or false), the list is: curl, exif, get, put, mbstring, fileinfo, mimetype, gd, imagemagick, gmagick, hrtime. If an extension is listed as true in this section but is not loaded or available, it will change to false. (Please note, GD, ImageMagick and gmagick are not currently used at all.)
  • Image identification fallbacks added to local file loading
  • Been testing/fixing up extension/function fallback code
  • Fixed an issue where the log could be initialized too soon and not honor some settings
  • Added --sites as an alternate to --list
  • Added raw datacheck for most common icon formats
  • Added a "confidence" level, not used other than logging yet
  • This isn't the "big update" yet, as I wanted to test some of the fallbacks before I started going down the bigger rabbit hole and that's probably going to continue this week.
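
As a rough illustration of the 'extensions' section described above, the sketch below parses a sample section and downgrades any entry to false when the extension is not actually loaded. The key names come from the list above; the function `resolveExtensions` and the sample values are hypothetical, and entries such as get, put, hrtime and mimetype (which are not PHP extension names) would need their own checks.

```php
<?php
// Sample ini fragment. Key names are from the notes; values are made up.
$ini = <<<INI
[extensions]
curl=true
exif=true
mbstring=true
fileinfo=true
gd=false
INI;

// Returns name => bool. A "true" entry is kept only if the check passes.
// $isLoaded is injectable so the logic can be exercised without the
// extensions being installed; by default it is PHP's extension_loaded().
function resolveExtensions(array $requested, $isLoaded = 'extension_loaded')
{
    $resolved = [];
    foreach ($requested as $name => $wanted) {
        $wanted = filter_var($wanted, FILTER_VALIDATE_BOOLEAN);
        $resolved[$name] = $wanted && (bool) call_user_func($isLoaded, $name);
    }
    return $resolved;
}

// INI_SCANNER_RAW keeps "true"/"false" as strings, so filter_var decides.
$config = parse_ini_string($ini, true, INI_SCANNER_RAW);
$extensions = resolveExtensions($config['extensions']);
```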

Some notes on this:

Having our own image identification matters when the PHP installation is limited (for whatever reason); going by file extension is still the last resort.

The method used for this is looking for the "signature" of the image file. Most image formats have a header with signature data to be used by software trying to open it (this is also called a "magic number".) The new code knows PNG, GIF, JPEG, WEBP, BMP and ICO formats.

Some image formats are easier to identify than others. For example, the PNG format's "magic" is \x89PNG\r\n\x1A\n, which is pretty distinctive. BMP and ICO have very simple identifiers, so false positives are much more likely, which is why I've been adding a "certainty" rating (logged as "confidence"). Eventually you'll be able to set a minimum acceptable "certainty" and reject possibly invalid files. (You can currently set it but nothing looks at it.)
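
To make the signature idea concrete, here is a small sketch of a magic-number check for the six formats listed. It is not the script's getMIMETypeFromBinary; the function name `sniffImageType` and the confidence labels other than "certain" are assumptions, and the ratings simply reflect how long or distinctive each signature is.

```php
<?php
// Returns [mimeType, confidence] based on the leading bytes only.
function sniffImageType($bytes)
{
    // 8-byte PNG signature: very unlikely to occur by accident.
    if (strncmp($bytes, "\x89PNG\r\n\x1A\n", 8) === 0) {
        return ['image/png', 'certain'];
    }
    if (strncmp($bytes, 'GIF87a', 6) === 0 || strncmp($bytes, 'GIF89a', 6) === 0) {
        return ['image/gif', 'certain'];
    }
    // JPEG starts with the SOI marker followed by another marker byte.
    if (strncmp($bytes, "\xFF\xD8\xFF", 3) === 0) {
        return ['image/jpeg', 'high'];
    }
    // WEBP is a RIFF container with "WEBP" at offset 8.
    if (strncmp($bytes, 'RIFF', 4) === 0 && substr($bytes, 8, 4) === 'WEBP') {
        return ['image/webp', 'certain'];
    }
    // Two-byte and near-empty signatures: easy to match by accident.
    if (strncmp($bytes, 'BM', 2) === 0) {
        return ['image/bmp', 'low'];
    }
    if (strncmp($bytes, "\x00\x00\x01\x00", 4) === 0) {
        return ['image/x-icon', 'low'];
    }
    return [null, 'none'];
}
```

A stricter check for BMP or ICO could also validate header fields (for example, that the declared size is plausible), which is one way a "low" rating might be raised.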

Here's some sample trace logging showing this in action:

2023-06-12 18:47:21 [TRACE] [grap_favicon(20):listIcons:getMIMETypeFromFile] pathname='icons/whatsapp.png', content_type=image/png, confidence=certain, method=signature

Ideally, if everything is available to get-fav.php the following methods are used, in order:

  1. The content-type returned by the server (remote only)
  2. FileInfo
  3. mime_content_type (local files only)
  4. exif_imagetype (and image_type_to_mime_type if available)
  5. getMIMETypeFromBinary (the new fallback function using "magic")
  6. file extension
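
The ordering above can be sketched as a chain of detectors tried one after another for a local file. This is only an illustration: the function name `detectMimeType` and the extension map are hypothetical, and steps 1 (server content-type) and 5 (getMIMETypeFromBinary) are left out to keep it short.

```php
<?php
// Tries each available detector in order; returns the first usable answer.
function detectMimeType($path)
{
    $contentDetectors = [
        'fileinfo' => function ($p) {
            if (!class_exists('finfo')) { return null; }
            $info = new finfo(FILEINFO_MIME_TYPE);
            $mime = $info->file($p);
            return $mime ? $mime : null;
        },
        'mime_content_type' => function ($p) {
            if (!function_exists('mime_content_type')) { return null; }
            $mime = mime_content_type($p);
            return $mime ? $mime : null;
        },
        'exif' => function ($p) {
            if (!function_exists('exif_imagetype')) { return null; }
            $type = @exif_imagetype($p); // false when not a known image
            return $type ? image_type_to_mime_type($type) : null;
        },
    ];

    // Content-based checks only make sense if the file can be read.
    if (is_file($path) && is_readable($path)) {
        foreach ($contentDetectors as $method => $detect) {
            $mime = $detect($path);
            if ($mime !== null) {
                return ['content_type' => $mime, 'method' => $method];
            }
        }
    }

    // Last resort: the file extension (hypothetical, incomplete map).
    $map = ['png' => 'image/png', 'gif' => 'image/gif', 'jpg' => 'image/jpeg',
            'ico' => 'image/x-icon', 'svg' => 'image/svg+xml'];
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    if (isset($map[$ext])) {
        return ['content_type' => $map[$ext], 'method' => 'extension'];
    }
    return ['content_type' => null, 'method' => 'none'];
}
```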

202306071311:

  • Initial work for processing parameters in HTML mode created (completely untested)
  • Added --checklocal / --nochecklocal, --storeifnew (requires --checklocal and --store) ( Not implemented yet. )
  • Added --showconfig / --noshowconfig to show running configuration options
  • Added --showconfigonly (implies --showconfig), shows running configuration and exits.
  • Added --silent (console mode only) (turns off the console completely)
  • Near the top of the script there are two defines, ENABLE_SAME_FOLDER_INI and ENABLE_SAME_FOLDER_API_INI. They default to false. If set to true, get-fav.ini and get-fav-api.ini, respectively, will be read and used automatically when they are in the same folder as get-fav.php. --configfile and --apiconfigfile, if specified, will be applied after.
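
A possible reading of the same-folder behaviour, as a sketch: build the list of ini files in the order they would be applied, so the file named on the command line is read last. The two define names are from the note above; the helper `iniFilesToLoad` is hypothetical.

```php
<?php
define('ENABLE_SAME_FOLDER_INI', false);
define('ENABLE_SAME_FOLDER_API_INI', false);

// Returns the ini files to read, in order. Later files would be applied
// on top of earlier ones, so the --configfile path wins on conflicts.
function iniFilesToLoad($scriptDir, $sameFolderEnabled, $sameFolderName, $switchPath = null)
{
    $files = [];
    $candidate = rtrim($scriptDir, '/\\') . DIRECTORY_SEPARATOR . $sameFolderName;
    if ($sameFolderEnabled && is_file($candidate)) {
        $files[] = $candidate;
    }
    if ($switchPath !== null) {
        $files[] = $switchPath;
    }
    return $files;
}

// With the defaults, nothing is picked up automatically.
$configFiles = iniFilesToLoad(__DIR__, ENABLE_SAME_FOLDER_INI, 'get-fav.ini');
```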

It will likely be a few days before I do another git push as the next one is a big one:

  • Path write checking
  • Check local icons against criteria (if required, replacements will be downloaded; if the current icon is ok but there is a different icon online, if storeifnew is enabled it will be replaced)
  • Icons will also be tested for size criteria.
  • Blocklists will be applied.
  • Code will be put in place for storing local icons in sub-folders.
  • Some test HTTP mode variables will be parsed.
  • Documentation will be updated.
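
For the blocklist item, one simple approach (and the one hinted at later under "md5 hash comparisons") is to hash the downloaded icon and compare it with the hashes of known generic fallback icons. The sketch below assumes the blocklist is a plain array of MD5 hex strings; the function name `isBlockedIcon` and the sample data are hypothetical.

```php
<?php
// True when the icon's MD5 matches an entry in the blocklist.
function isBlockedIcon($iconData, array $blocklist)
{
    return in_array(md5($iconData), $blocklist, true);
}

// Stand-in bytes for a generic "globe" icon an API might return.
$genericIcon = 'generic-placeholder-icon-bytes';
$blocklist = [md5($genericIcon)];

$rejected = isBlockedIcon($genericIcon, $blocklist);      // generic icon
$accepted = !isBlockedIcon('real-site-icon', $blocklist); // anything else
```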

202306062230:

  • Refined HTTP Response Parsing (now includes general 'class' of response as part of the data)
  • PHP .ini values are now in defines so if something changes down the road it's easier to update
  • Added more parameter checking
  • Added major/minor to version
  • If cURL is disabled and file_get_contents is not available, check whether allow_url_fopen is disabled in php.ini and, if so, show an error message.
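
The transport check in the last item could look roughly like this. It is a sketch, not the script's code: `chooseTransport` is a hypothetical helper that takes the two facts as arguments so the decision is easy to follow, and the real script may word the error or order the checks differently.

```php
<?php
// Decides how remote content can be fetched, or null if it cannot.
function chooseTransport($curlAvailable, $urlFopenAllowed)
{
    if ($curlAvailable) {
        return 'curl';
    }
    // file_get_contents() can only open URLs when allow_url_fopen is on.
    if ($urlFopenAllowed) {
        return 'file_get_contents';
    }
    return null;
}

// ini_get() returns a string such as "1", "0", "On" or ""; normalise it.
$urlFopenAllowed = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
$transport = chooseTransport(function_exists('curl_init'), $urlFopenAllowed);

if ($transport === null) {
    fwrite(STDERR, "No HTTP transport: cURL is unavailable and allow_url_fopen is disabled.\n");
}
```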

202306042323:

  • Bug fixing.
  • Added --apiconfigfile=PATHNAME to load API Definitions
  • Loading of 'same folder' API and config file can be controlled in the special runtime defines section. Default is OFF. (They can always be overridden with command line switch)
  • API: Updated favicongrabber's built-in definition
  • API: Added iconhorse to built-in definition
  • Added more to the capabilities structure
  • If exif is used content-type will be looked up using the image_type_to_mime_type function
  • Capability checking is more thorough and accurate. ("exif" requires "mbstring" etc)

202306021445:

  • Mostly "under the hood" work today. Mostly internal structures prepping for some of the features still being implemented.
  • Added a HTTP response parser for cleaner coding and better log messaging
  • Made some small changes for PHP 5.6.40 compatibility

202306011529:

  • Today one of the APIs was returning 502 errors which gave me the opportunity to add some error handling.
  • Rewrote the JSON parsing for APIs, this required a change to the .ini file for APIs but it should be more flexible (once all the bugs are fixed)
  • It will now go through more than one icon record (for API's that support it) and return the first that matches criteria (size, format, etc) (I need to do the same for the regex search.)
  • Added another switch pair, --allowoctetstream / --disallowoctetstream. The default is false because, if the more accurate content-type detection is not available, most will return application/octet-stream. I may make the default true if mime_content_type and/or finfo_open are available. (.ini file is [global] allow_octet_stream=boolean )
  • If in debugMode (--debug and/or debug/trace/special logging) active settings will be shown.
  • minor changes for PHP 8.2 compatibility
  • This version has only been tested with PHP 8.2.6.
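
Walking through several icon records and returning the first that meets the criteria might be sketched as below. The record shape (`url`, `size`, `type`) and the function `pickIcon` are assumptions; actual API responses differ per provider.

```php
<?php
// Returns the first record that satisfies size and type, or null.
function pickIcon(array $records, $minSize, array $acceptedTypes)
{
    foreach ($records as $record) {
        $bigEnough = $record['size'] >= $minSize;
        $typeOk = in_array($record['type'], $acceptedTypes, true);
        if ($bigEnough && $typeOk) {
            return $record;
        }
    }
    return null;
}

// Hypothetical records as an API might list them.
$records = [
    ['url' => 'https://example.com/a.ico', 'size' => 16, 'type' => 'image/x-icon'],
    ['url' => 'https://example.com/b.bin', 'size' => 64, 'type' => 'application/octet-stream'],
    ['url' => 'https://example.com/c.png', 'size' => 32, 'type' => 'image/png'],
];
$chosen = pickIcon($records, 32, ['image/png', 'image/x-icon']);
```

Allowing application/octet-stream, as --allowoctetstream does, would amount to adding that type to the accepted list.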

202305312016:

  • Mostly bug fixing and optimization.
  • Debug logging is at about 80% complete.
  • changed more internal structures, probably not done with that (mostly to accommodate new features)
  • added tenacious mode, which will try all APIs until it gets a successful result (default is off)
  • added precision timers for internal use
  • it will now warn that results may not be that great if, due to the PHP configuration, some functions that identify formats are not available
  • you can now specify what icon types are acceptable (careful) (note: it is not wired in everywhere yet)
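
Tenacious mode can be pictured as follows: APIs are tried in random order, and only in tenacious mode does the loop continue past the first failure. This is a sketch under assumptions: each API is modelled as a callable returning icon data or null, and `grabWithApis` is a hypothetical name.

```php
<?php
// $apis maps an API name to a callable($domain) returning data or null.
function grabWithApis(array $apis, $domain, $tenacious = false)
{
    $names = array_keys($apis);
    shuffle($names); // random selection order
    foreach ($names as $name) {
        $icon = call_user_func($apis[$name], $domain);
        if ($icon !== null) {
            return ['api' => $name, 'icon' => $icon];
        }
        if (!$tenacious) {
            break; // default: give up after the first API fails
        }
    }
    return null;
}

// Stand-in APIs: two that fail and one that succeeds.
$apis = [
    'first'  => function ($d) { return null; },
    'second' => function ($d) { return 'icon-for-' . $d; },
    'third'  => function ($d) { return null; },
];
$result = grabWithApis($apis, 'example.com', true);
```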

202305281757:

  • Added a 4th API (INI file only right now)
  • Rewrote API randomizer
  • Setting up proper debug logging which is about 20% complete
  • Unified output into the new logging function (automatically renders HTML if not in console mode). (It is possible now to have the script not output anything if you disable both file and console outputs.)
  • Added switch for icon size
  • Added switches for console output (timestamps, level, etc)
  • Debug/HTML mode icons should set the correct MIME type for display (not tested)
  • I know it looks like a lot's been done, and it has, but very little has been tested. If you choose to try my branch out, please keep that in mind.

202305251719:

  • Debug logging added (not implemented much yet).
  • Greatly improved image detection, although it uses fileinfo, which may not be installed everywhere. It will fall back to exif etc.
  • Introduced HTTP Load buffering. If the load function gets a URL that it already loaded it will just return what it got last time. (can be disabled)
  • A lot of new "under the hood" functions, if you choose to play with it from my fork be very careful and please report bugs.
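
The HTTP load buffering described above is essentially a per-URL cache in front of the fetch. A minimal sketch, assuming the fetch itself is passed in as a callable (the name `loadUrl` and its signature are hypothetical):

```php
<?php
// Returns cached content for a URL seen before, otherwise fetches it.
function loadUrl($url, $fetcher, $useBuffer = true)
{
    static $buffer = [];
    if ($useBuffer && array_key_exists($url, $buffer)) {
        return $buffer[$url];
    }
    $content = call_user_func($fetcher, $url);
    if ($useBuffer) {
        $buffer[$url] = $content;
    }
    return $content;
}

// Count how often the "network" is really hit.
$calls = 0;
$fetcher = function ($url) use (&$calls) {
    $calls++;
    return 'body of ' . $url;
};

$first  = loadUrl('https://example.com/', $fetcher);
$second = loadUrl('https://example.com/', $fetcher); // served from buffer
```

Passing `false` as the third argument bypasses the buffer, which matches the "can be disabled" note.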

202305242106:

  • APIs can be read in from an INI file (get-fav-api.ini)

202305241803:

  • Added remove TLD support (needs a lot of testing)
  • Made load function allow recursion for redirects (needs a lot of testing)

202305241420:

  • Bugfix. Now setting timeout for PHP level HTTP and socket operations. (#13)
  • Bugfix. Now keeps specified protocol active (http, https) (#12)
  • Preliminary support to keep port and user/password information (not hooked up yet)
  • Added a new direct try: it takes the url, adds favicon.ico to it, and sees if it gets anything before falling back to the previous behavior.
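
One way to build the URL for the direct try, keeping the specified protocol and port, is sketched below. It assumes the icon is requested from the site root; whether the script appends favicon.ico to the root or to the full path is not stated, and `directFaviconUrl` is a hypothetical name.

```php
<?php
// Builds scheme://host[:port]/favicon.ico from a page URL, or null.
function directFaviconUrl($url)
{
    $parts = parse_url($url);
    if ($parts === false || empty($parts['host'])) {
        return null; // e.g. a bare "example.com" has no host component
    }
    $scheme = isset($parts['scheme']) ? $parts['scheme'] : 'http';
    $port = isset($parts['port']) ? ':' . $parts['port'] : '';
    return $scheme . '://' . $parts['host'] . $port . '/favicon.ico';
}

$direct = directFaviconUrl('https://example.com/some/page?x=1');
```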

202305221634:

  • Reads config files, command line switches will always override any ini setting. (It is using parse_ini_file with INI_SCANNER_RAW, does array_replace_recursive with the existing configuration structure and finally validates boolean/numeric (with range checks).)
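
The parse, merge, validate sequence in that note can be sketched as below. It uses parse_ini_string instead of parse_ini_file so the example is self-contained. Only allow_octet_stream is a key named in these notes; icon_size, timeout, the 1 to 1024 range, and the function `loadConfig` are placeholders.

```php
<?php
// Defaults double as the schema: the type of each default drives validation.
$defaults = [
    'global' => ['allow_octet_stream' => false, 'icon_size' => 16],
    'http'   => ['timeout' => 60],
];

function loadConfig(array $defaults, $iniText)
{
    // RAW mode keeps every value as a string ("true", "30", ...).
    $parsed = parse_ini_string($iniText, true, INI_SCANNER_RAW);
    if ($parsed === false) {
        return $defaults;
    }
    $merged = array_replace_recursive($defaults, $parsed);

    foreach ($defaults as $section => $options) {
        foreach ($options as $key => $default) {
            $value = $merged[$section][$key];
            if (is_bool($default)) {
                $bool = filter_var($value, FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
                $merged[$section][$key] = ($bool === null) ? $default : $bool;
            } elseif (is_int($default)) {
                // Hypothetical range check; out-of-range falls back to default.
                $int = filter_var($value, FILTER_VALIDATE_INT,
                    ['options' => ['min_range' => 1, 'max_range' => 1024]]);
                $merged[$section][$key] = ($int === false) ? $default : $int;
            }
        }
    }
    return $merged;
}

$config = loadConfig($defaults, "[global]\nallow_octet_stream=true\nicon_size=9999\n[http]\ntimeout=30\n");
```

Command line switches would then be applied on top of `$config` in the same way, which is what lets them always override the ini.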

202305231619:

  • Path and other settings are validated
  • Settings are checked against capabilities
  • Updated --help
  • Help menu now shows actual defaults from the defines
  • Help menu now shows available APIs (* by ones that are disabled)
  • Updated copyright notice (year changed to 2019-2023)
  • Individual APIs can be enabled/disabled

Stuff being worked on:

(I'm keeping my github fork up to date as I work on stuff, assuming it's not throwing horrible errors.)

  • [ ] New ~--checkicon~ --checklocal option will check the icon in the local path first and check online only if missing or otherwise invalid (size, type, blocklist). (in progress)
  • [ ] The main design of the script seems to be as a server side script so I plan to add options for it (passed in via query string or form, default will be disabled for security reasons) (#11) (in progress)
  • [ ] Icon validation where it can be checked with generic fallback icons (via md5 hash comparisons in a 'blocklist') (in progress)
  • [ ] Updating README.MD to reflect command line switches etc. (in progress)
  • [ ] Document functions and config file format (ini file). (in progress)
  • [ ] MD5 fragment sub-folder option (#14)
  • [x] Configuration file support (command line switches will still override the config)
  • [x] Added configuration.md for detailed help on options.
  • [x] Redoing configuration throughout the code (to better handle config file overrides) (it's more of an array structure)
  • [x] Add a configuration validation check for paths
  • [x] Moved defaults & constants to defines for easier maintenance.
  • [x] Improved error handling
  • [x] Add code to enable/disable individual apis by name (e.g. --disableapis=google,faviconkit)
  • [x] Option to strip the TLD domain from the filename (e.g. microsoft.com.ico becomes microsoft.ico)
  • [x] Investigate defining APIs in the ini file.
  • [x] Adding more comments to code
  • [x] Log file support with timestamp and append options (mostly for debugging purposes)
  • [x] Final configuration validation check should include capabilities, so if you force enable curl but your PHP doesn't have it, it should use the fallback.
  • [x] Added --version (aka v and ver)
  • [x] Added a version as a define
  • [x] Some bug fixing
  • [x] More command line switches for troubleshooting and for specific situations allowing control over connection, http and dns timeouts.
  • [x] Changed $debug to a bool
  • [x] cURL path now handles http->https redirects.
  • [x] PHP's user agent is now set as well as cURL's (not permanently) (#7) if --user-agent is passed in.
  • [x] Allow manual disabling of curl.
  • [x] New structure for APIs (will allow adding APIs in the future). (NOTE: it does not currently fall back if the randomly selected one fails)
  • [x] Ability to enable/disable individual API methods
  • [x] Unifying message output/debug messages (function writeOutput)
  • [x] Update command line help.
  • [x] API definitions should allow for apikey (untested)
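
For the "strip the TLD" item in the list above (microsoft.com.ico becomes microsoft.ico), a deliberately naive sketch is to drop the last dot-separated label of the host before adding the extension. The function name `stripTld` is hypothetical, and this version does not handle multi-part suffixes such as co.uk, which would need a public suffix list.

```php
<?php
// Removes the final label of a host name: "microsoft.com" -> "microsoft".
function stripTld($host)
{
    $pos = strrpos($host, '.');
    return ($pos === false) ? $host : substr($host, 0, $pos);
}

$filename = stripTld('microsoft.com') . '.ico';
```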

Issues:

  • [x] New API system allows for more APIs but currently doesn't allow fallbacks
  • [x] --help output takes more than one standard console screen (| more or | clip need to be used)
  • [x] exif_imagetype fails on some sites for some reason, probably because fopen isn't doing something it likes. May add a 'temporary' download of the potential icon file for analysis instead of a direct open. (#13) (Partial fix, should be used less.)

Before pull request:

  • [ ] Lots of testing
  • [ ] HTML mode testing
  • [ ] Regression testing with PHP 5, PHP 7 and PHP 8
  • [ ] Bug fixes

Other Tasks:

  • [ ] "How to use" will need to be updated.

Notes:

  • Most of the internal structure has changed. There are now functions to set (and validate) and get configuration data.
  • The main function now just needs a url, it gets the configuration data when it starts.
  • This will make reading an ini config file and applying it much easier which will be the next step.
  • Almost all constants are now in a define block at the top.
  • The "how to use" notes will need to be updated.
  • I am now testing with PHP 5.6.4, 7.4.33, 8.1.19 and 8.2.6.

LeeThompson, May 19 '23 20:05