Best way to disable exiftool2
I am running scrappy in a serverless environment where I have limited ability to spawn processes and definitely do not have access to Perl. What is the best way to disable the exiftool2 spawn? I tried using only the html plugin with scrapeUrl, but either I don't have the syntax right or it still manages to call exiftool2.
I appreciate any insight you might be able to provide.
Turns out this is caused by the scrappy.helpers.iconSelector extract helper that I was using later.
I'd like to have access to the icons from the site.
Two thoughts:
- Let's improve https://github.com/blakeembrey/node-scrappy/blob/d63aaa5613901594730105a895efb397597d0d6d/src/extract/helpers/icon-selector.ts#L32-L33 so that it reuses the calling scraper's own configuration instead of the default configuration. That way you can disable https://github.com/blakeembrey/node-scrappy/blob/d63aaa5613901594730105a895efb397597d0d6d/src/scrape/plugins/exif-data.ts, which is the cause of your issue (it is currently used for all scrapers).
- Let's figure out whether it's also possible to bundle the Perl file into a single executable so it can run on Lambda.
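To illustrate the first point, here is a rough, self-contained sketch of the idea. It does not use scrappy's actual API: `Plugin`, `htmlPlugin`, `exifPlugin`, `scrape`, and `selectIcon` are all hypothetical names made up for the example. The only point is that a helper which receives its plugin list from the caller, rather than falling back to the defaults, lets you leave out the plugin that spawns a process.

```typescript
// Hypothetical types and names for illustration only; not node-scrappy's API.
type Result = Record<string, unknown>;
type Plugin = (url: string, result: Result) => Result;

// Stand-in for a plugin that only does in-process work.
const htmlPlugin: Plugin = (_url, result) => ({ ...result, html: true });

// Stand-in for a plugin that would need to spawn an external process.
const exifPlugin: Plugin = (_url, _result) => {
  throw new Error("exif plugin needs to spawn a process");
};

const DEFAULT_PLUGINS: Plugin[] = [htmlPlugin, exifPlugin];

// Runs every plugin in order, passing the accumulated result along.
function scrape(url: string, plugins: Plugin[] = DEFAULT_PLUGINS): Result {
  return plugins.reduce<Result>((result, plugin) => plugin(url, result), { url });
}

// The helper takes the plugin list from its caller instead of using
// DEFAULT_PLUGINS, so a caller can exclude the process-spawning plugin.
function selectIcon(iconUrl: string, plugins: Plugin[]): Result {
  return scrape(iconUrl, plugins);
}

// Only the html plugin runs here, so nothing tries to spawn a process.
const icon = selectIcon("https://example.com/favicon.ico", [htmlPlugin]);
```

With this shape, calling `scrape` without a plugin list still runs everything, while the helper never reintroduces a plugin the caller left out.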
That sounds good to me. Ideally, not leaving JavaScript at all would help.
3. Similarly, https://github.com/digitalbazaar/jsonld.js/pull/184 was giving me some grief: I am using webpack to build the serverless function, so there is no longer a package.json.
I'd love to help get at least 1. going and then follow up with 2. I am not terribly familiar with TypeScript, but I am quite fluent in JavaScript, EXIF, and media extraction in general.