best way to disable exiftool2

Open · svanzoest opened this issue 8 years ago · 3 comments

I am running scrappy in a serverless environment where my ability to spawn processes is limited and I definitely do not have access to Perl. What is the best way to disable the exiftool2 spawn? I tried using only the html plugin with scrapeUrl, but either my syntax is wrong or it still manages to call exiftool2.

I appreciate any insight you might be able to provide.

May 29 '17 01:05 svanzoest

It turns out this is caused by the scrappy.helpers.iconSelector extract helper, which I was calling later. I'd still like to have access to the site's icons.

May 29 '17 01:05 svanzoest

Two thoughts:

1. Let's improve https://github.com/blakeembrey/node-scrappy/blob/d63aaa5613901594730105a895efb397597d0d6d/src/extract/helpers/icon-selector.ts#L32-L33 so it uses its own configuration (instead of the default configuration). That way you can disable https://github.com/blakeembrey/node-scrappy/blob/d63aaa5613901594730105a895efb397597d0d6d/src/scrape/plugins/exif-data.ts, which is the source of your issue (that plugin is used by all scrapers).
2. Let's figure out whether it's also possible to bundle the Perl file into a single executable so it can run on Lambda.

May 29 '17 17:05 blakeembrey
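A minimal sketch of idea 1, assuming a plugin pipeline shaped roughly like the one below. Every name here (`ScrapeResult`, `Plugin`, `htmlPlugin`, `exifPlugin`, `DEFAULT_PLUGINS`, `scrapeUrl`, `iconSelector`) is hypothetical and is not node-scrappy's actual API; the point is only that a helper which forwards the caller's plugin list, rather than always falling back to the defaults, lets the caller leave out the plugin that would spawn exiftool.

```typescript
// Hypothetical sketch: none of these names are node-scrappy's real API.

// Result accumulated by the scrape plugins (hypothetical shape).
interface ScrapeResult {
  url: string;
  icons: string[];
  exif?: Record<string, string>;
}

// A plugin takes the current result and returns an enriched copy.
type Plugin = (result: ScrapeResult) => ScrapeResult;

// Stand-in for an HTML plugin that discovers icon URLs in the page.
// (Assumes the URL ends with "/"; a real plugin would parse the markup.)
const htmlPlugin: Plugin = (result) => ({
  ...result,
  icons: [...result.icons, result.url + "favicon.ico"],
});

// Counts how often the EXIF stand-in runs, so its use is observable.
let exifRuns = 0;

// Stand-in for the exif-data plugin. A real EXIF plugin would spawn an
// external process (exiftool) here; this one only records that it ran.
const exifPlugin: Plugin = (result) => {
  exifRuns += 1;
  return { ...result, exif: { source: "exiftool" } };
};

// The default configuration includes the EXIF plugin.
const DEFAULT_PLUGINS: Plugin[] = [htmlPlugin, exifPlugin];

interface ScrapeOptions {
  plugins: Plugin[];
}

// Runs each configured plugin in order over an initially empty result.
function scrapeUrl(
  url: string,
  options: ScrapeOptions = { plugins: DEFAULT_PLUGINS },
): ScrapeResult {
  return options.plugins.reduce<ScrapeResult>(
    (acc, plugin) => plugin(acc),
    { url, icons: [] },
  );
}

// Idea 1: the helper forwards the caller's options to the scraper instead
// of always using DEFAULT_PLUGINS, so callers can omit exifPlugin.
function iconSelector(
  url: string,
  options: ScrapeOptions = { plugins: DEFAULT_PLUGINS },
): string[] {
  return scrapeUrl(url, options).icons;
}
```

Calling `iconSelector(url)` runs both stand-in plugins, while `iconSelector(url, { plugins: [htmlPlugin] })` still returns the icon list but never runs the EXIF stand-in, which in a real setup would mean no process is spawned.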

That sounds good to me. Ideally, never leaving JavaScript at all would help.

3. Similarly, https://github.com/digitalbazaar/jsonld.js/pull/184 was giving me some grief: I am using webpack to build the serverless function, so there is no longer a package.json.

I'd love to help get at least item 1 going and then follow up with item 2. I am not terribly familiar with TypeScript, but I am quite fluent in JavaScript, EXIF, and media extraction in general.

May 29 '17 18:05 svanzoest