Ben Muthalaly comments

New Extractor Idea: Find/write a "`cad-dl`" to save 3d assets, gltf files, CAD files, shapefiles, STLs, etc.

I tried to find existing tools to extract these files, but haven't had success yet.

> One simple solution we could do is run all the URLs found in...
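One way to read the suggestion above is to filter a list of already-discovered URLs by file extension. This is a minimal sketch that assumes the URLs have already been collected. The function name `find_3d_asset_urls` and the extension list are hypothetical, chosen for illustration. A real extractor would probably also need to check `Content-Type` headers, because many assets are served without a telltale extension.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical, illustrative set of extensions for 3D assets, CAD files, and shapefiles.
ASSET_EXTENSIONS = {
    ".gltf", ".glb", ".stl", ".obj", ".fbx", ".ply", ".3mf",
    ".step", ".stp", ".iges", ".igs", ".dxf", ".dwg", ".shp",
}

def find_3d_asset_urls(urls):
    """Return the URLs whose path ends in a known 3D/CAD extension (hypothetical helper)."""
    matches = []
    for url in urls:
        # Look only at the path, so query strings and fragments don't hide the extension.
        path = urlparse(url).path
        suffix = PurePosixPath(path).suffix.lower()
        if suffix in ASSET_EXTENSIONS:
            matches.append(url)
    return matches
```

For example, `find_3d_asset_urls(["https://example.com/model.GLB?v=2", "https://example.com/index.html"])` keeps only the `.GLB` URL, since the extension check is case-insensitive and ignores the `?v=2` query.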

New Extractor Idea: Find/write a "`cad-dl`" to save 3d assets, gltf files, CAD files, shapefiles, STLs, etc.

I've checked quite a few websites for test cases and can't find any that directly link to 3d assets. I could be looking in the wrong places though. I'd appreciate...

Feature Request: OCR archived PDF files to extract titles and full-text contents

@pirate pd3f seems like the most capable tool, but [the docs](https://pd3f.com/docs/pd3f/installation/) say that it takes 8GB. Just wanted to check if we're okay with a dependency that heavy before I...

New Extractor Idea: `gallery-dl` for image gallery downloading

@pirate Should the `media` extractor also handle `gallery-dl`? I'm not sure if the `media` extractor is just an alias for `youtube-dl` or if it's supposed to handle other visual media...

New Extractor Idea: `gallery-dl` for image gallery downloading

Ah cool! Should I still keep working on a `gallery-dl` branch based off `dev`, or should I wait until the plugin system has merged?

Show banner to upgrade to latest version when ArchiveBox is out of date

@pirate Couple of questions about the implementation for this:

- Should the majority of the logic be implemented in JS or Python? I've gotten something mostly working in JS, but...
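If the check ends up on the Python side, its core is comparing the running version against the latest release. This is a minimal sketch that assumes plain dotted numeric version strings such as `0.7.2`. The names `parse_version` and `is_out_of_date` are hypothetical, and the sketch leaves out how the latest version would be fetched. Pre-release tags like `rc1` would need a real parser such as `packaging.version.Version`.

```python
def parse_version(version):
    """Turn '0.7.2' (or 'v0.7.2') into a tuple of ints, e.g. (0, 7, 2)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))

def is_out_of_date(current, latest):
    """Return True when the running version is older than the latest release."""
    cur, new = parse_version(current), parse_version(latest)
    # Pad with zeros so '0.7' and '0.7.0' compare as equal.
    width = max(len(cur), len(new))
    cur += (0,) * (width - len(cur))
    new += (0,) * (width - len(new))
    # Tuples compare element by element, so (0, 9, 0) < (0, 10, 0),
    # which a plain string comparison would get wrong.
    return cur < new
```

Comparing integer tuples instead of raw strings avoids the classic `"0.10.0" < "0.9.0"` mistake.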

New Extractor Idea: `scihub-dl` to auto-detect inline DOI numbers and download academic paper PDFs

I have a rough script for this working (just using `scihub.py` as a module and downloading a pdf with the url->doi fallback). I had to slightly modify `scihub.py` to get...
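Detecting inline DOI numbers, as the issue title describes, could start from a regular expression. This sketch adapts the pattern Crossref has suggested for modern DOIs: `10.`, then a 4 to 9 digit registrant code, a slash, and a suffix. It is not the script described in the comment above, and the name `find_dois` is hypothetical. The sketch trims trailing punctuation because a DOI embedded in prose often ends a sentence or sits inside parentheses.

```python
import re

# Adapted from Crossref's suggested regex for modern DOIs:
# "10." + 4-9 digit registrant code + "/" + suffix characters.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)

def find_dois(text):
    """Return the unique DOIs found in text, in order of first appearance (hypothetical helper)."""
    found = []
    for match in DOI_PATTERN.finditer(text):
        # The suffix class also allows sentence punctuation, so strip it from the end.
        doi = match.group(0).rstrip(".,;:")
        # Drop a trailing ")" only when it closes a parenthesis opened outside the DOI.
        while doi.endswith(")") and doi.count(")") > doi.count("("):
            doi = doi[:-1].rstrip(".,;:")
        if doi not in found:
            found.append(doi)
    return found
```

For example, `find_dois("See doi:10.1000/xyz123. Also (https://doi.org/10.1038/nphys1170).")` returns `['10.1000/xyz123', '10.1038/nphys1170']`.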

New Extractor Idea: `scihub-dl` to auto-detect inline DOI numbers and download academic paper PDFs

Here's what I have so far: https://gist.github.com/benmuth/b2c12cbb40ca4d8183c6f17f819e2f2d @pirate

Usage:

```
python scihub-extractor.py -d -o
```

or

```
python scihub-extractor.py -f -o
```

It should either

- download the paper directly...

New Extractor Idea: `scihub-dl` to auto-detect inline DOI numbers and download academic paper PDFs

> @benmuth it might take me a month or more till I'm able to merge this, as I'm working on some paid ArchiveBox projects right now for a client that...

New Extractor Idea: `scihub-dl` to auto-detect inline DOI numbers and download academic paper PDFs

@pirate Yeah, I think that's a great idea, I'd be happy to try to work on this. I think a more comprehensive tool should definitely exist. Thanks for the overview,...