CommonCrawler
🕸 A simple way to extract data from Common Crawl
### Summary Make this program accessible both via Golang and from the terminal. Ensure that it works correctly in the terminal. ### Requirements - Must have a download archive feature so that...
`docker build -t commoncrawler` returns ``` Sending build context to Docker daemon 15.68MB Step 1/9 : FROM golang ---> 2421885b04da Step 2/9 : ENV GO111MODULE=on ---> Using cache ---> 385def581eff...
### Summary An Electron-based GUI to query CommonCrawl servers. ### Requirements - Must be a TypeScript-based Electron app that can compile on Windows, Linux and macOS systems....
### Summary Docker for Windows is currently not functioning well. Based on my research, it doesn't seem like there is any way to reasonably run a Linux-based container through...
### Summary Please make this binary downloadable from the internet. As long as it is cURLable from GitHub as a release, that would be...
### Summary CommonCrawler is easily accessible via AWS S3. However, I'm interested in creating some sort of IPFS based distribution of Common Crawl. This way we can self-host and create...
### Summary This repository should be accessible via `go get` and can be included easily into anyone else's project. ### Requirements Must be able to run: ```bash go get https://github.com/ChrisCates/CommonCrawler...
### Summary Full test coverage of all components. Must pass on Travis CI and on Unix. Branch coverage included should be 100%. CodeCov would be highly preferred over other testing...
Sometimes the network can fail or other things can happen... However, we don't have detailed logs for when a failure happens for a specific wet file... Would be nice to...