big-data topic
img2dataset
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
books
整理一些书籍 ,包含 C&C++ 、git 、Java、Keras 、Linux 、NLP 、Python 、Scala 、TensorFlow 、大数据 、推荐系统、数据库、数据挖掘 、机器学习 、深度学习 、算法等。
couchdb-fauxton
Fauxton is the new Web UI for CouchDB
matano
Open source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS
amazon-s3-find-and-forget
Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
data-science-live-book
An open source book to learn data science, data analysis and machine learning, suitable for all ages!
couchdb-docker
Semi-official Apache CouchDB Docker images
qbeast-spark
Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!