bigdata topic

List bigdata repositories

BigData-Interview

1.6k stars · 442 forks

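:dart: :star2: [Big data interview questions] A collection of big data interview questions gathered from around the web, together with the author's own answer summaries. Currently covers interview questions for the Hadoop, Hive, Spark, Flink, HBase, Kafka, and ZooKeeper frameworks.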

AutoCrawler

1.6k stars · 408 forks

Google, Naver multiprocess image web crawler (Selenium)

optimus

1.4k stars · 233 forks

:truck: Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

kube-batch

1.1k stars · 265 forks

A batch scheduler for Kubernetes, aimed at high-performance workloads, e.g. AI/ML, BigData, HPC

Coding-Now

994 stars · 308 forks

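Study notes, along with eBooks and video resources the author has used and a collection of blogs, websites, and tools the author considers worthwhile. Covers the major big data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more.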

cds

957 stars · 137 forks

Data syncing in golang for ClickHouse.

tispark

877 stars · 252 forks

TiSpark is built for running Apache Spark on top of TiDB/TiKV

datafaker

620 stars · 167 forks

Datafaker is a large-scale test data and flow test data generation tool. It generates fake data and inserts it into various data sources.

parquet4s

277 stars · 68 forks

Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.

bigdata-growth

1.3k stars · 331 forks

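A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.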