bigdata topic
BigData-Interview
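:dart: :star2: [Big data interview questions] A collection of big data interview questions gathered from the web, together with the author's own summarized answers. Currently covers Hadoop/Hive/Spark/Flink/HBase/Kafka/ZooKeeper.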
AutoCrawler
Google, Naver multiprocess image web crawler (Selenium)
optimus
:truck: Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
kube-batch
A batch scheduler for Kubernetes, aimed at high-performance workloads, e.g. AI/ML, BigData, HPC
Coding-Now
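Study notes, plus e-books and video resources the author has gone through, and blogs, websites, and tools the author has bookmarked as worthwhile. Covers the major big data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more.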
cds
Data syncing in golang for ClickHouse.
tispark
TiSpark is built for running Apache Spark on top of TiDB/TiKV
datafaker
Datafaker is a large-scale tool for generating test data and flow test data. It fakes data and inserts it into a variety of data sources.
parquet4s
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
bigdata-growth
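A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.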