liukunyuan issues

Results: 4 issues of liukunyuan

Hadoop 3.x: the HDFS connector does not support ORC

Because the Hive version is 1.1.1, which does not support Hadoop 3.x, it fails. ![image](https://user-images.githubusercontent.com/50513095/146141347-8096fee4-caf2-445f-af00-b31452a985dd.png)

bug

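mongodbreader plugin optimization

1. The original split algorithm, `col.find().skip(skipCount).limit(chunkDocCount).first()`, applies no filter condition, so splitting a large MongoDB collection takes too long. This change adds the filter condition to the MongoDB split algorithm.
2. Rewrote the MongoDB deserialization; the original approach handled many MongoDB types poorly.
3. Added MongoDB login authentication methods.
4. Added a batchsize parameter for reading MongoDB data in batches.
5. Set the read preference to prefer reading from secondaries.
6. Added a jsonType parameter so that each MongoDB document can be read as whole JSON instead of a fixed set of column fields (useful for tables whose requirements change frequently).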
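The skip-based splitting described in item 1 can be sketched as plain arithmetic. This is a minimal illustration under assumptions, not the plugin's actual code: the class name `SkipSplitSketch` and the method `boundaryOffsets` are hypothetical, and the document count is assumed to be the count of documents that match the query filter. Each returned offset would then be used as `skipCount` in a filtered lookup of one boundary document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of DataX: computes the skip offsets at which
// split-boundary documents would be looked up.
final class SkipSplitSketch {

    /**
     * @param filteredDocCount number of documents matching the query filter (assumed input)
     * @param splitCount       number of splits requested
     * @return skip offsets of the boundary documents; empty if one split is enough
     */
    static List<Long> boundaryOffsets(long filteredDocCount, int splitCount) {
        if (filteredDocCount < 0 || splitCount < 1) {
            throw new IllegalArgumentException("filteredDocCount must be >= 0 and splitCount >= 1");
        }
        List<Long> offsets = new ArrayList<>();
        long chunkDocCount = filteredDocCount / splitCount;
        if (chunkDocCount == 0) {
            // Fewer matching documents than splits: fall back to a single split.
            return offsets;
        }
        // splitCount splits need splitCount - 1 interior boundaries.
        for (int i = 1; i < splitCount; i++) {
            offsets.add(chunkDocCount * i);
        }
        return offsets;
    }
}
```

Because `skip` still has to walk past every skipped document, restricting the scan with the query filter (ideally backed by an index) reduces the work done for each boundary lookup.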

[Feature][all] We are ready to contribute multiple features

### Search before asking
- [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar feature requirement.
### Description
We have these features within our company and hope to...

feature

discussion

Mongodbreader update

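1. The original split algorithm, `col.find().skip(skipCount).limit(chunkDocCount).first()`, applies no filter condition, so splitting a large MongoDB collection takes too long. This change adds the filter condition to the MongoDB split algorithm and, for MongoDB 3.2+, adds a Sample split algorithm that makes splitting 10 times faster. Splitting a 4 TB MongoDB collection also completes within 10 seconds (with an index on the query filter fields).
2. Rewrote the MongoDB deserialization; the original approach handled many MongoDB types poorly.
3. Added MongoDB login authentication methods.
4. Added a batchsize parameter for reading MongoDB data in batches.
5. Set the read preference to prefer reading from secondaries.
6. Added a jsonType parameter so that each MongoDB document can be read as whole JSON instead of a fixed set of column fields (useful for tables whose requirements change frequently).
7. Tested against 72 MongoDB collections across 18 different MongoDB databases. The data volume stays consistent between channel=1 and channel=4.

https://github.com/alibaba/DataX/issues/738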
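The idea behind sample-based splitting can be sketched without a database connection. This is an illustration under assumptions, not the code from the pull request: the class `SampleSplitSketch`, its `Range` type, and `splitBySamples` are hypothetical names, and the sampled keys are assumed to be numeric values already fetched from the collection (for example by a `$sample` aggregation stage, available since MongoDB 3.2). The sketch sorts the sampled keys and picks evenly spaced ones as range boundaries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of DataX: turns randomly sampled keys into
// contiguous key ranges, one range per reader task.
final class SampleSplitSketch {

    /** Half-open key range [lower, upper); null means unbounded on that side. */
    static final class Range {
        final Long lower;
        final Long upper;

        Range(Long lower, Long upper) {
            this.lower = lower;
            this.upper = upper;
        }

        @Override
        public String toString() {
            return "[" + lower + ", " + upper + ")";
        }
    }

    static List<Range> splitBySamples(List<Long> sampledKeys, int splitCount) {
        if (splitCount < 1) {
            throw new IllegalArgumentException("splitCount must be >= 1");
        }
        // Sort and de-duplicate the sampled keys.
        List<Long> sorted = new ArrayList<>(new TreeSet<>(sampledKeys));
        List<Range> ranges = new ArrayList<>();
        if (sorted.isEmpty()) {
            // No samples: a single unbounded range covers everything.
            ranges.add(new Range(null, null));
            return ranges;
        }
        Long lower = null;
        for (int i = 1; i < splitCount; i++) {
            // Evenly spaced position among the sorted samples.
            int idx = (int) ((long) i * sorted.size() / splitCount);
            Long boundary = sorted.get(idx);
            if (lower != null && boundary.compareTo(lower) <= 0) {
                continue; // too few samples: skip repeated boundaries
            }
            ranges.add(new Range(lower, boundary));
            lower = boundary;
        }
        // The last range is open-ended so keys above the largest sample are covered.
        ranges.add(new Range(lower, null));
        return ranges;
    }
}
```

Because only a small random sample is read instead of skipping through the whole collection, the cost of finding boundaries no longer grows with each skip offset. The trade-off is that the ranges are only approximately equal in size.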