euler: Data generation problem with the Spark-based graph_data_parser


Open ZunwenYou opened this issue 5 years ago • 8 comments

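The Spark executors use HDFSWriter to generate the binary part_x.dat files, and reading some of the parts fails with a "data error". We have ruled out a malformed data format: generating the dat files on a single machine from the generated json files works fine. The symptoms are as follows: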

Update: after setting the number of cores per Spark executor to 1, the problem goes away. Is something wrong with the Writer's flush?

May 08 '19 06:05 ZunwenYou

ping @yangsiran

May 08 '19 06:05 ZunwenYou

This issue can be fixed by adding an hflush function to the HDFSWriter class. You should also call hflush once everything has been written.
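To illustrate the idea, here is a minimal sketch that uses a local file in place of HDFS. Everything in it is an assumption made for illustration: `RecordWriter`, `read_records`, and the length-prefixed record layout are hypothetical and are not Euler's HDFSWriter or its dat format. On a real HDFS output stream the corresponding call would be `FSDataOutputStream.hflush()`; here `flush()` plus `os.fsync()` plays that role.

```python
import os
import struct
import tempfile


class RecordWriter:
    """Hypothetical stand-in for a writer such as HDFSWriter, backed by a local file."""

    def __init__(self, path):
        self._file = open(path, "wb")  # buffered binary writer

    def write(self, payload: bytes) -> None:
        # Length-prefixed record: 4-byte little-endian size, then the payload.
        self._file.write(struct.pack("<I", len(payload)))
        self._file.write(payload)

    def hflush(self) -> None:
        # Push buffered bytes to the OS, then ask the OS to persist them.
        self._file.flush()
        os.fsync(self._file.fileno())

    def close(self) -> None:
        # Flush once more after everything is written, then release the file.
        self.hflush()
        self._file.close()


def read_records(path):
    """Read back every length-prefixed record; fail on a truncated one."""
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break
            (size,) = struct.unpack("<I", header)
            payload = f.read(size)
            if len(payload) < size:
                raise ValueError("data error: truncated record")
            records.append(payload)
    return records


def demo():
    path = os.path.join(tempfile.mkdtemp(), "part_0.dat")
    writer = RecordWriter(path)
    writer.write(b"node-1")
    writer.write(b"node-2")
    visible_before = os.path.getsize(path)  # bytes visible to a reader before hflush
    writer.hflush()
    visible_after = os.path.getsize(path)   # bytes visible to a reader after hflush
    writer.close()
    return visible_before, visible_after, read_records(path)
```

Calling `demo()` returns the file size seen before and after `hflush`, plus the records read back. The point of the sketch is that bytes held in the writer's buffer are not part of the file a reader sees, so a part file that is read before the final flush can end in a truncated record.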

May 08 '19 09:05 intoraw

@pgplus1628 As shown in the last post, the writer already flushes after every record is written.

May 08 '19 09:05 ZunwenYou

@ZunwenYou oh, I mean hflush.

May 08 '19 11:05 intoraw

@pgplus1628 you are right.

May 08 '19 11:05 ZunwenYou

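@ZunwenYou [Screenshot 2019-06-19, 7:50:35 PM] This is my Spark code for writing the dat files, but the file-writing code does not seem to be executed at all. What could be the reason? Any advice would be appreciated.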

Jun 19 '19 11:06 arsenezhang

@arsenezhang An RDD needs an action to trigger its lazy operations. You have to execute resultRDD.count().
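The same behaviour can be reproduced without Spark. The sketch below is a plain-Python analogy, not Spark code: Python's built-in `map` is lazy in the same sense as an RDD transformation, and consuming the result plays the role of an action such as `count()`. The names `write_record` and `written` are hypothetical.

```python
written = []  # stands in for the records that actually reached the output


def write_record(record):
    # Side effect that is supposed to happen for every record.
    written.append(record)
    return record


records = ["a", "b", "c"]

# Like an RDD transformation: this only describes the work, nothing runs yet.
lazy_result = map(write_record, records)
written_before = len(written)

# Like an action: consuming the result forces write_record to run on each item.
count = sum(1 for _ in lazy_result)
written_after = len(written)
```

Until the result is consumed, `write_record` is never called, which matches the symptom of write code that appears not to execute.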

Jun 23 '19 03:06 ZunwenYou

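Hello, I keep running into parsing problems with the training data I generate with Spark. Could you please send me a copy of your code for generating training data with Spark? ^_^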

Oct 21 '20 08:10 ziyang599