
[Bug] Blocks read inconsistent: expected xxx blocks, actual xxx blocks

Open · xianjingfeng opened this issue on Aug 04 '22 · 12 comments

  1. If we set spark.rss.data.replica.write=2 and spark.rss.data.replica=3, data integrity cannot be guaranteed on any single shuffle server, right? (See the configuration sketch below.)
  2. But the method org.apache.uniffle.storage.handler.impl.LocalFileQuorumClientReadHandler#readShuffleData only reads from one shuffle server.
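
For context, here is a minimal sketch of the replica setup from point 1, expressed as client-side Spark configuration. The property names are taken from this thread; the values and the class name are illustrative:

```java
import org.apache.spark.SparkConf;

// Minimal sketch of the replica setup discussed in this issue (illustrative values).
public class RssReplicaConfSketch {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf()
        .set("spark.rss.data.replica", "3")        // each block is sent to 3 shuffle servers
        .set("spark.rss.data.replica.write", "2")  // a write succeeds once 2 of the 3 servers ack it
        .set("spark.rss.data.replica.read", "2");  // the read path collects metadata from 2 servers
    System.out.println(conf.toDebugString());
  }
}
```

With write=2 out of 3, any single server may legitimately be missing some blocks, which is exactly the concern in point 1.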

xianjingfeng avatar Aug 04 '22 02:08 xianjingfeng

Which version did you use?

Did you set spark.rss.data.replica.read=2? It ensures that the bitmap metadata of blocks is written to 2 servers.

As long as the read client gets the metadata from 2 of the servers, it can check the integrity of the data from any single server.
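
For illustration, a hedged sketch of that quorum idea, assuming the per-server metadata is a bitmap of block ids (the class and method names here are hypothetical, not Uniffle's actual implementation):

```java
import java.util.List;
import org.roaringbitmap.longlong.Roaring64NavigableMap;

// Hedged sketch, not Uniffle's actual code. With replica=3, replica.write=2 and
// replica.read=2, every read quorum overlaps every write quorum (2 + 2 > 3), so
// the union of the read quorum's block-id bitmaps covers all committed blocks
// and can validate data fetched from any single server.
class QuorumMetadataSketch {
  static Roaring64NavigableMap mergeQuorumMetadata(List<Roaring64NavigableMap> perServerBitmaps) {
    Roaring64NavigableMap expectedBlockIds = new Roaring64NavigableMap();
    for (Roaring64NavigableMap serverBitmap : perServerBitmaps) {
      expectedBlockIds.or(serverBitmap); // in-place union
    }
    return expectedBlockIds;
  }
}
```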

frankliee avatar Aug 04 '22 03:08 frankliee

> Did you set spark.rss.data.replica.read=2?

Yes.

> As long as the read client gets the metadata from 2 of the servers, it can check the integrity of the data from any single server.

But this step seems to execute before readShuffleData.

xianjingfeng avatar Aug 04 '22 03:08 xianjingfeng

> Which version did you use?

internal version 0.5.0-snapshot

xianjingfeng avatar Aug 04 '22 03:08 xianjingfeng

> But this step seems to execute before readShuffleData.

The metadata is acquired in advance, but the data integrity check is executed after all blocks have been fetched. In the current implementation, the client only fetches from "the first available" server to reduce the read cost. But when the data on this first server is damaged, the final check reports "read inconsistent".
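
For illustration, a minimal sketch of such a final check (the names are hypothetical, not the actual Uniffle method) that would produce the error in this issue's title:

```java
import org.roaringbitmap.longlong.Roaring64NavigableMap;

// Hypothetical sketch of the final consistency check described above.
// expectedBlockIds is the merged quorum metadata; processedBlockIds tracks the
// blocks actually fetched from the single server that was read.
class ConsistencyCheckSketch {
  static void checkBlocksConsistent(Roaring64NavigableMap expectedBlockIds,
                                    Roaring64NavigableMap processedBlockIds) {
    long expected = expectedBlockIds.getLongCardinality();
    long actual = processedBlockIds.getLongCardinality();
    if (expected != actual) {
      throw new RuntimeException("Blocks read inconsistent: expected " + expected
          + " blocks, actual " + actual + " blocks");
    }
  }
}
```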

frankliee avatar Aug 04 '22 06:08 frankliee

I know, but the application will fail

xianjingfeng avatar Aug 04 '22 06:08 xianjingfeng

> The metadata is acquired in advance, but the data integrity check is executed after all blocks have been fetched. In the current implementation, the client only fetches from "the first available" server to reduce the read cost. But when the data on this first server is damaged, the final check reports "read inconsistent".

This implementation feels a little unreasonable to me. Should we read from the next shuffle server when the data isn't complete?
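
A hedged sketch of that fallback, assuming a per-server handler list (ReadHandler here is a simplified stand-in, not the real LocalFileQuorumClientReadHandler interface):

```java
import java.util.List;

// Simplified sketch of the proposed fallback: if one replica's data is damaged
// or incomplete, try the next shuffle server instead of failing the read.
class FailoverReadSketch {
  interface ReadHandler {
    byte[] readShuffleData() throws Exception;
  }

  static byte[] readWithFailover(List<ReadHandler> handlers) {
    Exception lastError = null;
    for (ReadHandler handler : handlers) {
      try {
        byte[] data = handler.readShuffleData();
        if (data != null) {
          return data; // this replica served a complete segment
        }
      } catch (Exception e) {
        lastError = e; // damaged or missing data on this replica; try the next one
      }
    }
    throw new RuntimeException("All replica servers failed to serve shuffle data", lastError);
  }
}
```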

jerqi avatar Aug 04 '22 06:08 jerqi

> This implementation feels a little unreasonable to me. Should we read from the next shuffle server when the data isn't complete?

I am trying to do this, and I think it needs to be fixed together with #108.

xianjingfeng avatar Aug 04 '22 06:08 xianjingfeng

I would be happy to review this PR. You should avoid fetching redundant blocks from the other server, because Spark has already consumed those blocks. RSS already provides some skipping mechanisms for localfile and HDFS, but I'm worried about memory data. @jerqi
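
To illustrate the skipping idea (hypothetical names; this is not the project's actual localfile/HDFS skip code), the client could filter out blocks it has already consumed before handing segments to Spark:

```java
import java.util.ArrayList;
import java.util.List;
import org.roaringbitmap.longlong.Roaring64NavigableMap;

// Hypothetical sketch of block skipping during failover: drop any block the
// client has already processed so Spark never receives a duplicate.
// BlockSegment is a stand-in for the real buffer-segment type.
class BlockSkipSketch {
  static class BlockSegment {
    final long blockId;
    final int offset;
    final int length;

    BlockSegment(long blockId, int offset, int length) {
      this.blockId = blockId;
      this.offset = offset;
      this.length = length;
    }
  }

  static List<BlockSegment> skipProcessed(List<BlockSegment> segments,
                                          Roaring64NavigableMap processedBlockIds) {
    List<BlockSegment> remaining = new ArrayList<>();
    for (BlockSegment segment : segments) {
      if (!processedBlockIds.contains(segment.blockId)) {
        remaining.add(segment);
      }
    }
    return remaining;
  }
}
```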

frankliee avatar Aug 04 '22 06:08 frankliee

> I would be happy to review this PR. You should avoid fetching redundant blocks from the other server, because Spark has already consumed those blocks. RSS already provides some skipping mechanisms for localfile and HDFS, but I'm worried about memory data. @jerqi

In my opinion, memory data should also support block skipping, and our memory read process should be optimized.

jerqi avatar Aug 04 '22 07:08 jerqi

Got it.

xianjingfeng avatar Aug 04 '22 07:08 xianjingfeng

This would require changing the server's memory storage to add an "index", like HDFS has.
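
A very rough sketch of what such an in-memory index might look like (purely hypothetical; nothing like this is claimed to exist in the codebase):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Purely hypothetical sketch of an in-memory block index, analogous to the
// localfile/HDFS index files: map each block id to its position in the
// in-memory buffer so reads can locate (and skip) individual blocks.
class MemoryBlockIndexSketch {
  static class IndexEntry {
    final int offset;
    final int length;

    IndexEntry(int offset, int length) {
      this.offset = offset;
      this.length = length;
    }
  }

  private final Map<Long, IndexEntry> index = new ConcurrentHashMap<>();

  void put(long blockId, int offset, int length) {
    index.put(blockId, new IndexEntry(offset, length));
  }

  IndexEntry lookup(long blockId) {
    return index.get(blockId); // null if the block is not buffered in memory
  }
}
```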

frankliee avatar Aug 04 '22 07:08 frankliee

> This would require changing the server's memory storage to add an "index", like HDFS has.

This should be discussed in another issue, and we should also have a simple design doc.

jerqi avatar Aug 04 '22 07:08 jerqi

Closed by #276.

jerqi avatar Nov 28 '22 16:11 jerqi