alluxio
Client sends double traffic to workers when alluxio.user.file.writetype.default=CACHE_THROUGH is set
Alluxio Version: Alluxio-2.9
Describe the bug
The client sends the data to the workers twice when alluxio.user.file.writetype.default is set to CACHE_THROUGH. The data is written to the Alluxio worker and to the UFS over two separate paths, as shown below.
client -> worker block
client -> worker -> ufs
Instead, the data should be sent once and written to both the worker block and the UFS, like below.
client -> worker/worker block -> ufs
To Reproduce
- Set alluxio.user.file.writetype.default=CACHE_THROUGH
- Use the copyFromLocal command to upload files.
Expected behavior
The worker should decide whether to write to the UFS when it first receives the data, rather than the client sending the data to the worker a second time for the UFS write.
When the client writes data, it first writes to the worker's Alluxio block through mCurrentBlockOutStream, and then sends the same data to the worker again through mUnderStorageOutputStream to be written to the UFS. On the worker side, the two streams are handled by BlockWriteHandler and UfsFileWriteHandler respectively.
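The difference in client traffic can be illustrated with a toy model. This is only a sketch and not Alluxio code: DoubleWriteSketch, Link, and both methods are hypothetical names, and the in-memory buffers merely stand in for the worker's block storage and the UFS. It counts the bytes that leave the client in the reported behavior (two streams) and in the expected behavior (one stream, with the worker writing both destinations).

```java
import java.io.ByteArrayOutputStream;

// Toy model only; none of these names are part of Alluxio.
class DoubleWriteSketch {
  /** Simulated client-to-worker link that counts the bytes sent over it. */
  static final class Link {
    long mBytesSent = 0;

    /** Sends data once over the link; the worker stores it in every given sink. */
    void send(byte[] data, ByteArrayOutputStream... sinks) {
      mBytesSent += data.length;
      for (ByteArrayOutputStream sink : sinks) {
        sink.write(data, 0, data.length);
      }
    }
  }

  /** Reported behavior: the client sends the same bytes twice, once per stream. */
  static long clientBytesWithTwoStreams(byte[] data) {
    Link link = new Link();
    ByteArrayOutputStream block = new ByteArrayOutputStream(); // stands in for the worker block
    ByteArrayOutputStream ufs = new ByteArrayOutputStream();   // stands in for the UFS file
    link.send(data, block); // role of mCurrentBlockOutStream
    link.send(data, ufs);   // role of mUnderStorageOutputStream
    return link.mBytesSent;
  }

  /** Expected behavior: the client sends once and the worker writes both sinks. */
  static long clientBytesWithWorkerTee(byte[] data) {
    Link link = new Link();
    ByteArrayOutputStream block = new ByteArrayOutputStream();
    ByteArrayOutputStream ufs = new ByteArrayOutputStream();
    link.send(data, block, ufs);
    return link.mBytesSent;
  }

  public static void main(String[] args) {
    byte[] data = new byte[1024];
    System.out.println("two streams: " + clientBytesWithTwoStreams(data));
    System.out.println("worker tee:  " + clientBytesWithWorkerTee(data));
  }
}
```

In this model both schemes leave identical contents in the block and UFS buffers; only the number of bytes crossing the client link differs.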
The reason for sending two copies of the data to the worker is probably known. The Alluxio client writes to a worker block by block, and each block has its own stream, whereas an HDFS file can only be written through a single stream, so the two writes cannot be handled together. In addition, the blocks of a file may be distributed across different workers, so the HDFS file cannot be written through the same stream that writes the blocks. That is why the current implementation exists.
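The placement constraint can be sketched in the same toy style. The round-robin placement below is purely an assumption for illustration and is not Alluxio's actual block location policy; BlockPlacementSketch and its methods are hypothetical names. The point is only that once a file spans several blocks, those blocks may land on more than one worker, while the UFS file still needs one continuous stream.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; round-robin placement is an assumed policy, not Alluxio's.
class BlockPlacementSketch {
  /** Returns, for each block of the file, the index of the worker that caches it. */
  static List<Integer> blockWorkers(long fileBytes, long blockBytes, int numWorkers) {
    List<Integer> workers = new ArrayList<>();
    long numBlocks = (fileBytes + blockBytes - 1) / blockBytes; // ceiling division
    for (long i = 0; i < numBlocks; i++) {
      workers.add((int) (i % numWorkers));
    }
    return workers;
  }

  /** True if at most one worker caches the blocks, so one stream could serve both writes. */
  static boolean singleWorkerHoldsAllBlocks(List<Integer> workers) {
    return workers.stream().distinct().count() <= 1;
  }

  public static void main(String[] args) {
    List<Integer> placement = blockWorkers(5, 2, 3);
    System.out.println("block -> worker: " + placement);
    System.out.println("single worker:   " + singleWorkerHoldsAllBlocks(placement));
  }
}
```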