
[CARBONDATA-3692] Support NoneCompression during loading data.

Open Pickupolddriver opened this issue 5 years ago • 7 comments

Why is this PR needed?

In some cases, data needs to stay uncompressed after it is loaded into a CarbonData file. The current version does not support loading data without compression.

What changes were proposed in this PR?

Add a new compressor, NoneCompressor, that extends AbstractCompressor. It can be selected by calling CarbonProperties.getInstance().addProperty(CarbonCommonConstants.COMPRESSOR, "none");
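
To illustrate the idea, here is a minimal, self-contained sketch of a pass-through compressor chosen by name. The names SketchCompressor, PassThroughCompressor, and SketchCompressorRegistry are hypothetical and exist only for this example. They are not CarbonData's AbstractCompressor or its factory, and the real method signatures in the PR may differ.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a compressor abstraction.
// This is NOT CarbonData's AbstractCompressor; it only illustrates the concept.
interface SketchCompressor {
    String getName();
    byte[] compress(byte[] input);
    byte[] decompress(byte[] input);
}

// Pass-through "compressor": both directions return a copy of the input bytes,
// so no CPU time is spent on compression, at the cost of larger stored data.
final class PassThroughCompressor implements SketchCompressor {
    @Override
    public String getName() {
        return "none";
    }

    @Override
    public byte[] compress(byte[] input) {
        return Arrays.copyOf(input, input.length);
    }

    @Override
    public byte[] decompress(byte[] input) {
        return Arrays.copyOf(input, input.length);
    }
}

// Hypothetical registry that resolves a compressor from a configured name,
// mimicking how a property value such as "none" could select an implementation.
final class SketchCompressorRegistry {
    private final Map<String, SketchCompressor> byName = new HashMap<>();

    void register(SketchCompressor compressor) {
        byName.put(compressor.getName().toLowerCase(Locale.ROOT), compressor);
    }

    SketchCompressor lookup(String name) {
        SketchCompressor compressor = byName.get(name.toLowerCase(Locale.ROOT));
        if (compressor == null) {
            throw new IllegalArgumentException("Unknown compressor: " + name);
        }
        return compressor;
    }
}
```

In this sketch, registering a PassThroughCompressor and calling lookup("none") returns an instance whose compress and decompress leave the bytes unchanged. In CarbonData itself, the selection is made through the CarbonProperties call shown above rather than through a registry like this one.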

Does this PR introduce any user interface change?

Yes

Is any new testcase added?

Yes

Pickupolddriver, Feb 11 '20 11:02

Can one of the admins verify this patch?

CarbonDataQA1, Feb 11 '20 11:02

Can you please explain the scenario where no-compression would be beneficial?

kunal642, Feb 11 '20 12:02

> Can you please explain the scenario where no-compression would be beneficial?

In some cases, NoneCompressor speeds up loading data from Flink to OBS files by trading storage space and I/O for load time.

For example, when loading data from Flink to OBS, Flink currently has to compress the data into temporary files, which are then decompressed on the OBS side. With NoneCompressor, users can load the data without first compressing it and then decompressing the temporary files.

Pickupolddriver, Feb 12 '20 04:02

@Pickupolddriver : Agree that it can improve the loading speed. But data will be 3x bigger. So, storage cost on OBS will be 3x more!

ajantha-bhat, Feb 12 '20 04:02

> @Pickupolddriver : Agree that it can improve the loading speed. But data will be 3x bigger. So, storage cost on OBS will be 3x more!

The data is processed after it is loaded to OBS. If we provide a NoneCompressor, the data no longer has to be compressed and then decompressed. The uncompressed data is deleted after it has been processed in OBS.

Pickupolddriver, Feb 14 '20 10:02

add to whitelist

ajantha-bhat, Feb 21 '20 07:02

please rebase

QiangCai, Feb 25 '20 02:02