IoTDB data compression mechanism problem

Open qq461613840 opened this issue 1 year ago • 10 comments

While inserting a large amount of data, I estimated from the rate of disk usage that the final storage would be about 100 GB. A few hours after the insertion finished, the storage occupied 20 GB, and after one night it occupied 13 GB. What mechanism causes this?

qq461613840 avatar Oct 13 '23 05:10 qq461613840

Hi, this is your first issue in IoTDB project. Thanks for your report. Welcome to join the community!

github-actions[bot] avatar Oct 13 '23 05:10 github-actions[bot]

IoTDB has compaction threads that select data files and merge them. After compaction, the small chunks in those data files are merged into larger chunks, so compression may work better.
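To illustrate why merging small chunks can shrink the files, here is a minimal sketch that compresses the same data once as many small chunks and once as a single merged chunk. It uses Python's zlib as a stand-in; it is not IoTDB's actual encoding or compression codec, and the helper names `make_points` and `compressed_sizes` are hypothetical.

```python
import struct
import zlib


def make_points(count, start):
    """Pack (timestamp, value) pairs as 8-byte integers, a stand-in for sensor data."""
    return b"".join(
        struct.pack("<qq", start + i, 20 + (start + i) % 5) for i in range(count)
    )


def compressed_sizes(chunk_count=1000, points_per_chunk=10):
    """Return (bytes when each small chunk is compressed alone, bytes when merged first)."""
    small_chunks = [
        make_points(points_per_chunk, start=k * points_per_chunk)
        for k in range(chunk_count)
    ]
    # Each small chunk pays its own header cost and cannot reuse patterns
    # that appeared in the other chunks.
    separately = sum(len(zlib.compress(chunk)) for chunk in small_chunks)
    # The merged chunk is compressed once, so repeated patterns are shared.
    merged = len(zlib.compress(b"".join(small_chunks)))
    return separately, merged


if __name__ == "__main__":
    separately, merged = compressed_sizes()
    print(f"small chunks compressed separately: {separately} bytes")
    print(f"one merged chunk compressed:        {merged} bytes")
```

The merged total comes out smaller than the sum of the separately compressed chunks, which is the same direction as the shrinking disk usage described in this issue. The actual ratio in IoTDB depends on its own encodings and settings.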

shuwenwei avatar Oct 17 '23 06:10 shuwenwei

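IoTDB has some compaction threads that can select data files to merge. After compaction, the small chunks of data in the data files are merged into large chunks, and compression may work better.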

Thank you for sharing. I would also like to ask about another machine (with less CPU and memory than the first one). After I inserted this data again on that machine, it took up 6.7 GB of storage, which also puzzles me (the machine with more CPU and memory took up 13 GB). Also, when I delete some data, the storage space is not released, even after executing the clear cache, full merge, and flush commands. Is there a solution?

qq461613840 avatar Oct 17 '23 06:10 qq461613840

Physical deletion does not happen when you send a delete request; it also happens during compaction. When you delete data, IoTDB may generate a .mods file in your data directory that records which data was deleted, which keeps deletion fast.
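The behaviour described here (a delete only records a marker, and space is reclaimed later by compaction) can be sketched with a toy model. This is an assumption-laden illustration, not IoTDB's implementation: the class `ToyStore`, its methods, and the in-memory lists standing in for the data file and the .mods file are all hypothetical.

```python
class ToyStore:
    """Toy model: deletes append a marker; only compaction removes data physically."""

    def __init__(self):
        self.data_file = []  # stands in for a data file: (timestamp, value) points
        self.mods = []       # stands in for a .mods file: deleted (start, end) ranges

    def insert(self, timestamp, value):
        self.data_file.append((timestamp, value))

    def delete(self, start, end):
        # Fast: nothing is rewritten, the deleted range is only recorded.
        self.mods.append((start, end))

    def _is_deleted(self, timestamp):
        return any(start <= timestamp <= end for start, end in self.mods)

    def query(self):
        # Reads hide deleted points even though they are still stored.
        return [p for p in self.data_file if not self._is_deleted(p[0])]

    def stored_points(self):
        return len(self.data_file)

    def compact(self):
        # Rewrite the data without the deleted points, then drop the markers.
        self.data_file = self.query()
        self.mods.clear()


if __name__ == "__main__":
    store = ToyStore()
    for t in range(10):
        store.insert(t, t * 1.5)
    store.delete(0, 4)
    print(store.stored_points(), len(store.query()))  # storage unchanged after delete
    store.compact()
    print(store.stored_points(), len(store.mods))     # storage shrinks after compaction
```

In this model the stored size stays the same after `delete` and only drops after `compact`, which matches the observation that disk space is not released immediately after deleting data.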

shuwenwei avatar Oct 17 '23 06:10 shuwenwei

Does the other machine use the same configuration and insert the same data?

shuwenwei avatar Oct 17 '23 07:10 shuwenwei

Physical deletion does not happen when you send a delete request; it also happens during compaction. When you delete data, IoTDB may generate a .mods file in your data directory that records which data was deleted, which keeps deletion fast.

I checked the data directory to confirm that the relevant .mods files were generated, but after a few days, the .mods files still exist and the space has not been released. Do I need to manually clean up the .mods files?

qq461613840 avatar Oct 17 '23 07:10 qq461613840

Does the other machine use the same configuration and insert the same data?

Server A, which uses 13 GB of storage, has 24 cores and 128 GB of memory, while server B, which uses 7.6 GB of storage, has 12 cores and 32 GB of memory. The inserted data is the same, but server A inserted all fields at once, while server B inserted part of the device data first and then inserted the other part of the device data with the same timestamps.

qq461613840 avatar Oct 17 '23 07:10 qq461613840

I checked the data directory to confirm that the relevant .mods files were generated, but after a few days, the .mods files still exist and the space has not been released. Do I need to manually clean up the .mods files?

These files will be deleted by the compaction thread, and you should not delete them manually. However, to avoid wasting system resources, compaction may not run until certain conditions are met.

shuwenwei avatar Oct 17 '23 07:10 shuwenwei

Server A, which uses 13 GB of storage, has 24 cores and 128 GB of memory, while server B, which uses 7.6 GB of storage, has 12 cores and 32 GB of memory. The inserted data is the same, but server A inserted all fields at once, while server B inserted part of the device data first and then inserted the other part of the device data with the same timestamps.

Does the other machine use the same IoTDB configuration file?

shuwenwei avatar Oct 17 '23 07:10 shuwenwei

Does the other machine use the same IoTDB configuration file?

Yes, using the same docker compose configuration

qq461613840 avatar Oct 17 '23 07:10 qq461613840