iotdb
IoTDB data compression mechanism problem
After inserting a large amount of data, I estimated from the disk occupancy ratio that the final storage would be about 100 GB. A few hours after the insertion finished, the data occupied 20 GB, and after one night it occupied 13 GB. What is the mechanism behind this?
Hi, this is your first issue in the IoTDB project. Thanks for your report. Welcome to the community!
IoTDB has compaction threads that select data files and merge them. During compaction, the small chunks in those data files are merged into larger chunks, and compression may work better on the larger chunks.
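To illustrate why merging small chunks can shrink the stored size, here is a minimal Python sketch. It is only an analogy: it uses the general-purpose zlib compressor from the Python standard library, not IoTDB's own encodings or compressors, and the synthetic series, the chunk sizes, and all function names are assumptions made up for this example.

```python
import random
import struct
import zlib


def make_series(n, seed=0):
    """Build a slowly drifting integer series (synthetic sensor-like data)."""
    rng = random.Random(seed)
    value = 1000
    series = []
    for _ in range(n):
        value += rng.randint(-2, 2)
        series.append(value)
    return series


def pack(values):
    """Serialize the values as little-endian 8-byte integers."""
    return struct.pack(f"<{len(values)}q", *values)


def compressed_size(values, chunk_len):
    """Compress each chunk of chunk_len values on its own and sum the sizes."""
    total = 0
    for i in range(0, len(values), chunk_len):
        total += len(zlib.compress(pack(values[i:i + chunk_len])))
    return total


values = make_series(32768)
many_small_chunks = compressed_size(values, 32)      # before "compaction"
one_large_chunk = compressed_size(values, 32768)     # after "compaction"

print("small chunks:", many_small_chunks, "bytes")
print("merged chunk:", one_large_chunk, "bytes")
```

Each independently compressed chunk pays a fixed overhead and cannot reuse patterns seen in other chunks, so the same values generally take less space once they sit in one large chunk. How much IoTDB actually saves depends on its encoding, its compressor, and the data itself.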
Thank you for sharing. I would also like to ask about another machine (with less CPU and memory than the first one). After repeatedly inserting this data there, it took up 6.7 GB of storage, which also puzzles me, because the machine with more CPU and memory took up 13 GB. In addition, when I delete some data, the storage space is not released, even after executing the clear cache, full merge, and flush commands. Is there a solution?
Physical deletion does not happen when you send the delete request; it also happens during compaction. When you delete data, IoTDB may generate a .mods file in your data directory that records which records were deleted, which keeps deletion fast.
Does the other machine use the same configuration and insert the same data?
I checked the data directory to confirm that the relevant .mods files were generated, but after a few days, the .mods files still exist and the space has not been released. Do I need to manually clean up the .mods files?
Server A, which occupies 13 GB of storage, has 24 cores and 128 GB of memory, while server B, which occupies 7.6 GB of storage, has 12 cores and 32 GB of memory. The inserted data is the same, but server A inserted all fields at once, whereas server B first inserted the data for one part of the devices and then inserted the data for another part of the devices with the same timestamps.
These files will be deleted by the compaction thread, and you should not delete them manually. However, to avoid wasting system resources, compaction may not run until certain conditions are met.
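The behaviour described in this thread (a delete only records what was removed, and the space comes back when compaction rewrites the file) can be pictured with a small toy model in Python. This is a sketch under assumptions, not IoTDB's implementation: the `ToyDataFile` class, its methods, and the in-memory `mods` list are hypothetical stand-ins for a data file and its .mods file.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataFile:
    """Hypothetical stand-in for one data file plus its .mods records."""
    points: dict                                # timestamp -> value, as stored
    mods: list = field(default_factory=list)    # recorded (start, end) deletions

    def delete(self, start, end):
        # Logical delete: only record the range; stored points are untouched.
        self.mods.append((start, end))

    def _is_deleted(self, ts):
        return any(start <= ts <= end for start, end in self.mods)

    def query(self):
        # Reads hide deleted points by checking the recorded deletions.
        return {ts: v for ts, v in self.points.items() if not self._is_deleted(ts)}

    def compact(self):
        # Physical delete: rewrite only surviving points, then drop the records.
        self.points = self.query()
        self.mods = []


data_file = ToyDataFile(points={ts: ts * 0.5 for ts in range(10)})
data_file.delete(0, 4)

visible_after_delete = len(data_file.query())   # deleted points are hidden
stored_after_delete = len(data_file.points)     # but still physically stored

data_file.compact()
stored_after_compact = len(data_file.points)    # space is reclaimed only now

print(visible_after_delete, stored_after_delete, stored_after_compact)
```

In this model the delete is fast because it appends one record, and the stored size only drops once `compact()` runs, which mirrors why the .mods files and the disk usage can persist until a compaction actually selects those files.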
Does the other machine use the same IoTDB configuration file?
Yes, both use the same Docker Compose configuration.