iotdb
[IOTDB-3611] Support "Modify Time Series Encoding and Compression Type" interface/command
Description
Target:
Support "Modify Encoding Type and Compression Type" command
Application scenarios:
In IoTDB application projects, choosing suitable encoding and compression algorithms can effectively reduce disk space usage and, indirectly, server costs. Being able to modify the encoding and compression algorithms of an existing time series is a natural way to achieve this.
Example:
The physical quantity root.sg1.device_1.m1 initially uses the PLAIN encoding type, and its data is steadily increasing: 1, 2, 3, 4, 5, ...
After a long period of accumulation, the data occupies about 1 GB of disk space. After changing the encoding type to TS_2DIFF, the disk space occupied by the data can be reduced to less than 100 MB.
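To illustrate why a difference-based encoding helps on this kind of data, here is a minimal sketch. It is a simplified illustration only and is not IoTDB's TS_2DIFF implementation; the class and method names (`DeltaSketch`, `deltas`, `bitsNeeded`) are hypothetical. It compares the bit width needed for raw values with the bit width needed for consecutive differences.

```java
import java.util.Arrays;

// Simplified illustration only; this is NOT IoTDB's TS_2DIFF implementation.
class DeltaSketch {
    // Keep the first value, then store each value as its difference from the previous one.
    static long[] deltas(long[] values) {
        long[] out = new long[values.length];
        if (values.length == 0) {
            return out;
        }
        out[0] = values[0];
        for (int i = 1; i < values.length; i++) {
            out[i] = values[i] - values[i - 1];
        }
        return out;
    }

    // Bits needed to store the largest magnitude in the array (at least 1).
    static int bitsNeeded(long[] values) {
        long max = 0;
        for (long v : values) {
            max = Math.max(max, Math.abs(v));
        }
        return Math.max(1, 64 - Long.numberOfLeadingZeros(max));
    }

    public static void main(String[] args) {
        long[] raw = new long[1000];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = i + 1; // 1, 2, 3, ...
        }
        long[] d = deltas(raw);
        // Skip the first element, which is the stored base value rather than a difference.
        long[] tail = Arrays.copyOfRange(d, 1, d.length);
        System.out.println("bits per raw value:   " + bitsNeeded(raw));
        System.out.println("bits per delta value: " + bitsNeeded(tail));
    }
}
```

For a series that grows by a constant step, every difference is the same small number, which is why far fewer bits per point are needed than with PLAIN.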
Plan selection
About the selection of single/multiple physical quantity modification
Option 1: A single command only modifies the encoding type and compression type of a single physical quantity
Option 2: A single command supports batch modification of physical quantity encoding type and compression type
Conclusion: Option 1 is selected for the first version; Option 2 will be supported once the command has been verified.
About the selection of the command affecting the data range
Option 1: Affect the newly inserted data after modification
Option 2: Affect all sealed, unsealed and newly inserted data after modification
Conclusion: The first version chooses Option 2. After the command executes successfully, the change in disk usage is visible immediately.
Merge related code changes
Before developing this feature, the existing merge code must be adapted to it:
1. When the same physical quantity has different encoding types or compression types within a tsfile, the merged tsfile is corrupted. This problem needs to be fixed.
2. Since merging must be mutually exclusive with the rewrite, lock control needs to be added.
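The mutual exclusion in item 2 could look roughly like the following. This is a hedged sketch, not the code in this PR: the class `RewriteLockSketch` and the method `runExclusively` are hypothetical, and only the name `rewriteLock` is taken from the flow described in this description. It uses a plain `ReentrantLock` with `tryLock`, so a task is skipped rather than blocked when another thread holds the lock.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of guarding merge and rewrite with one lock.
class RewriteLockSketch {
    private final ReentrantLock rewriteLock = new ReentrantLock();

    // Runs the task only if no other thread currently holds the lock.
    // Returns true if the task ran, false if it was skipped.
    boolean runExclusively(Runnable task) {
        if (!rewriteLock.tryLock()) {
            return false;
        }
        try {
            task.run();
            return true;
        } finally {
            rewriteLock.unlock();
        }
    }
}
```

Whether a skipped merge should be retried later or should wait for the lock instead is a design choice this sketch does not settle.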
Normal flow for modifying the encoding type and compression type
1. Verify the request parameters before execution, including some non-null verification, cluster status verification, etc.
2. Modify the encoding type and compression type in the schema
3. Invalidate the schema cache
4. Find the storage group
5. Perform modification operations by virtual storage group
6. Force close working TsFileProcessors
7. Get rewriteLock
8. Generate alter log
9. Rewrite the sequence and unsequence tsfile data separately
9.1. (Un)sequenceListByTimePartition
9.2. Traverse tsFileResource
9.3. Filter unexecuted tsFileResource (recovery)
9.4. Generate targetTsFileResource
9.5. Rewrite tsFileResource
9.5.1. Acquire tsFileResource read lock
9.5.2. Read device list
9.5.3. startChunkGroup
9.5.4. Read chunks
9.5.5. If the measurement is not modified by the target, write the chunk directly
9.5.6. If it is a measurement modified by the target, read pages and points one by one, re-encode and compress them before writing
9.5.7. endChunkGroup
9.5.8. endFile
9.5.9. Release the tsFileResource read lock
9.6. Rename the files: .tsfile -> .alter.old, then .alter -> .tsfile
9.7. Replace tsFileResource and targetTsFileResource
9.8. Delete the original tsfile related files (.tsfile .resource .mods)
10. Delete alter log
11. Release rewriteLock
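The file swap in steps 9.6 and 9.8 can be sketched with `java.nio.file.Files`. This is a minimal illustration, not the PR's code: the class `AlterFileSwap` and its methods are hypothetical, and it assumes the rewritten file sits next to the original and differs only in its suffix (`x.tsfile`, `x.alter`, `x.alter.old`). The .resource and .mods files are left out for brevity.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of steps 9.6 and 9.8; suffix layout is an assumption.
class AlterFileSwap {
    private static final String TSFILE_SUFFIX = ".tsfile";

    // Builds the sibling path that shares the base name but uses another suffix.
    static Path withSuffix(Path tsfile, String suffix) {
        String name = tsfile.getFileName().toString();
        if (!name.endsWith(TSFILE_SUFFIX)) {
            throw new IllegalArgumentException("not a .tsfile: " + tsfile);
        }
        String base = name.substring(0, name.length() - TSFILE_SUFFIX.length());
        return tsfile.resolveSibling(base + suffix);
    }

    // Replaces the original tsfile with the rewritten one, then removes the old copy.
    static void swap(Path tsfile) throws IOException {
        Path rewritten = withSuffix(tsfile, ".alter");
        Path old = withSuffix(tsfile, ".alter.old");
        Files.move(tsfile, old);       // .tsfile -> .alter.old
        Files.move(rewritten, tsfile); // .alter  -> .tsfile
        Files.deleteIfExists(old);     // drop the original data
    }
}
```

A crash between any two of these calls leaves a distinct combination of files on disk, which is what the recovery logic inspects after a restart.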
Recovery operation after schema modification
The mlog adds an AlterTimeSeriesPlan record, and the corresponding recovery method is implemented with it.
Service restart recovery operation
1. Determine whether RecoverAlter is required before RecoverCompaction
2. Execute recoverAlter before initCompaction
recoverAlter method flow
1. Analyze alter.log to get a list of unfinished tsfiles
2. Check the list of unfinished tsfiles and perform pre-repair operations
3. Rewrite the unfinished tsfiles
Pre-Recovery Action Policy
Unfinished tsfile states:
1. The .tsfile does not exist
1.1. Both .alter.old and .alter exist - wait for completion
1.2. Only .alter.old exists - system exception
1.3. Only .alter exists - system exception
2. The .tsfile exists
2.1. .alter exists - writing
2.2. .alter.old exists - wait for delete
2.3. Neither .alter nor .alter.old exists - not started
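The state list above maps naturally onto a small decision function. The following is a sketch under assumptions, not the PR's recoverAlter code: the class `AlterRecoveryState`, the enum `State` and the method `classify` are hypothetical names. Two combinations are not covered by the list (no file at all, and all three files present); how they are handled here is an assumption and is marked in the comments.

```java
// Hypothetical sketch that classifies an unfinished tsfile by which files exist.
class AlterRecoveryState {
    enum State {
        WAIT_FOR_COMPLETION, // 1.1: rename was interrupted halfway
        SYSTEM_EXCEPTION,    // 1.2, 1.3: inconsistent leftovers
        WRITING,             // 2.1: rewrite was still in progress
        WAIT_FOR_DELETE,     // 2.2: swap done, old copy not yet deleted
        NOT_STARTED          // 2.3: rewrite never began for this file
    }

    static State classify(boolean hasTsFile, boolean hasAlter, boolean hasAlterOld) {
        if (!hasTsFile) {
            // Assumption: "no file at all" is also treated as a system exception.
            return (hasAlter && hasAlterOld)
                ? State.WAIT_FOR_COMPLETION
                : State.SYSTEM_EXCEPTION;
        }
        if (hasAlter) {
            // Assumption: if .alter.old is present as well, .alter takes precedence.
            return State.WRITING;
        }
        if (hasAlterOld) {
            return State.WAIT_FOR_DELETE;
        }
        return State.NOT_STARTED;
    }
}
```

For example, `classify(false, true, true)` yields `WAIT_FOR_COMPLETION`, which corresponds to a crash between the two renames of step 9.6.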
Remaining work
1. Aligned time series data rewrite optimization
2. Implementation of RSchemaRegion, SchemaRegionSchemaFileImpl related methods
3. Support for clusters
4. The tsfile rewrite operation is changed to asynchronous execution
5. Support batch modification of physical quantities
This PR has:
- [√ ] been self-reviewed.
- [√ ] concurrent read
- [√ ] concurrent write
- [√ ] concurrent read and write
- [√ ] added documentation for new or modified features or behaviors.
- [√ ] added Javadocs for most classes and all non-trivial methods.
- [√ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- [√ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage.
- [√ ] added integration tests.
- [ ] been tested in a test IoTDB cluster.
Key changed/added classes (or packages if there are too many classes) in this PR
PlanExecutor, LocalSchemaProcessor, SchemaRegionSchemaFileImpl, DataRegion, org.apache.iotdb.db.engine.alter, IoTDBSqlParser.g4, CrossSpaceCompactionTask.java, SingleSeriesCompactionExecutor.java, InnerSpaceCompactionTask.java, TsFileManager, MeasurementMNode
Phase III: planned code changes
- Alter schema command: remove the rewrite code
- Alter schema command: rewrite the log (only the alter command record and the rewrite-complete record need to be logged)
- Alter schema command: add an in-memory rewrite record
- Alter schema command: remove the original rewrite lock and use a new rewrite lock (it controls only the alter and sort-out processes, not merging)
- Recovery code: remove all
- Recovery code: rebuild the in-memory record from the log
- Merge code (inner and cross space): on the original rewrite lock
- Merge code (inner and cross space): how the writer obtains the schema: if there are modifications and the series is a modified one, query the schema; otherwise use the schema stored in the file
- Sort-out command: file scan to check whether the file contains modified series and whether their encoding is in the list of changes
- Sort-out command: new rewrite lock control
- Sort-out command: rewrite tasks per virtual storage group
- Sort-out command: changes to the single-file rewrite code
- Sort-out command: aligned time series optimization