iotdb [IOTDB-3611] Support "Modify Time Series Encoding and Compression Type" interface/command

Open lpf4254302 opened this issue 2 years ago • 1 comments

Description

Target:
Support a command that modifies the encoding type and compression type of an existing time series.
Application scenarios:
In IoTDB deployments, choosing suitable encoding and compression algorithms can substantially reduce disk usage and, indirectly, server costs. Being able to modify the encoding and compression algorithms of an existing time series makes this possible after data has already been written.
Example:
The measurement root.sg1.device_1.m1 is initially encoded as PLAIN, and its data is monotonically increasing: 1, 2, 3, 4, 5, ...
After a long period of accumulation, the data occupies about 1 GB of disk space. After changing the encoding type to TS_2DIFF, the disk space occupied by the data can be reduced to less than 100 MB.
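
To illustrate why a difference-based encoding helps for this kind of data, here is a minimal sketch that estimates the size of a series stored as raw 64-bit values versus stored as deltas packed with a fixed bit width. It is not IoTDB's TS_2DIFF implementation: the class name, the header layout, and the size accounting are assumptions made for illustration, and it ignores the compression step and delta overflow for extreme values.

```java
import java.util.Arrays;

/** Illustrative only: a simplified delta + bit-packing size estimate, not IoTDB code. */
final class DeltaPackingSketch {

    /** Bytes needed to store every value as a raw 64-bit long. */
    static long plainBytes(long[] values) {
        return (long) values.length * Long.BYTES;
    }

    /**
     * Bytes needed when storing the first value, the minimum delta, and then
     * each (delta - minDelta) packed with the smallest fixed bit width.
     */
    static long deltaPackedBytes(long[] values) {
        if (values.length < 2) {
            return plainBytes(values);
        }
        long[] deltas = new long[values.length - 1];
        for (int i = 1; i < values.length; i++) {
            deltas[i - 1] = values[i] - values[i - 1];
        }
        long minDelta = Arrays.stream(deltas).min().getAsLong();
        long maxOffset = 0;
        for (long d : deltas) {
            maxOffset = Math.max(maxOffset, d - minDelta);
        }
        // Bits required for the largest offset; 0 when every delta is identical.
        int bitWidth = 64 - Long.numberOfLeadingZeros(maxOffset);
        long packedBits = (long) bitWidth * deltas.length;
        // Assumed header: first value + minDelta (8 bytes each) + 1 byte for the bit width.
        long header = 2L * Long.BYTES + 1;
        return header + (packedBits + 7) / 8;
    }

    public static void main(String[] args) {
        long[] rising = new long[1_000_000];
        for (int i = 0; i < rising.length; i++) {
            rising[i] = i + 1; // 1, 2, 3, ...
        }
        System.out.println("plain bytes:        " + plainBytes(rising));
        System.out.println("delta-packed bytes: " + deltaPackedBytes(rising));
    }
}
```

For a series whose consecutive differences are all equal, every packed offset is zero, so only the small header remains. Real files also carry page and chunk metadata, so the actual savings depend on the data and are not predicted by this sketch.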

Plan selection

Single vs. batch modification of measurements

Option 1: A single command modifies the encoding type and compression type of one measurement only.
Option 2: A single command supports batch modification of the encoding type and compression type of multiple measurements.
Conclusion: Option 1 is chosen for the first version; Option 2 will be supported once the command has been verified.

Data range affected by the command

Option 1: Only data inserted after the modification is affected.
Option 2: All data is affected: sealed files, unsealed files, and data inserted after the modification.
Conclusion: Option 2 is chosen for the first version. Once the command has executed successfully, the change in disk usage is visible immediately.

Merge-related code changes

Before developing this feature, the existing merge code has to be modified to accommodate it:
1. When the same measurement has different encoding types or compression types within a tsfile, the merged tsfile is corrupted. This problem needs to be fixed.
2. Since the merge process must be mutually exclusive with the rewrite, lock control needs to be added.
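
The mutual exclusion in item 2 could look roughly like the sketch below, assuming a single lock shared by the merge task and the rewrite task. The class and method names are hypothetical, and the choice to let the merge skip a round instead of blocking is an assumption, not something stated in this PR.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: one lock shared by the merge task and the rewrite task. */
final class RewriteMergeExclusionSketch {
    private final ReentrantLock rewriteLock = new ReentrantLock();

    /** Runs the merge only if no other thread holds the lock; returns whether it ran. */
    boolean tryMerge(Runnable merge) {
        if (!rewriteLock.tryLock()) {
            return false; // a rewrite is in progress, skip this round
        }
        try {
            merge.run();
            return true;
        } finally {
            rewriteLock.unlock();
        }
    }

    /** Blocks until the lock is free, then runs the rewrite while holding it. */
    void rewrite(Runnable rewrite) {
        rewriteLock.lock();
        try {
            rewrite.run();
        } finally {
            rewriteLock.unlock();
        }
    }
}
```

Because `ReentrantLock` is reentrant, the exclusion only applies across threads, which matches the case of a background merge thread and a command-handling thread.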

Modifying the encoding type and compression type: normal flow

1. Verify the request parameters before execution, including non-null checks, cluster status checks, etc.
2. Modify the encoding type and compression type in the schema
3. Invalidate the schema cache
4. Find the storage group
5. Perform the modification per virtual storage group
6. Force-close the working TsFileProcessors
7. Acquire rewriteLock
8. Generate the alter log
9. Rewrite the sequence and unsequence tsfile data separately
9.1. (Un)sequenceListByTimePartition
9.2. Traverse tsFileResource
9.3. Filter for tsFileResources that have not been executed yet (recovery)
9.4. Generate targetTsFileResource
9.5. Rewrite tsFileResource
9.5.1. Acquire the tsFileResource read lock
9.5.2. Read the device list
9.5.3. startChunkGroup
9.5.4. Read the chunks
9.5.5. If the measurement is not the one being modified, write the chunk directly
9.5.6. If the measurement is the one being modified, read its pages and points one by one, then re-encode and re-compress them before writing
9.5.7. endChunkGroup
9.5.8. endFile
9.5.9. Release the tsFileResource read lock
9.6. Rename the files: .tsfile -> .alter.old, then .alter -> .tsfile
9.7. Replace tsFileResource with targetTsFileResource
9.8. Delete the files related to the original tsfile (.tsfile .resource .mods)
10. Delete the alter log
11. Release rewriteLock
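
Steps 9.5.4 to 9.5.6 can be sketched with an in-memory model, as below. This is only an illustration of the copy-or-rewrite decision: `Chunk` here is a hypothetical stand-in, not a class from the TsFile library, and "re-encoding" is reduced to attaching the new encoding and compression labels to a copy of the points.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical in-memory model of rewriting one chunk group. */
final class ChunkRewriteSketch {

    /** Stand-in for a chunk; not a class from the TsFile library. */
    static final class Chunk {
        final String measurement;
        final String encoding;
        final String compression;
        final long[] points;

        Chunk(String measurement, String encoding, String compression, long[] points) {
            this.measurement = measurement;
            this.encoding = encoding;
            this.compression = compression;
            this.points = points;
        }
    }

    static List<Chunk> rewriteChunkGroup(
            List<Chunk> chunks, String targetMeasurement, String newEncoding, String newCompression) {
        List<Chunk> out = new ArrayList<>();
        for (Chunk chunk : chunks) {
            if (!chunk.measurement.equals(targetMeasurement)) {
                // Step 9.5.5: not the modified measurement, pass the chunk through unchanged.
                out.add(chunk);
            } else {
                // Step 9.5.6: read the points and write them again under the new settings.
                out.add(new Chunk(chunk.measurement, newEncoding, newCompression, chunk.points.clone()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Chunk> group = Arrays.asList(
                new Chunk("m1", "PLAIN", "SNAPPY", new long[] {1, 2, 3}),
                new Chunk("m2", "PLAIN", "SNAPPY", new long[] {7, 8, 9}));
        for (Chunk c : rewriteChunkGroup(group, "m1", "TS_2DIFF", "SNAPPY")) {
            System.out.println(c.measurement + " " + c.encoding + " " + c.compression);
        }
    }
}
```

Passing untouched chunks through without decoding them keeps the cost of the rewrite proportional to the data of the modified measurement rather than to the whole file.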

Recovery operation after schema modification

AlterTimeSeriesPlan is added to mlog, and the corresponding recovery method is implemented.

Service restart recovery operation

1. Determine whether RecoverAlter is required before RecoverCompaction
2. Execute recoverAlter before initCompaction

recoverAlter method flow
1. Parse alter.log to get the list of unfinished tsfiles
2. Check the list of unfinished tsfiles and perform the pre-recovery operations
3. Rewrite the tsfiles

Pre-recovery action policy
Status of an unfinished tsfile:
1. The .tsfile does not exist
1.1. Both .alter.old and .alter exist: wait for completion
1.2. Only .alter.old exists: system exception
1.3. Only .alter exists: system exception
2. The .tsfile exists
2.1. .alter exists: writing
2.2. .alter.old exists: wait for delete
2.3. Neither exists: not started
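
The status table above can be expressed as a small classification function, sketched below. The enum and method names are hypothetical. Two combinations are not covered by the table (no file at all, and .tsfile together with both .alter and .alter.old); how they are handled here is an assumption, marked in the comments.

```java
/** Hypothetical sketch of the pre-recovery status table; names are illustrative. */
final class AlterRecoveryStateSketch {

    enum State { WAIT_FOR_COMPLETION, SYSTEM_EXCEPTION, WRITING, WAIT_FOR_DELETE, NOT_STARTED }

    /** Classifies an unfinished tsfile by which of its three files exist on disk. */
    static State classify(boolean tsfileExists, boolean alterExists, boolean alterOldExists) {
        if (!tsfileExists) {
            if (alterExists && alterOldExists) {
                return State.WAIT_FOR_COMPLETION; // case 1.1
            }
            // Cases 1.2 and 1.3. The "no file at all" case is not listed in the
            // table; treating it as an exception too is an assumption.
            return State.SYSTEM_EXCEPTION;
        }
        if (alterExists) {
            // Case 2.1. If .alter.old also exists (not listed in the table),
            // this sketch still reports WRITING; that ordering is an assumption.
            return State.WRITING;
        }
        if (alterOldExists) {
            return State.WAIT_FOR_DELETE; // case 2.2
        }
        return State.NOT_STARTED; // case 2.3
    }

    public static void main(String[] args) {
        System.out.println(classify(false, true, true));
        System.out.println(classify(true, false, false));
    }
}
```

Keeping the function free of file-system calls makes each row of the table directly testable; the caller would supply the three existence checks.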

Items still to be improved

1. Optimize the rewrite of aligned time series data
2. Implement the related methods of RSchemaRegion and SchemaRegionSchemaFileImpl
3. Support clusters
4. Change the tsfile rewrite operation to asynchronous execution
5. Support batch modification of measurements

This PR has:

[√ ] been self-reviewed.
- [√ ] concurrent read
- [√ ] concurrent write
- [√ ] concurrent read and write
[√ ] added documentation for new or modified features or behaviors.
[√ ] added Javadocs for most classes and all non-trivial methods.
[√ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
[√ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage.
[√ ] added integration tests.
[ ] been tested in a test IoTDB cluster.

Key changed/added classes (or packages if there are too many classes) in this PR

PlanExecutor, LocalSchemaProcessor, SchemaRegionSchemaFileImpl, DataRegion, org.apache.iotdb.db.engine.alter, IoTDBSqlParser.g4, CrossSpaceCompactionTask.java, SingleSeriesCompactionExecutor.java, InnerSpaceCompactionTask.java, TsFileManager, MeasurementMNode

Aug 03 '22 08:08 lpf4254302

Phase III: planned code modifications

Modify-schema command: remove the rewrite code
Modify-schema command: rewrite the logging (only the modification command record and the rewrite-complete record need to be logged)
Modify-schema command: add an in-memory rewrite record
Modify-schema command: remove the original rewrite lock and use the new rewrite lock (it controls only the modification and sort-out processes, not the merge)
Recovery code: remove all of it
Recovery code: rebuild the in-memory record from the log
Merge code (inner and cross space): on the original rewrite lock
Merge code (inner and cross space): how the writer obtains the schema: if modifications exist and the series is a modified series, query the schema; otherwise use the one in the file
Sort-out command: file scan, checking whether the file contains modified series and producing the list of series whose encoding has changed
Sort-out command: new rewrite lock control
Sort-out command: rewrite tasks per virtual storage group
Sort-out command: single-file rewrite code modification
Sort-out command: alignment optimization

Aug 22 '22 06:08 lpf4254302
