speedb
Log Improvement: Options for only the first 10 column families are reported to the log
The options of column families are reported to the log at the top of every log file. However, if there are more than 10 column families (not very common, but definitely allowed and seen in practice), only the options of the first 10 are reported. Log lines associated with any column family, including those beyond the first 10, still appear throughout the log file. As a result, the log contains information about column families whose options you don't know.
@udi-speedb, do you know if these options are printed to the OPTIONS file?
Also, I don't know if this is a bug, since it's definitely intentional.
@Yuval-Ariel:
- I do not know if it's in the OPTIONS file. I assume it is.
- I agree that it's intentional, but I still think it should be considered a bug.
- I believe that a log file should let a person see all the information that it is meant to provide. In addition, I might have access only to a log file (e.g., in a log parsing tool) and should be able to parse and process using it alone. You will see events, stats, etc. in the log related to the "missing" column families, but no options for them.
I was working on something earlier that would only print/return options that differ from the default. This could be useful if we wanted to keep the logs (or options files) shorter and pruned. I can try to resurrect that code...
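To make the idea concrete, here is a minimal Python sketch of pruning an options map down to the entries that differ from the defaults. It is an illustration only, not the code being resurrected: the function name `non_default_options` is hypothetical, options are assumed to be plain name-to-string maps, and the values are example data.

```python
def non_default_options(options, defaults):
    """Return only the entries of `options` whose value differs from `defaults`.

    Options missing from `defaults` are kept, since we cannot prove they are default.
    """
    return {
        name: value
        for name, value in options.items()
        if name not in defaults or defaults[name] != value
    }


# Illustrative values only (assumption: options are held as string maps).
defaults = {"write_buffer_size": "67108864", "max_write_buffer_number": "2"}
cf_options = {"write_buffer_size": "134217728", "max_write_buffer_number": "2"}

pruned = non_default_options(cf_options, defaults)
```

With this input, only `write_buffer_size` survives the pruning, so a log or options file built from `pruned` would list one line instead of two.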
@mrambacher As part of the log parser tool, I am displaying a diff between baseline options files (options files that are generated from official RocksDB / Speedb releases whose values are the defaults for that release) and options as displayed in the log file.
Until this issue is resolved, I have added https://github.com/speedb-io/speedb/issues/520
I am not sure that reporting the options for all of the cf-s is a valid solution when there are many cf-s. My concern with reporting all of the options is that, when there are many cf-s (their number is not limited), we may bloat the log file with the text reporting the options of all of them. This may be a bigger issue when log files are rotated frequently, as the options are reported at the top of every rolled log.
That's why @mrambacher's suggestion of reporting only the options that are different from the first cf is a great one. I believe doing this is independent of the log parser, and it would have several beneficial effects:
- reduce confusion, since an option that's different would stand out immediately
- reduce writing to the log
- allow all of the cfs' options to be printed to the log, which is what this issue is all about
I agree. I think we should go the log parser's way which is:
- Display the options common to all cf-s once.
- Display only the diff per cf
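A rough Python sketch of that two-step display, under assumptions: `split_common_and_diffs` is a hypothetical helper (not the log parser's actual code), each cf's options are a name-to-string map, and the cf names and values are example data.

```python
def split_common_and_diffs(cf_options):
    """Split per-cf options into (common, per_cf_diffs).

    `common` holds options that every cf has with the same value.
    `per_cf_diffs` maps each cf name to its remaining options.
    """
    if not cf_options:
        return {}, {}
    all_maps = list(cf_options.values())
    first, rest = all_maps[0], all_maps[1:]
    common = {
        name: value
        for name, value in first.items()
        if all(name in m and m[name] == value for m in rest)
    }
    per_cf_diffs = {
        cf: {name: value for name, value in opts.items() if name not in common}
        for cf, opts in cf_options.items()
    }
    return common, per_cf_diffs


# Illustrative input only.
example = {
    "default": {"write_buffer_size": "67108864", "compression": "kSnappyCompression"},
    "cf1": {"write_buffer_size": "134217728", "compression": "kSnappyCompression"},
}
common, diffs = split_common_and_diffs(example)
```

Here `compression` would be displayed once as common, and each cf would list only its own `write_buffer_size`. The amount of per-cf text then grows with the number of differing options rather than with the total number of options.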
@mrambacher - Please attach a sample log output when you have one ready, so we would be able to better understand how that would look (and also estimate the effort of the log parser's adaptation).
@mrambacher - Could you please add a reference for the pr-s on which you rely as infrastructure for this one?
This is being resolved in stages that will require several PRs:
- #619 changes the serialize methods to use Properties/Maps instead of strings. This allows later formatting to be implemented.
- #648 allows only options that were changed to be part of the serialization. This allows the output written to the Dump to be shorter and only contain the pertinent information, thereby shrinking the size of the LOG.
- #651 adds a pluggable formatter that allows options to be serialized in different formats (such as that written to the LOG).
- #719 changes the Options::Dump to use the Options internal code and not hard-coded values. This ensures that all options are logged appropriately (as new ones are added).
There will also be a subsequent PR that brings this all together and removes the cap on the number of CFs that are written.
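For readers following along, the reason a map-based serialization helps can be sketched in a few lines of Python. This does not reflect the actual code in those PRs: the function names are hypothetical and the two output layouts only approximate a LOG line and an OPTIONS-file section.

```python
def format_log_lines(cf_name, options):
    """Render options as one log-style line per option (approximate layout)."""
    return [
        f"[{cf_name}] Options.{name}: {value}"
        for name, value in sorted(options.items())
    ]


def format_ini(cf_name, options):
    """Render options as an INI-style section (approximate layout)."""
    header = f'[CFOptions "{cf_name}"]'
    return [header] + [f"  {name}={value}" for name, value in sorted(options.items())]


def dump(cf_name, options, formatter):
    """Serialize one options map with whichever formatter is plugged in."""
    return "\n".join(formatter(cf_name, options))


# Illustrative data only.
opts = {"write_buffer_size": "134217728", "max_write_buffer_number": "2"}
as_log = dump("cf1", opts, format_log_lines)
as_ini = dump("cf1", opts, format_ini)
```

Because the options stay a map until the last step, the same data can be pruned to changed values first and then written in either layout.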