manageiq-api icon indicating copy to clipboard operation
manageiq-api copied to clipboard

Changing things via the server/region/zone settings endpoint might break the format of the settings

Open skateman opened this issue 4 years ago • 7 comments

Basically a format conversion problem, the settings are coming from YAML but presented as JSON are bringing in some conversion artifacts that are described here.

@miq-bot assign @gtanzillo

We should probably discuss this on our next UI/API meeting...

See also https://github.com/ManageIQ/manageiq/issues/17201

skateman avatar Dec 11 '20 12:12 skateman

@kbrock and I looked at config/settings.yml. As an experiment, we did the following:

orig=YAML.load_file("./config/settings.yml")
File.write("./config/settings2.yml", YAML.dump(orig))

s2=YAML.load_file("./config/settings2.yml")
s3=JSON.parse(s2.to_json, :symbolize_names => true)
File.write("./config/settings3.yml", YAML.dump(s3))
(git:kasparov) ~/work/ibm/manageiq$ diff -wb config/settings{,2}.yml
55c55
<     :reindex_schedule: "1 * * * *"
---
>     :reindex_schedule: 1 * * * *
59c59
<     :vacuum_schedule: "0 2 * * 6"
---
>     :vacuum_schedule: 0 2 * * 6
98d97
<
104,107d102
< # provider specific settings are nested here, but they are in the provider repos
< # e.g.:
< #  :ems_<provider_name>:
< #    :use_feature: false
1052c1047
<     :retry_interval: 15 # in seconds
---
>     :retry_interval: 15
1173c1168
<       :chargeback_generation_time_utc: 01:00:00
---
>       :chargeback_generation_time_utc: 3600

Without ignoring whitespace, there were about 30 other differences.

diff config/settings{2,3}.yml
985c985
<       :name: :used_swap_percent_gt_value
---
>       :name: used_swap_percent_gt_value
991c991
<       :name: :used_swap_percent_lt_value
---
>       :name: used_swap_percent_lt_value
1086c1086
<       :poll_method: :normal
---
>       :poll_method: normal
1116,1117c1116,1117
<         :dequeue_method: :drb
<         :poll_method: :normal
---
>         :dequeue_method: drb
>         :poll_method: normal
1123c1123
<           :poll_method: :escalate
---
>           :poll_method: escalate
1129c1129
<         :poll_method: :escalate
---
>         :poll_method: escalate
1136c1136
<           :poll_method: :normal
---
>           :poll_method: normal
1138c1138
<           :dequeue_method: :sql
---
>           :dequeue_method: SQL

Note: Here the main differences are due to values that were originally symbols are now strings and whitespaces.

gtanzillo avatar Dec 16 '20 21:12 gtanzillo

Process

Our settings is stored in yaml and edited in json:

where data format note
fs or db yaml
ruby ruby objects converted from yaml
ruby json converted to json, prepping for send
browser json user edits
ruby json sent back up
ruby ruby objects converted from json
ruby yaml converted from ruby objects
db yaml deltas stored

yaml and yaml2 are not the same, even when no edits are made. This is why we are having this discussion.

Where changes occurred

We looked at the 2 serialization processes:

  1. yaml => ruby => yaml
  2. ruby => json => ruby

I ran GT's script on settings.yml for all the repos. I am unsure whether there are other files that we use for populating settings, but this looked pretty complete to me. We learned that both of the serialization processes changed the values.

I feel the original note conflated the two issues. If yaml can't do a round trip, it is not fair to blame this on json, especially when the original yaml file was not properly formatted. Json does introduce issues, but the number of cases is minimal.

from ruby to json back to ruby objects

Changes introduced:

  • regular expressions got screwed up
    • nuage (4), azure (9), redfish (1)
  • symbols values are converted to strings
    • manageiq (9)

These are both native ruby types, so it makes sense that javascript centric json would not know how to handle them.

from yaml to ruby back to yaml

Changes introduced:

  • comments removed
    • azure, nuage, amazon, manageiq
  • indentation fixed
    • azure, openshift, api, nuage, amazon, openstack,
  • quotes changed from " to '
    • azure, openstack
  • quotes removed
    • lenovo, openstack, manageiq (cron strings), ovirt
  • comments were removed
    • azure, manageiq, (others too - sorry dropped the ball on this one)
  • _ in numerical values were removed
    • openshift, api
  • regular expression back slash fixed (e.g.: \A => \\A)
    • openstack, ovirt
  • blank lines removed
    • redfish
  • time convrted to number of seconds (01:00:00 => 3600)
    • manageiq

Of note, if I ran the serialization and deserialization process on the output file to another output file, it ended up with the same file. so at least it is deterministic. All hashes and arrays were in the same order.

These changes do not loose data but do alter it. Most of these changes are fixing errors in the original yaml files.

Proposal

We have a few ways to solve this:

  • keep everything as yaml and have the javascript handle serialization/deserialization
  • chang the settings so they are json compliant.
  • change the way we serialize the json to not be lossy with the data types we do use.

I propose the second option:

  • reformat yml files to be valid
  • use strings instead of symbols
  • use strings instead of regular expressions

verbose:

Modify all settings files so the yaml deserialization and serialization process produces the same output as the original input. This is basically done by running the yaml only portion of the script above and replacing the original scripts. I am not happy about loosing the comments but our current yaml process does loose them, and all the other changes are fixing indentation and escaping issues.

Modify manageiq in a few spots to handle string or symbolic values. It is possible that the code will work unchanged. this will be backwards compatible, but we will change the settings files to use the string instead.

Modifying plugins to convert strings to regular expressions where they are used (i.e.: Regexp.new(value)). This is the biggest change but it is possible that all plugins already use only a single block of common manageiq core code.

We need to ensure there are no other settings files are also used. We need to perform a potentially slow migration to convert yaml stored in the database to comply with these changes.

@gtanzillo I did not see the SQL change that you saw. Maybe my files are outdated or something strange is happengin in terminal encoding.

kbrock avatar Dec 16 '20 23:12 kbrock

Notes from @gtanzillo and @kbrock:

  • yaml to yaml being the same is not necessary. we care about ruby objects comparison to ruby objects.
  • json is the only one we need to have serialize and deserialize produce the same value.
  • we store regexp as strings and as regular expressions (ruby objects) - it is only the places we store them as regular expressions that are an issue.
  • the yaml changes to regular expressions were only an issue when the string version of regular expressions is stored.
  • keenan hypothesizes that converting the symbolized values to strings may just work
  • It looks like all the regular expression objects are used for the same purpose and are processed in core and would need one line of code to fix
  • changes to our code for both problems are backwards compatible and can be made before we actually decide to change the value stored in the database

kbrock avatar Dec 17 '20 21:12 kbrock

update:

  • [x] support strings for event detection instead of regexs ManageIQ/manageiq#20907
  • [x] support strings instead of symbols for settings values ManageIQ/manageiq#20908
  • [x] change symbols to strings in settings ManageIQ/manageiq#20908
  • [ ] Validate strings
  • [ ] migrate settings data to have strings in the database again w/ 20908
  • [x] remove regex from nuage https://github.com/ManageIQ/manageiq-providers-nuage/pull/237
  • [x] remove regex from azure https://github.com/ManageIQ/manageiq-providers-azure_stack/pull/54
  • [x] remove regex from redfish https://github.com/ManageIQ/manageiq-providers-redfish/pull/133

kbrock avatar Dec 27 '20 04:12 kbrock

Moving forward, I wonder if we should change our settings.yml to settings.json... This way we can't introduce non-JSON stuff.

Fryguy avatar Jan 04 '21 19:01 Fryguy

@kbrock any update on this?

skateman avatar Feb 08 '21 15:02 skateman

This issue has been automatically marked as stale because it has not been updated for at least 3 months.

If you can still reproduce this issue on the current release or on master, please reply with all of the information you have about it in order to keep the issue open.

Thank you for all your contributions! More information about the ManageIQ triage process can be found in the triage process documentation.

miq-bot avatar Feb 27 '23 00:02 miq-bot