[DO NOT MERGE] Sensitive database column discussion
Summary of the problem
@KiKoS0 and I drafted a possible YAML file of database obfuscation decisions (currently targeting https://github.com/Qovery/Replibyte) in HackMD, but that format was not conducive to discussion. So I'm moving the decisions into annotations on db/schema.rb, where we can discuss them through the GitHub PR UI. For now this only considers the built-in transformers (https://www.replibyte.com/docs/transformers), but we can write custom ones if necessary.
The second commit here enables parsing db/schema.rb to generate the YAML we need to run Replibyte. The script is:
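require 'skeema/ripper'

# Parse the annotated schema into { table => [{ column => options }, ...] }.
schema = Skeema::Ripper.parse('db/schema.rb')
tables = schema.keys

# Keep only tables with at least one column carrying a "replibyte:" annotation.
replibyte_tables = tables.filter do |table|
  schema[table].any? do |column_entry|
    hash = column_entry.values.first
    hash&.key?("replibyte:")
  end
end

# Print one YAML fragment per table, listing each annotated column
# together with the transformer named in its annotation.
replibyte_tables.each do |table|
  puts "table: #{table}"
  puts "columns:"
  schema[table].each do |column_entry|
    column_name = column_entry.keys.first
    hash = column_entry.values.first
    if hash&.key?("replibyte:")
      # transformer_name is indented to align with name so the list item is valid YAML.
      puts "  - name: #{column_name}"
      puts "    transformer_name: #{hash["replibyte:"]}"
    end
  end
  puts ""
end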
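For discussion, here is a rough sketch of how the same data could be emitted as a single nested structure instead of printed fragments. It assumes the parser returns the shape the script above relies on (table name mapped to a list of single-entry column hashes) and that the result belongs under `source.transformers` in the Replibyte config. `SAMPLE_SCHEMA`, `replibyte_transformers`, and the `public` database default are hypothetical names and values used only for illustration, not part of this PR.

```ruby
require 'yaml'

# Hypothetical sample in the shape the script above assumes the parser returns:
# { table_name => [ { column_name => { option => value } }, ... ] }
SAMPLE_SCHEMA = {
  'users' => [
    { 'email' => { 'replibyte:' => 'email' } },
    { 'first_name' => { 'replibyte:' => 'first-name' } },
    { 'created_at' => {} }
  ],
  'settings' => [
    { 'key' => nil }
  ]
}.freeze

# Build one transformer entry per table that has at least one annotated column.
# The database name is an assumption; adjust it to the real target.
def replibyte_transformers(schema, database: 'public')
  schema.filter_map do |table, column_entries|
    columns = column_entries.filter_map do |entry|
      name, options = entry.first
      transformer = options && options['replibyte:']
      next unless transformer

      { 'name' => name, 'transformer_name' => transformer }
    end
    next if columns.empty?

    { 'database' => database, 'table' => table, 'columns' => columns }
  end
end

config = { 'source' => { 'transformers' => replibyte_transformers(SAMPLE_SCHEMA) } }
puts config.to_yaml
```

Going through `to_yaml` rather than hand-written `puts` lines leaves indentation and quoting to the YAML library, which may matter if a column or transformer name ever contains special characters.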
Alternatively, we may consider using https://github.com/GreenmaskIO/greenmask or other tools, but either way the decision on which columns to redact/censor is tool-agnostic.