
[DO NOT MERGE] Sensitive database column discussion

Open albertchae opened this issue 1 year ago • 0 comments

Summary of the problem

@KiKoS0 and I drafted a possible YAML file with database obfuscation decisions (currently targeting https://github.com/Qovery/Replibyte) in HackMD, but that format was not conducive to discussion. So I'm moving those decisions to annotations on db/schema.rb, so we can discuss them via the GitHub PR UI. Currently this only considers the built-in transformers (https://www.replibyte.com/docs/transformers), but we can write custom ones if necessary.

The second commit here makes it possible to parse db/schema.rb and generate the YAML we need to run Replibyte. The script is:

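require 'skeema/ripper'

# Parse db/schema.rb; the result is used as a hash keyed by table name.
schema = Skeema::Ripper.parse('db/schema.rb')

tables = schema.keys

# Keep only tables with at least one column carrying a "replibyte:" annotation.
replibyte_tables = tables.filter do |table|
  schema[table].any? do |column_entry|
    # Each column entry is treated as a one-pair hash: column name => attributes.
    hash = column_entry.values.first
    hash&.key?("replibyte:")
  end
end

# Print a table/columns block for each annotated table.
replibyte_tables.each do |table|
  puts "table: #{table}"
  puts "columns:"
  schema[table].each do |column_entry|
    column_name = column_entry.keys.first
    hash = column_entry.values.first
    if hash&.key?("replibyte:")
      puts "  - name: #{column_name}"
      puts "    transformer_name: #{hash["replibyte:"]}"
    end
  end
  puts ""
end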
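
For anyone who wants to try the selection logic without the parser, here is a minimal sketch that runs the same filtering against an in-memory hash and serializes the result with Ruby's standard `yaml` library instead of hand-written `puts` calls. Everything in it is an assumption for illustration: `SAMPLE_SCHEMA`, its table and column names, and the helper `replibyte_transformers` are hypothetical, and the hash shape is only inferred from how the script above reads the parser output. The transformer names `email` and `first-name` are meant to match Replibyte's built-in list, but check them against the transformer docs before relying on them.

```ruby
require 'yaml'

# Hypothetical stand-in for the parsed schema. The shape (table name => list of
# one-pair hashes, column name => attributes) is inferred from the script above.
SAMPLE_SCHEMA = {
  "users" => [
    { "email" => { "replibyte:" => "email" } },
    { "full_name" => { "replibyte:" => "first-name" } },
    { "created_at" => {} },
    { "id" => nil }
  ],
  "events" => [
    { "name" => {} }
  ]
}.freeze

ANNOTATION_KEY = "replibyte:"

# Returns one { "table", "columns" } hash per table that has at least one
# annotated column; tables without annotations are skipped entirely.
def replibyte_transformers(schema)
  schema.filter_map do |table, column_entries|
    columns = column_entries.filter_map do |column_entry|
      column_name, options = column_entry.first
      next unless options&.key?(ANNOTATION_KEY)

      { "name" => column_name, "transformer_name" => options[ANNOTATION_KEY] }
    end
    next if columns.empty?

    { "table" => table, "columns" => columns }
  end
end

# Serialize the list; where it nests inside the Replibyte config file is left
# to the caller and should be checked against the Replibyte documentation.
puts replibyte_transformers(SAMPLE_SCHEMA).to_yaml
```

Building plain hashes first and serializing afterwards keeps the column decisions separate from the output format, which should make it easier to emit a different config if we switch tools.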

Alternatively, we may consider using https://github.com/GreenmaskIO/greenmask or other tools, but either way the decision about which columns to redact or censor is tool-agnostic.

albertchae • Mar 20 '24 13:03