masking
masking copied to clipboard
Command line tool for generating anonymized database for MySQL/MariaDB
MasKING🤴
The command line tool for anonymizing database records by parsing a SQL dump file and build a new SQL dump file with masking sensitive/credential data.
Installation
gem install masking
Requirement
- Ruby 2.5/2.6/2.7/3.0(preview)
Supported RDBMS
- MySQL: 5.51, 5.6, 5.7, 8.0
- MariaDB: 5.5, 10.02, 10.1, 10.2, 10.3, 10.4
Usage
-
Setup configuration for anonymizing target tables/columns to
masking.yml
# table_name: # column_name: masked_value users: string: anonymized string email: anonymized+%{n}@example.com # %{n} will be replaced with sequential number integer: 12345 float: 123.45 boolean: true null_column: null date: 2018-08-24 time: 2018-08-24 15:54:06 binary_or_blob: !binary | # Binary Data Language-Independent Type for YAML™ Version 1.1: http://yaml.org/type/binary.html R0lGODlhDAAMAIQAAP//9/X17unp5WZmZgAAAOfn515eXvPz7Y6OjuDg4J+fn5 OTk6enp56enmlpaWNjY6Ojo4SEhP/++f/++f/++f/++f/++f/++f/++f/++f/+ +f/++f/++f/++f/++f/++SH+Dk1hZGUgd2l0aCBHSU1QACwAAAAADAAMAAAFLC AgjoEwnuNAFOhpEMTRiggcz4BNJHrv/zCFcLiwMWYNG84BwwEeECcgggoBADs=
A value will be implicitly converted to a compatible type. If you prefer to explicitly convert, you could use a tag as defined in YAML Version 1.1
not-date: !!str 2002-04-28
String should be matched with MySQL String Type. Integer/Float should be matched with MySQL Numeric Type. Date/Time should be matched with MySQL Date and Time Type.
NOTE: MasKING doesn't check actual schema's type from the dump. If you put incompatible value it will cause an error during restoring to the database.
-
Dump database with anonymizing
MasKING works with
mysqldump --complete-insert
mysqldump --complete-insert -u USERNAME DATABASE_NAME | masking > anonymized_dump.sql
-
Restore from the anonymized dump file
mysql -u USERNAME ANONYMIZED_DATABASE_NAME < anonymized_dump.sql
Tip: If you don't need to have an anonymized dump file, you can directly insert it from the stream. It can be faster because it has less IO interaction.
mysqldump --complete-insert -u USERNAME DATABASE_NAME | masking | mysql -u USERNAME ANONYMIZED_DATABASE_NAME
options
$ masking -h
Usage: masking [options]
-c, --config=FILE_PATH specify config file. default: masking.yml
-v, --version version
Use case of anonymized (production) database
-
Analyzing production databases for BI, Machine Learning, troubleshooting with respecting GDPR
-
Stress test / Integration test
-
Performance optimization for slow query
The analyzing slow query often needs a similar amount of records/cardinality with production, the anonymized database help to analyze and tune the slow query.
-
Simulating database migration
Some schema migration locks table and it causes trouble during the execution. With a smaller amount of database, the migration will finish in a short time and easy to overlook the problem. With the anonymized production database, it is easy to simulate the migration as the real release and makes it easy to find the problem.
-
Better feature development flow
Using similar data with the production database makes better development experience. It makes easy to find out the things which should be changed/fixed. Also, some bugs are related to unexpected data in production, it makes easy to find them too.
-
And… your idea here!
Development
git clone [email protected]:kibitan/masking.git
bin/setup
You can also run bin/console
for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install
.
boot
bundle exec exe/masking
Run test & rubocop & notes
bundle exec rake
acceptance test
./acceptance/run_test.sh
available option via environment variable:
-
MYSQL_HOST
: database host(default:localhost
) -
MYSQL_USER
: mysql user name(default:mysqluser
} -
MYSQL_PASSWORD
: password for user(default:password
) -
MYSQL_DBNAME
: database name(default:mydb
)
with docker-compose
docker-compose -f docker-compose.yml -f docker-compose/mysql80.yml run -e MYSQL_HOST=mysql80 app acceptance/run_test.sh
or
docker-compose/acceptance_test.sh mysql80
The docker-compose file names for other database versions, specify that file.
- MySQL 8.0:
docker-compose/mysql80.yml
- MySQL 5.7:
docker-compose/mysql57.yml
- MySQL 5.6:
docker-compose/mysql56.yml
- MySQL 5.51:
docker-compose/mysql55.yml
- MariaDB 10.4:
docker-compose/mariadb104.yml
- MariaDB 10.3:
docker-compose/mariadb103.yml
- MariaDB 10.2:
docker-compose/mariadb102.yml
- MariaDB 10.1:
docker-compose/mariadb101.yml
- MariaDB 10.02:
docker-compose/mariadb100.yml
- MariaDB 5.5:
docker-compose/mariadb55.yml
Markdown lint
bundle exec mdl *.md
Development with Docker
docker build . -t masking
echo "sample stdout" | docker run -i masking
docker run masking -v
Profiling
use bin/masking_profile
$ cat your_sample.sql | bin/masking_profile
flat result is saved at /your/repo/profile/flat.txt
graph result is saved at /your/repo/profile/graph.txt
graph html is saved at /your/repo/profile/graph.html
$ open profile/flat.txt
see also: ruby-prof/ruby-prof: ruby-prof: a code profiler for MRI rubies
Benchmark
use benchmark/run.rb
$ benchmark/run.rb
user system total real
1.103012 0.009460 1.112472 ( 1.123093)
Design Concept
KISS ~ keep it simple, stupid ~
No connection to the database, No handling files, Only dealing with stdin/stdout. ~ Do One Thing and Do It Well ~
No External Dependency
Depend on only pure language standard libraries, no external libraries
Future Todo
- Pluggable/customizable for a mask way e.g. integrate with Faker
- Compatible with other RDBMS e.g. PostgreSQL, Oracle, SQL Server
- Parse the schema type information and validate target columns value
- Performance optimization
- Write in the streaming process
- rewrite by another language?
- Well-documentation
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/kibitan/masking. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.
License
The gem is available as open source under the terms of the MIT License.
Code of Conduct
Everyone interacting in the Masking project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.
1: MySQL 5.5 is already not supported by official
2: MariaDB 10.0 is already not supported by official