ideas icon indicating copy to clipboard operation
ideas copied to clipboard

Add support for database sharding

Open MatzeKitt opened this issue 2 years ago • 5 comments

Feature Request

Describe your use case and the problem you are facing

We’re using database sharding along with HyperDB and face some problems when it comes to use WP-CLI. Especially, but not exclusively, it’s hard until impossible to use the wp db commands. E.g. running wp db tables results in the error mentioned in https://github.com/wp-cli/wp-cli/issues/4670, and also the workarounds there don’t work with sharding, since WP-CLI doesn’t know the shards here.

We’re running a pretty normal HyperDB configuration with the following db-config.php:

<?php
// Exit if accessed directly
defined( 'ABSPATH' ) || exit;

$wpdb->charset = 'utf8mb4';
$wpdb->collate = 'utf8mb4_unicode_ci';
$wpdb->save_queries = false;
$wpdb->persistent = false;
$wpdb->max_connections = 10;
$wpdb->check_tcp_responsiveness = true;

// Add global database
$wpdb->add_database( [
	'host' => DB_HOST,
	'user' => DB_USER,
	'password' => DB_PASSWORD,
	'name' => DB_NAME,
	'write' => 1,
	'read' => 2,
	'dataset' => 'global',
	'timeout' => .5,
] );

// Add 16 shards for blog databases
foreach ( array_merge( range( 0, 9 ), range( 'a', 'f' ) ) as $v ) {
	$wpdb->add_database( [
		'host' => DB_HOST,
		'user' => DB_USER,
		'password' => DB_PASSWORD,
		'name' => DB_NAME.'_'.$v,
		'write' => 1,
		'read' => 2,
		'dataset' => $v,
		'timeout' => .5,
	] );
}

$wpdb->add_callback( 'get_blog_shard' );

function get_blog_shard( $query, $wpdb ) {
	if ( preg_match( "/^{$wpdb->base_prefix}\d+_/i", $wpdb->table ) ) {
		return substr( md5( $wpdb->blogid ), 0, 1 );
	}
}

function wpdb_connection_error( $host, $port, $op, $tbl, $ds, $dbhname, $wpdb ) {
	dead_db();
}

$wpdb->add_callback( 'wpdb_connection_error', 'db_connection_error' );

Describe the solution you'd like

I would like to get support for the sharding callback added to $wpdb in order to determine the correct database tables just by specifying the --url parameter in a WP-CLI request.

MatzeKitt avatar Oct 16 '23 07:10 MatzeKitt

Sounds like it would be a great improvement!

Feel free to submit a pull request, if you'd like. Here is some guidance on our pull request best practices.

danielbachhuber avatar Oct 16 '23 12:10 danielbachhuber

I’ll try 😅

MatzeKitt avatar Oct 16 '23 12:10 MatzeKitt

Before creating a pull request, I want to make sure whether my current approach is valid at all in my current working version. The current version is limited due to the fact that I only tested one command: vendor/bin/wp db search 'string' --network

First of all, in Utils\wp_get_table_names() I need to get the tables from information_schema.tables, since HyperDB doesn’t store any information about the tables. $wpdb->dbname is empty in this case.

		if ( empty( $wpdb->dbname ) && $wpdb instanceof \HyperDB ) {
			$mode = str_ends_with( $wpdb->dbhname, '__r' ) ? 'read' : 'write';
			$tables = [];

			foreach ( $wpdb->hyper_servers as $hyper_server ) {
				if ( empty( $hyper_server[ $mode ] ) ) {
					continue;
				}

				foreach ( $hyper_server[ $mode ] as $server ) {
					foreach ( $server as $server_data ) {
						$databases[] = $server_data['name'];
					}
				}
			}

			$table_schemas = $wpdb->get_results( sprintf( "SELECT table_schema as db, table_name as table_name FROM information_schema.tables WHERE table_schema IN ('%s')", implode( "', '", $wpdb->_escape( $databases ) ) ) );

			foreach ( $table_schemas as $table_schema ) {
				$tables[] = $table_schema->db . '.' . $table_schema->table_name;
			}
		} else {
			// Note: BC change 1.5.0, tables are sorted (via TABLES view).
			// phpcs:ignore WordPress.DB.PreparedSQL.NotPrepared -- uses esc_sql_ident() and $wpdb->_escape().
			$tables = $wpdb->get_col( sprintf( "SHOW TABLES WHERE %s IN ('%s')", esc_sql_ident( 'Tables_in_' . $wpdb->dbname ), implode( "', '", $wpdb->_escape( $wp_tables ) ) ) );
		}

The else is the current implementation. For our HyperDB version with database shardings, the result is a list of table names with database prefix: db.table, e.g. wordpress.wp_options if the database is called wordpress.

Now, since the table name is escaped using DB_Command::esc_sql_ident() in DB_Command::search() as well as DB_Command::get_columns(), I need to adjust that to properly escape the database and the table name:

		if ( str_contains( $table, '.' ) ) {
			list( $db, $table ) = explode( '.', $table );
			$table_sql          = self::esc_sql_ident( $db ) . '.' . self::esc_sql_ident( $table );
		} else {
			$table_sql = self::esc_sql_ident( $table );
		}

This way it works, but it feels a little bit hacky to me. What do you think about it? We can also discuss it via Slack, my username there is KittMedia.

MatzeKitt avatar Jan 19 '24 10:01 MatzeKitt

How many total tables do you have? And what's the particular use case you're trying to solve for?

A couple of thoughts:

  • We should probably create a holistic solution for HyperDB support, instead of adding support to each command on a one-off basis.
  • We'll probably want to make sure the solution gracefully accommodates the scenario where you have thousands of database tables.

danielbachhuber avatar Feb 11 '24 15:02 danielbachhuber

We have currently around 1.500 sites in a multisite in 16 different shards, so roughly 16.500 tables. The problem I try to solve is that any global db operation fails since WP-CLI is not aware of all the tables.

If you can suggest a holistic solution, I would love to hear it so that I can try to implement it. 🙂

MatzeKitt avatar Feb 17 '24 17:02 MatzeKitt