Error handling for the diskquota worker startup stage.

Open bimboterminator1 opened this issue 1 year ago • 2 comments

During the diskquota worker's first run, the initial set of active tables with their sizes is loaded from the diskquota.table_size table in order to warm up the diskquota rejectmap and other shared memory objects. If an error occurs during this initialization, it is silently ignored in the PG_CATCH() block. Ignoring it can be harmful and can lead to undesired behaviour of the whole extension. For example, if an error occurs during initialization, local_active_table_stat_map will not be filled properly. At the next loop iteration, tables that are not in the active table list will then be marked as irrelevant and deleted from both table_size_map and the table_size table in the flush_to_table_size function. This causes extra performance load and is not guaranteed to be safe.
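The failure mode described above can be modelled with a small standalone sketch. It is not the extension's code: DemoSizeEntry and demo_flush are hypothetical stand-ins for the table_size bookkeeping and for the cleanup step in flush_to_table_size, and the active table list is reduced to a plain array of relation ids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one row of the table_size data. */
typedef struct DemoSizeEntry
{
	unsigned	relid;
	long		size;
	bool		present;	/* false once the entry has been deleted */
} DemoSizeEntry;

/*
 * Model of the cleanup step: every entry whose relid is missing from the
 * active list is treated as irrelevant and deleted.
 * Returns the number of deleted entries.
 */
static size_t
demo_flush(DemoSizeEntry *table_size, size_t n,
		   const unsigned *active, size_t n_active)
{
	size_t		deleted = 0;

	assert(table_size != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
	{
		bool		found = false;

		if (!table_size[i].present)
			continue;

		for (size_t j = 0; j < n_active; j++)
		{
			if (active[j] == table_size[i].relid)
			{
				found = true;
				break;
			}
		}

		if (!found)
		{
			table_size[i].present = false;
			deleted++;
		}
	}
	return deleted;
}
```

In this model, a swallowed initialization error corresponds to calling demo_flush with an empty active list: every entry is deleted, even though nothing was wrong with the tables themselves.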

This commit proposes handling of the initialization errors that occur during the worker's first run. A bool field, corrupted, is added to the DiskquotaDBEntry structure to indicate that the worker was not able to initialize itself on the given database. DiskquotaDBEntry is now also passed to the refresh_disk_quota_model function from the worker main loop, because the state of dbEntry needs to be changed there. The state is changed when the refresh_disk_quota_usage function catches, in its PG_CATCH() block, an error that occurred during the initialization step. After the error is caught, the corrupted flag is set in the given dbEntry and the error is rethrown. This leads to termination of the worker process. The launcher will not be able to start the worker again, because the flag is set in the database structure and is checked inside the disk_quota_launcher_main function. The flag can be reset by calling the resetBackgroundWorkerCorruption function, which is currently called in the SIGHUP handler.
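The control flow of the proposal can be illustrated with a standalone sketch that builds without PostgreSQL headers. Everything here is an assumption made for illustration: DemoDBEntry stands in for DiskquotaDBEntry, setjmp/longjmp imitates PG_TRY()/PG_CATCH(), and a false return value models the rethrow that terminates the worker. The demo_* names do not exist in diskquota.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for DiskquotaDBEntry; only the fields used here. */
typedef struct DemoDBEntry
{
	unsigned	dbid;
	bool		corrupted;	/* set when first-run initialization failed */
} DemoDBEntry;

/* Innermost active error handler (imitates the PG_TRY handler stack). */
static jmp_buf *demo_error_context = NULL;

/* Imitates raising an ERROR: jump to the innermost handler. */
static void
demo_throw(void)
{
	if (demo_error_context != NULL)
		longjmp(*demo_error_context, 1);
	abort();				/* no handler installed */
}

/* Stand-in for loading the initial active table set; fails on request. */
static void
demo_load_active_tables(bool inject_fault)
{
	if (inject_fault)
		demo_throw();
}

/*
 * First-run refresh. Returns true on success. On error the entry is marked
 * corrupted and false is returned, which models the rethrow and the
 * resulting worker termination.
 */
static bool
demo_worker_first_run(DemoDBEntry *entry, bool inject_fault)
{
	jmp_buf		local;
	jmp_buf    *saved = demo_error_context;

	assert(entry != NULL);

	if (setjmp(local) == 0)
	{
		demo_error_context = &local;
		demo_load_active_tables(inject_fault);
		demo_error_context = saved;
		return true;
	}

	/* "PG_CATCH" branch: restore the outer handler and record the failure. */
	demo_error_context = saved;
	entry->corrupted = true;
	return false;
}

/* Launcher-side check: never restart a worker for a corrupted entry. */
static bool
demo_launcher_may_start(const DemoDBEntry *entry)
{
	return !entry->corrupted;
}

/* Models the reset performed from the SIGHUP handler. */
static void
demo_reset_corruption(DemoDBEntry *entry)
{
	entry->corrupted = false;
}
```

With this model, a failed first run leaves the entry flagged, demo_launcher_may_start refuses to restart the worker, and only demo_reset_corruption makes the database eligible again.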

This patch does not contain a proper test because the author couldn't find an optimal architecture for one. The behaviour of the patch can be tested either by adding a fault injector or by raising an error, for example, in the gp_fetch_active_tables function. The presence of the diskquota bgworker process for the given database can be monitored via ps or pg_stat_activity.

bimboterminator1, Jul 14 '23 07:07