
gbak sometimes could cause corruption in source database during backup...

Open EPluribusUnum opened this issue 3 years ago • 10 comments

... when the database is in use and gbak uses the local protocol. We see this behaviour with FB25 and FB30. Most of the corruptions were "wrong page type", but not exclusively.

When we switched to TCP/IP protocol the corruption went away.

Could you build a check into gbak? With the local protocol, it could also check mon$attachments and refuse the backup when any connection exists.

EPluribusUnum avatar Sep 13 '22 06:09 EPluribusUnum

It is very hard to believe that gbak could really corrupt a database. The only case I can think of is when gbak runs in embedded mode (not local protocol, AKA XNET!) with garbage collection not blocked, and crashes for some reason (already corrupt DB?) without flushing the page cache. Even in this case I would expect only some orphan pages, not "wrong page type" errors.

gbak could "detect" already existing corruption, though, as it reads the whole DB.

If you want to ensure no other connections exist, shut down the database in single-user mode; it still allows a backup to run.
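The single-user approach could be scripted roughly as follows. This is a minimal sketch, not a tested procedure: it assumes `gfix` and `gbak` are on `PATH`, the paths in `DB` and `BACKUP` are hypothetical placeholders, the credentials are the ones quoted in this thread, and the 60-second `-force` timeout is an arbitrary choice. The `DRY_RUN` switch and the `run` helper are illustrative additions, not part of Firebird.

```shell
#!/bin/sh
# Sketch: back up while the database is shut down in single-user mode.
# DB and BACKUP are hypothetical placeholder paths; adjust to your setup.
DB="${DB:-/path/to/database.fdb}"
BACKUP="${BACKUP:-/path/to/backup.fbk}"
FB_USER="${FB_USER:-sysdba}"
FB_PASSWORD="${FB_PASSWORD:-masterkey}"
TIMEOUT="${TIMEOUT:-60}"   # seconds before remaining connections are forced off

# run: execute a command, or only print it when DRY_RUN=1
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

backup_single_user() {
    # 1. Shut down to single-user mode so no other connection can interfere.
    run gfix -user "$FB_USER" -password "$FB_PASSWORD" \
        -shut single -force "$TIMEOUT" "$DB" || return 1
    # 2. Run the backup and remember its exit status.
    run gbak -b -user "$FB_USER" -password "$FB_PASSWORD" "$DB" "$BACKUP"
    status=$?
    # 3. Bring the database back online even if the backup failed.
    run gfix -user "$FB_USER" -password "$FB_PASSWORD" \
        -online normal "$DB" || return 1
    return $status
}
```

Calling `backup_single_user` with `DRY_RUN=1` prints the three commands without touching the database, which is a cheap way to review them before a real run.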

hvlad avatar Sep 13 '22 07:09 hvlad

@hvlad, when does gbak run in embedded mode? When I wrote "local" I meant that only a file path is given, like: gbak -b -user sysdba -password masterkey /opt/firebird/....fdb /opt/firebird...fbk. Or is this embedded in that case, not local protocol?

(BTW we see this behaviour only on Linux; I can't recall a Windows case)

EPluribusUnum avatar Sep 13 '22 07:09 EPluribusUnum

On Linux there is no "true local protocol", and there never was. When an application specifies a so-called "local connection string", the client layer on Linux and all other non-Windows OSes uses an embedded connection when possible, or switches to the INET remote protocol (via localhost) otherwise. In most cases that means embedded for CS\SC and INET for SS.

hvlad avatar Sep 13 '22 07:09 hvlad

Is it possible that a custom FIREBIRD_LOCK envvar was specified?

dyemanov avatar Sep 13 '22 07:09 dyemanov

@hvlad, how can I protect the database on Linux? How can I prevent users from running gbak over a non-TCP/IP protocol? @dyemanov, no custom FIREBIRD_LOCK envvar was set.

EPluribusUnum avatar Sep 13 '22 07:09 EPluribusUnum

@hvlad , how can I protect the database on Linux?

Until we know the real cause of the corruption for sure, there can't be any specific suggestions.

How can I prevent users from running gbak over a non-TCP/IP protocol?

Perhaps an ON CONNECT trigger that checks the application name and protocol could help. It is better to use context variables (CLIENT_PROCESS and NETWORK_PROTOCOL from the SYSTEM namespace).

But, again, first you need to find the real cause of the problem.

hvlad avatar Sep 13 '22 09:09 hvlad

On 9/13/22 10:55, EPluribusUnum wrote:

How can I prevent users from running gbak over a non-TCP/IP protocol?

Use OS-level protection. First of all, gbak should not be run with root privileges. Next, the OS user running gbak should not have file-level access to the database. That will cause gbak to always use a TCP connection automatically.
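A backup wrapper could check both points before calling gbak. The sketch below is only an illustration: `to_tcp` and `has_file_access` are hypothetical helper names, and the rule "a string starting with `/` has no host part" is a simplifying assumption that ignores aliases and relative paths.

```shell
#!/bin/sh
# to_tcp: prefix an absolute local path with "localhost:" so the client
# goes through the INET (TCP/IP) protocol; other strings pass unchanged.
to_tcp() {
    case "$1" in
        /*) printf 'localhost:%s\n' "$1" ;;
        *)  printf '%s\n' "$1" ;;
    esac
}

# has_file_access: succeed if the current OS user can read or write the
# file, i.e. an embedded connection might be possible for this user.
has_file_access() {
    [ -r "$1" ] || [ -w "$1" ]
}
```

A wrapper might then print a warning when `has_file_access "$DB"` succeeds, and run `gbak -b -user sysdba -password masterkey "$(to_tcp "$DB")" "$BACKUP"` so that the connection string always names a host explicitly.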

AlexPeshkoff avatar Sep 13 '22 09:09 AlexPeshkoff

@hvlad rdb$get_context('SYSTEM', 'NETWORK_PROTOCOL'), rdb$get_context('SYSTEM', 'CLIENT_PROCESS'), mon$remote_protocol, mon$remote_address and mon$remote_process are all null in this case. (I don't know whether this is intended or a bug.)

@AlexPeshkoff thank you.

(We don't operate on customers' sites; all suggested protections will be added to our documentation)

EPluribusUnum avatar Sep 13 '22 10:09 EPluribusUnum

Could it be some kind of memory corruption? Can gbak be built with sanitizers to check for subtle stack/heap overflows?

aafemt avatar Oct 11 '22 08:10 aafemt

In 99.9% of cases such corruption causes a segfault, not DB corruption.

AlexPeshkoff avatar Oct 11 '22 09:10 AlexPeshkoff