
Error: database file appears corrupted after restore from backup (FB5, RC2)

Open gsbelarus opened this issue 1 year ago • 36 comments

There are multiple Firebird instances on the server: FB25, FB3, and FB5. Each instance is assigned a dedicated port and an appropriate service name. When using a third-party application to perform a restore, it connects to the server via a TCP connection using the connection string localhost/3056:some_path_to_database.

At the end of the restoration process, the following error message appears:

Unable to complete network request to host "localhost". Error reading data from the connection.

Additionally, there is a record in the firebird.log file:

XNET error: XNET server initialization failed. Probably, another instance of the server is already running.

The resulting database file appears to be corrupted, and subsequent gfix -v -full ... shows:

Number of record level errors : 18722

gsbelarus, Jan 02 '24 10:01

XNET error: XNET server initialization failed. Probably, another instance of the server is already running.

To run several servers with XNET simultaneously, you need to configure IpcName (firebird.conf) for each instance, in the same way as you configure distinct ports for different FB instances working over INET.

sim1984, Jan 02 '24 16:01
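Here is a small sketch of the check implied above: each instance's firebird.conf needs its own IpcName, just as it needs its own port. The helper names (`effective_settings`, `find_collisions`) are hypothetical. The parser is simplified to `Key = Value` lines with `#` comments, and it assumes the stock defaults `IpcName = FIREBIRD` and `RemoteServicePort = 3050` whenever a parameter is left commented out.

```python
from collections import defaultdict

# Assumed stock defaults for parameters left commented out in firebird.conf.
DEFAULTS = {"IpcName": "FIREBIRD", "RemoteServicePort": "3050"}


def effective_settings(conf_text):
    """Return the IpcName and RemoteServicePort an instance would use.

    Simplified parser: only 'Key = Value' lines and '#' comments are handled;
    a later occurrence of a key overrides an earlier one.
    """
    settings = dict(DEFAULTS)
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        key, sep, value = line.partition("=")
        if not sep:
            continue
        for known in DEFAULTS:
            # Assumption: names compared case-insensitively (the thread writes
            # both IPCNAME and IpcName).
            if key.strip().lower() == known.lower():
                settings[known] = value.strip()
    return settings


def find_collisions(confs):
    """Map (parameter, value) to the instances sharing it, if more than one."""
    seen = defaultdict(list)
    for instance, text in confs.items():
        for key, value in effective_settings(text).items():
            seen[(key, value)].append(instance)
    return {pair: names for pair, names in seen.items() if len(names) > 1}
```

With hypothetical configs that give FB25, FB3 and FB5 distinct ports but leave IpcName at its default, `find_collisions` reports all three instances under `("IpcName", "FIREBIRD")`. Once they are renamed to FIREBIRD25, FIREBIRD3 and FIREBIRD5, as done later in the thread, it returns an empty dict.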

I used the connection string localhost/3056:path_to_database. All my life I believed this was a network connection. Am I wrong?

gsbelarus, Jan 02 '24 16:01

Moreover, I checked firebird.conf, and there is no section dedicated to XNET settings. What should I set, and where?

gsbelarus, Jan 02 '24 16:01

It's the IPCNAME parameter that makes it possible to run multiple servers listening on XNET in parallel. But it relates only to the message in the log (XNET error: XNET server initialization failed. Probably, another instance of the server is already running.). What you mention first looks like a server crash at first glance. Does it happen with all versions or only with one in particular?

AlexPeshkoff, Jan 02 '24 16:01

It's the IPCNAME parameter that makes it possible to run multiple servers listening on XNET in parallel. But it relates only to the message in the log (XNET error: XNET server initialization failed. Probably, another instance of the server is already running.). What you mention first looks like a server crash at first glance. Does it happen with all versions or only with one in particular?

I got another backup and am restoring it now. I'll come back with the result in a few hours.

As for XNET, is it enough if I set IPCNAME to FIREBIRD5 for Firebird 5?

gsbelarus, Jan 02 '24 16:01

On 1/2/24 19:57, Andrej Kirejeŭ wrote:

As for XNET, is it enough if I set IPCNAME to FIREBIRD5 for Firebird 5?

That is enough to avoid the message mentioned in the log.

AlexPeshkoff, Jan 02 '24 17:01

On 1/2/24 19:57, Andrej Kirejeŭ wrote: As for XNET, is it enough if I set IPCNAME to FIREBIRD5 for Firebird 5? That is enough to avoid the message mentioned in the log.

As it is a common scenario for multiple versions of Firebird to coexist on the same server, it would be useful to set the default IPCNAME to FIREBIRD5 for Firebird 5, FIREBIRD6 for the next version, etc.

gsbelarus, Jan 02 '24 18:01

Sorry, bug confirmed. Another backup gives the same result:

Unable to complete network request to host "localhost". Error reading data from the connection.

This message appears during the restore process, and then:

Number of record level errors : 18722

during the gfix check.

gsbelarus, Jan 02 '24 22:01

There are multiple Firebird instances on the server: FB25, FB3, and FB5. Each instance is assigned a dedicated port and an appropriate service name. When using a third-party application to perform a restore, it connects to the server via a TCP connection using the connection string localhost/3056:some_path_to_database.

Which server version is supposed to perform the restore? Did that application run the restore using the Services API? If yes, how is it attached to the services manager, and why is "localhost..." used in the target database name? What happens if you perform the restore with gbak (or fbsvcmgr) using the same set of parameters?

hvlad, Jan 02 '24 22:01

Which server version is supposed to perform the restore?

FB5 5.0.0.1304

Did that application run the restore using the Services API?

yes

If yes, how is it attached to the services manager?

Through the IBX component:

https://github.com/GoldenSoftwareLtd/gedemin/blob/master/Gedemin/IBX/IBServices.pas

It has been done this way for the last 25 years ))

and why is "localhost..." used in the target database name?

The presence of localhost in the address string forces IBX to set the TCP protocol flag in its components. We definitely needed that in the days of Yaffil, Firebird 0.9, or even earlier.

BTW, localhost/3056 lets us reach exactly the FB5 server. As FB3 and FB25 are also running on this machine, we can distinguish between them only by port number.

What happens if you perform the restore with gbak (or fbsvcmgr) using the same set of parameters?

Need to check.

gsbelarus, Jan 03 '24 09:01

When running a restore with the Services API, one should put the server name in the connection string, not in the target database name.

For IBX, this means that the server (remote host) name should be specified in the ServerName property, not in the DatabaseName property.

In your case, localhost/3056:service_mgr is the connection string and some_path_to_database (without the server host name!) is the target database name. If one uses plain 'service_mgr' (an empty ServerName property), the connection uses XNET or, if that is not successful, fbclient tries 'localhost' without a port number. As we have no idea which server version was started first and runs the XNET listener, we can assume it was not FB5 (the message in firebird.log confirms it). That is, another server instance runs the restore service and connects to FB5 because of the remote part in the database name argument (DatabaseName property). Of course, this doesn't make the whole restore process as fast as it should be.

In short: you should check and correct the values of the ServerName and DatabaseName properties.

BTW, what is the last message before the error?

hvlad, Jan 03 '24 09:01
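To make the advice above concrete, here is a hypothetical helper (`restore_service_params` is not an IBX or Firebird API). It splits a combined `host/port:path` string, like the one the UI builds, into the services manager connection string and a target database name without the host part. This is a simplified sketch: it treats a leading drive letter such as `g:` as a local path and does not handle IPv6 or URL-style connection strings.

```python
def restore_service_params(combined):
    """Split 'host[/port]:path' into (service manager string, target database).

    Simplified sketch: a leading drive letter (e.g. 'g:\\...') is treated as a
    local path with no server part; IPv6 and URL-style strings are not handled.
    """
    if len(combined) > 1 and combined[0].isalpha() and combined[1] == ":":
        # No server part: plain 'service_mgr' (XNET first, then plain localhost).
        return "service_mgr", combined
    server, sep, path = combined.partition(":")
    if not sep:
        # No colon at all (e.g. an alias), so there is no server part either.
        return "service_mgr", combined
    # Host and port go to the services manager; the database name keeps only the path.
    return f"{server}:service_mgr", path
```

For the thread's example `localhost/3056:g:\Bases\xxx.fdb`, this yields `localhost/3056:service_mgr` as the connection string and `g:\Bases\xxx.fdb` as the target database name, which matches the split described above.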

This is how our UI is organized: we specify the target server and the path to the database in one string, then split it programmatically and set the corresponding IBX component properties. The connection definitely goes to the FB5 server, because only the FB5 process shows activity during the restore.

Before the error message, there are a couple of warnings about unknown UDFs. These are our functions, such as BIN_AND and BIN_OR, which duplicate built-in functions. We don't need them any more; they are just leftovers in the database that we haven't cleaned up yet. Nevertheless, these warnings have been present for years and never resulted in database corruption.

gsbelarus, Jan 03 '24 09:01

Could you be more specific and show the service properties used? There is some confusion that had better be cleared up. Also, did you investigate whether the server crashed (as Alex supposes)?

hvlad, Jan 03 '24 10:01

      IBConfigService.ServerName := edServer.Text;

      if IBConfigService.ServerName > '' then
        IBConfigService.Protocol := TCP
      else
        IBConfigService.Protocol := Local;

In our case, edServer.Text contains localhost/3056.

It would be easier to check whether the server crashed if records of the server starting and shutting down properly were written to firebird.log.

gsbelarus, Jan 03 '24 10:01

On 02/01/2024 19:24, Andrej Kirejeŭ wrote:

On 1/2/24 19:57, Andrej Kirejeŭ wrote: As for XNET, is it enough if I set IPCNAME to FIREBIRD5 for Firebird 5?
That is enough to avoid the message mentioned in the log.

As it is a common scenario for multiple versions of Firebird to coexist on the same server, it would be useful to set the default IPCNAME to FIREBIRD5 for Firebird 5, FIREBIRD6 for the next version, etc.

No, it wouldn't be, because that would mean an older Firebird fbclient.dll wouldn't be able to connect using XNET to a newer Firebird server (or a newer client to an older server).

Mark

Mark Rotteveel

mrotteveel, Jan 03 '24 10:01

No, it wouldn't be, because that would mean an older Firebird fbclient.dll wouldn't be able to connect using XNET to a newer Firebird server (or a newer client to an older server).

Oh, no. I just changed the names to FIREBIRD25, FIREBIRD3, FIREBIRD5... Let's see where it leads now.

But it won't affect network connections, right?

gsbelarus, Jan 03 '24 10:01

As for the failed gfix check, firebird.log now contains thousands of records like:

XXXX Tue Jan 2 13:16:27 2024 Database: XXXX Error: Record 3 is wrong length in table GD_EMPLOYEE (1394)

All errors relate to one table in the database.

gsbelarus, Jan 03 '24 10:01
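Thousands of such lines are hard to read by eye, so a short sketch can tally them per table. The regular expression is modeled only on the single log line quoted above, so other validation messages in firebird.log will not match. `wrong_length_by_table` is a hypothetical name.

```python
import re
from collections import Counter

# Modeled on the one line quoted above:
# "... Error: Record 3 is wrong length in table GD_EMPLOYEE (1394)"
WRONG_LENGTH = re.compile(r"Record (\d+) is wrong length in table (\S+) \((\d+)\)")


def wrong_length_by_table(log_lines):
    """Count 'wrong length' record errors per table name."""
    counts = Counter()
    for line in log_lines:
        match = WRONG_LENGTH.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts
```

Any iterable of lines works. An open file is one option, e.g. `open("firebird.log", encoding="utf-8", errors="replace")`, where the encoding is an assumption about how the log was written.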

Oh, no. I just changed the names to FIREBIRD25, FIREBIRD3, FIREBIRD5... Let's see where it leads now.

But it won't affect network connections, right?

No, only XNET. Keep in mind that for a client to be able to connect to the server with XNET now, the firebird.conf of the client library (i.e. the one located in the same directory as the fbclient.dll used by your application) must also contain the right IpcName setting.

mrotteveel, Jan 03 '24 10:01
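The client-side requirement above can be checked with a similar minimal sketch. It assumes simplified `Key = Value` parsing and the `FIREBIRD` default from the stock firebird.conf. `xnet_can_attach` is a hypothetical helper that only compares the configured names; it does not test an actual connection.

```python
def ipc_name(conf_text, default="FIREBIRD"):
    """Effective IpcName from firebird.conf text; the last assignment wins."""
    name = default
    for raw in conf_text.splitlines():
        key, sep, value = raw.split("#", 1)[0].partition("=")
        # Assumption: the parameter name is matched case-insensitively.
        if sep and key.strip().lower() == "ipcname":
            name = value.strip()
    return name


def xnet_can_attach(client_conf, server_conf):
    """XNET needs the client (next to fbclient.dll) and server to agree on IpcName."""
    return ipc_name(client_conf) == ipc_name(server_conf)
```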

      IBConfigService.ServerName := edServer.Text;

      if IBConfigService.ServerName > '' then
        IBConfigService.Protocol := TCP
      else
        IBConfigService.Protocol := Local;

In our case, edServer.Text contains localhost/3056.

Thanks, but how is IBConfigService related to the restore task? Please don't add more confusion than we already have. Also, we need to know the value of the DatabaseName property (at least).

It would be easier to check whether the server crashed if records of the server starting and shutting down properly were written to firebird.log.

If Firebird runs as a service, look into the Windows Event Log. Also, it is a good idea to always have WER (Windows Error Reporting) turned on to collect crash dumps.

hvlad, Jan 03 '24 10:01
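For the WER suggestion, one common approach is the WER LocalDumps registry key. The sketch below only generates .reg text and changes nothing by itself. It assumes that the FB5 server executable is `firebird.exe`, requests full dumps (`DumpType` 2), and keeps at most 10 files. Without a `DumpFolder` value, Windows writes the dumps to its default location (`%LOCALAPPDATA%\CrashDumps` of the account the process runs under).

```python
# Sketch: .reg text enabling WER LocalDumps for one executable.
# Assumption: the Firebird 5 server process is firebird.exe; adjust if needed.
LOCAL_DUMPS_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps"
)


def local_dumps_reg(exe_name="firebird.exe", dump_count=10, dump_type=2):
    """Return .reg text; DumpType 2 requests full dumps, DumpCount caps kept files."""
    return "\r\n".join([
        "Windows Registry Editor Version 5.00",
        "",
        f"[{LOCAL_DUMPS_KEY}\\{exe_name}]",
        f'"DumpCount"=dword:{dump_count:08x}',
        f'"DumpType"=dword:{dump_type:08x}',
        "",
    ])
```

Save the text to a .reg file and import it with regedit as an administrator. After a crash, the dump files can then be shared as requested later in this thread.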

As for the failed gfix check, firebird.log now contains thousands of records like:

XXXX Tue Jan 2 13:16:27 2024 Database: XXXX Error: Record 3 is wrong length in table GD_EMPLOYEE (1394)

All errors relate to one table in the database.

If the restore did not complete, as in your case, it is not surprising that gfix shows errors. Here you should rather look at the errors in firebird.log and at the log of the restore itself (gbak -v -y <log_file>).

You are restoring the database, but where are you restoring it from? What ODS was the backup made from?

sim1984, Jan 03 '24 10:01

The database was backed up on an FB 3 server and restored on FB 5.

gsbelarus, Jan 03 '24 10:01

Is the restore with gbak OK? Can you share core dumps of the crashed server?

aafemt, Jan 03 '24 10:01

There are no suspicious records in the Windows logs, so I assume the FB5 service didn't crash.

gsbelarus, Jan 03 '24 11:01

Does the error happen when all servers except v5 are stopped?

aafemt, Jan 03 '24 11:01

Does the error happen when all servers except v5 are stopped?

Can't check right now; those servers are in use.

gsbelarus, Jan 03 '24 11:01

Well, I'm trying to run it from the command prompt and the command just hangs: no CPU activity, no records in firebird.log.

C:\Program Files\FB5>gbak -r "K:\Bases\xxx.bk" "g:\Bases\xxx.fdb" -user sysdba -pas xxx -v -y "G:\Bases\Broiler\restore.log" -z

gsbelarus, Jan 03 '24 11:01

At this point you can attach a debugger to it to see what's happening.

aafemt, Jan 03 '24 11:01

Well, I'm trying to run it from the command prompt and the command just hangs: no CPU activity, no records in firebird.log.

C:\Program Files\FB5>gbak -r "K:\Bases\xxx.bk" "g:\Bases\xxx.fdb" -user sysdba -pas xxx -v -y "G:\Bases\Broiler\restore.log" -z

This command line does not use services and thus is not equivalent to the app's case. It should run in embedded mode. Did you check the CPU usage of the whole system or of the Firebird process only? In the second case, no CPU usage is expected.

Could you run a quick check with a metadata-only restore using services instead? Like:

gbak -se localhost/3056:service_mgr -r -m -v "K:\Bases\xxx.bk" "g:\Bases\xxx.fdb"

hvlad, Jan 03 '24 11:01
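The two invocations above differ mainly in whether `-se` is present. As a hedged sketch, `gbak_restore_args` (a hypothetical helper) builds either form from switches quoted in this thread (`-se`, `-r`, `-m`, `-v`, `-y`). Credentials (`-user`, `-pas`) and `-z` are left out for brevity and could be appended the same way. Without `service`, gbak opens the database itself, which is the embedded case described above. With `service`, the restore runs inside the server reached through that services manager.

```python
def gbak_restore_args(backup, database, *, service=None, metadata_only=False,
                      log_file=None, gbak="gbak"):
    """Build a gbak restore command line as a list of arguments.

    service=None -> gbak restores in embedded mode;
    service="localhost/3056:service_mgr" -> the restore runs in that server.
    """
    args = [gbak]
    if service:
        args += ["-se", service]
    args.append("-r")
    if metadata_only:
        args.append("-m")
    args.append("-v")  # verbose output, as in both commands in the thread
    if log_file:
        args += ["-y", log_file]
    args += [backup, database]
    return args
```

With `service="localhost/3056:service_mgr"` and `metadata_only=True`, the list reproduces the suggested command above argument for argument. It can be handed to `subprocess.run(args, check=True)` without any shell quoting.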

Restoring the metadata goes without any problems:

gbak:gbak version WI-V5.0.0.1304 Firebird 5.0 RC 2
gbak:use up to 8 parallel workers
gbak:transportable backup -- data in XDR format
gbak:		backup file is compressed
gbak:backup version is 10
gbak:created database g:\Bases\xxx.fdb, page_size 8192 bytes
gbak:started transaction
...
gbak:adjusting views dbkey length
gbak:updating ownership of packages, procedures and tables
gbak:adding missing privileges
gbak:adjusting system generators
gbak: WARNING:function ABS is not defined
gbak: WARNING:    module name or entrypoint could not be found
gbak: WARNING:    function BIN_AND is not defined
gbak: WARNING:    module name or entrypoint could not be found
gbak:finishing, closing, and going home
gbak:adjusting the ONLINE and FORCED WRITES flags

gsbelarus, Jan 03 '24 11:01
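When comparing runs, it may help to extract the warnings from verbose gbak output automatically. This sketch is based only on the output above. The regex tolerates the varying spacing after `WARNING:`. `finished_normally` treats the `finishing, closing, and going home` line as a completion marker, which is a heuristic and not a documented contract. Both function names are hypothetical.

```python
import re

# Both "gbak: WARNING:function ABS is not defined" and
# "gbak: WARNING:    function BIN_AND is not defined" occur in the output above.
UNDEFINED_FUNCTION = re.compile(r"WARNING:\s*function\s+(\S+)\s+is not defined")


def undefined_functions(gbak_output):
    """Names of functions gbak reported as not defined, in order of appearance."""
    return UNDEFINED_FUNCTION.findall(gbak_output)


def finished_normally(gbak_output):
    """Heuristic: the verbose output of a completed restore contains this line."""
    return "finishing, closing, and going home" in gbak_output
```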

Now I will start restoring using the gbak utility with the -se switch.

gsbelarus, Jan 03 '24 11:01