
Can not start filer in dir /storage/filer : leveldb/storage: corrupted or incomplete meta file.

Open · ijunaid8989 opened this issue · 13 comments

As a brief history of the issue:

  • https://github.com/chrislusf/seaweedfs/issues/576
  • https://github.com/chrislusf/seaweedfs/issues/578

We are using SeaweedFS for image storage, and it uses your repo to handle all of its LevelDB operations:

  • https://github.com/chrislusf/seaweedfs/blob/master/weed/filer/embedded_filer/files_in_leveldb.go#L8

Issues:

Our disk file system was XFS and it got corrupted. We used xfs_repair to repair it and bring it back to life, but during all these operations our .ldb files got corrupted as well, and we could no longer start the filer from its db directory. The error was:

I1101 12:42:19 31159 volume.go:110] loading index file /storage/1082.idx readonly false
F1101 12:42:19 31159 filer_server.go:53] Can not start filer in dir /storage/filer : leveldb/storage: corrupted or incomplete meta file
goroutine 21 [running]:

I used the tool https://github.com/rchunping/leveldb-tools to repair the LevelDB files. It solved the above error, but when I started the filer on port 8888, it gave me this error:

2017/11/02 01:50:41 http: panic serving 39.36.53.157:46124: leveldb: internal key "\x00\x01d,", len=4: invalid length
goroutine 2498 [running]:
net/http.(*conn).serve.func1(0xcd3aaf8080)
	/usr/lib/go/src/net/http/server.go:1389 +0xc1
panic(0xaf99a0, 0xcfefca8210)

I found another solution for this, a Python script:

#!/usr/local/bin/python

import leveldb
leveldb.RepairDB('/data/leveldb-db1')

It solved the above issue as well. NOTE: every operation was run on the original corrupt files, not on files that had already been repaired.

But now, while we can read the data in the .ldb files, it won't let us save anything new into them, and it gives errors such as:

I1102 12:41:51  9063 needle.go:80] Reading Content [ERROR] multipart: Part Read: unexpected EOF
I1102 12:41:51  9063 filer_server_handlers_write.go:106] failing to connect to volume server /everc-fhlcr/snapshots/recordings/2017/11/02/11/40_21_000.jpg Post http://master:8080/278,01661b8efc299f5656: read tcp master:8888->110.36.213.6:51741: i/o timeout
I1102 12:41:51  9063 filer_server_handlers_write.go:106] failing to connect to volume server /everc-fhlcr/snapshots/recordings/2017/11/02/11/40_11_000.jpg Post http://master:8080/250,01661b8ee8c040eccf: unexpected EOF

Question: Is there a repair method in your Go implementation of LevelDB that we should use to repair the database? Or can you point us to anything that explains what the issue is here?

ijunaid8989 avatar Nov 19 '17 04:11 ijunaid8989

Try leveldb.RecoverFile:

package main

import (
	"log"

	"github.com/syndtr/goleveldb/leveldb"
)

func main() {
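	// RecoverFile ignores the existing MANIFEST and rebuilds it from the table files found in the directory.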
	db, err := leveldb.RecoverFile("/data/leveldb-db1", nil)
	if err != nil {
		log.Fatal(err)
	}
	db.Close()
}

syndtr avatar Nov 19 '17 10:11 syndtr

Thanks for your quick answer.

I have tried it, and after a few minutes I got this error:

root@Ubuntu-1404-trusty-64-minimal ~ # ./repair
2017/11/19 13:35:50 leveldb/table: corruption on table-footer (pos=2118053): bad magic number [file=3380087.ldb]

Can you advise what to do next?

ijunaid8989 avatar Nov 19 '17 13:11 ijunaid8989

I deleted that .ldb file and ran the repair command again, and I got this error:

2017/11/19 15:17:48 leveldb/table: corruption on meta-block (pos=2098585): checksum mismatch, want=0xb5d00c82 got=0xcacf6ef [file=3380089.ldb]

ijunaid8989 avatar Nov 19 '17 14:11 ijunaid8989

(Not responsible for the repo, just answering as an onlooker.)

Clearly your database files are corrupt, likely beyond repair. This is the point at which you restore from backup or rebuild based on original data from somewhere else. I don't think further magic repair options are going to help - and even if they did make the system happy, how would you ever know the data is consistent?

calmh avatar Nov 19 '17 14:11 calmh

Actually, I have repaired them using this:

#!/usr/local/bin/python

import leveldb
leveldb.RepairDB('/data/leveldb-db1')

The only issue is that I cannot write anything new into them. I am unable to understand the problem here, so I am asking about possible solutions:

1. Is there any possibility that I can duplicate all the .ldb files into a new folder, free of corruption?
2. Can I fix those errors with the goleveldb repo?
3. Should I use the actual Google LevelDB repo's repair to do so?

ijunaid8989 avatar Nov 19 '17 15:11 ijunaid8989

@ijunaid8989 I fixed a few things, you may try again; sync your repo first, i.e. go get -u github.com/syndtr/goleveldb/leveldb.

TBH, @calmh is probably correct. This will probably restore leveldb to a working state, but it will not recover all your data; it only recovers what can be recovered, missing keys are to be expected, and we don't know how the filer will cope with that.

syndtr avatar Nov 20 '17 00:11 syndtr

@syndtr, thanks for the work you have done.

Can you please make one more small change? Previously, table recovery stopped working as soon as it found corruption; you changed it so that it no longer reports and just continues. Can you make it report in the log and continue as well? That way we could identify at the end which files were corrupt, from logs such as:

2017/11/19 15:17:48 leveldb/table: corruption on meta-block (pos=2098585): checksum mismatch, want=0xb5d00c82 got=0xcacf6ef [file=3380089.ldb]

ijunaid8989 avatar Nov 22 '17 08:11 ijunaid8989

It is already reported in the LOG file. Search for lines starting with table@recovery.

syndtr avatar Nov 22 '17 09:11 syndtr

Okay thanks, I can see that now.

One question: we have recovered it and it is working fine, but after the recovery the LDB files are very slow to read, whereas before the corruption they returned data very quickly. (It mostly contains the directory structure created by the SeaweedFS filer.)

Is there any option to optimize the speed of reading?

ijunaid8989 avatar Nov 24 '17 09:11 ijunaid8989

Also, right now we are repairing the files without any options such as paranoid checks or compaction.

Is there any possibility to add those options to this?

package main

import (
	"log"

	"github.com/syndtr/goleveldb/leveldb"
)

func main() {
	db, err := leveldb.RecoverFile("/storage/filer", nil)
	if err != nil {
		log.Fatal(err)
	}
	db.Close()
}

ijunaid8989 avatar Nov 24 '17 09:11 ijunaid8989

One question: we have recovered it and it is working fine, but after the recovery the LDB files are very slow to read, whereas before the corruption they returned data very quickly. (It mostly contains the directory structure created by the SeaweedFS filer.)

The levels are still rebuilding; performance will be restored once they are rebuilt. CompactRange might speed things up:

package main

import (
	"log"

	"github.com/syndtr/goleveldb/leveldb"
	"github.com/syndtr/goleveldb/leveldb/util"
)

func main() {
	db, err := leveldb.OpenFile("/data/leveldb-db1", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
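	// A zero-value util.Range (nil Start and Limit) compacts the entire key space.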
	if err := db.CompactRange(util.Range{}); err != nil {
		log.Fatal(err)
	}
}

Also, right now we are repairing the files without any options such as paranoid checks or compaction.

StrictAll, I believe, is somewhat similar to paranoid checks; however, the default settings already check journal and block integrity, which I believe is sufficient. See https://godoc.org/github.com/syndtr/goleveldb/leveldb/opt#Strict.
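
For reference, a minimal sketch of passing those strict checks to the recovery call (the path here is only an example) might look like:

package main

import (
	"log"

	"github.com/syndtr/goleveldb/leveldb"
	"github.com/syndtr/goleveldb/leveldb/opt"
)

func main() {
	// Enable all strict integrity checks (roughly the "paranoid checks" idea) during recovery.
	o := &opt.Options{Strict: opt.StrictAll}
	db, err := leveldb.RecoverFile("/data/leveldb-db1", o)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}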

syndtr avatar Nov 24 '17 10:11 syndtr

You need to increase the open files limit, e.g. ulimit -n 10000.
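
Relatedly, goleveldb itself caps how many table files it keeps open at once via opt.Options.OpenFilesCacheCapacity; a minimal sketch of setting it explicitly to stay under the process limit (the value and path here are only illustrative) could be:

package main

import (
	"log"

	"github.com/syndtr/goleveldb/leveldb"
	"github.com/syndtr/goleveldb/leveldb/opt"
)

func main() {
	// Limit the number of .ldb file handles goleveldb keeps open at any one time.
	o := &opt.Options{OpenFilesCacheCapacity: 500}
	db, err := leveldb.OpenFile("/data/leveldb-db1", o)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}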

syndtr avatar Nov 25 '17 04:11 syndtr

Thanks for your help @syndtr.

Compaction is still going on and it has been 3 days already. Before the repair and compaction we had 65262 files in the database directory; there are now 65948, and the compaction log keeps producing entries like these

07:12:25.933800 table@build created L1@3380919 N·95300 S·2MiB "\x00\x00w..jpg,v66946521":"\x00\x00w..jpg,v67042291"
07:18:29.731085 table@build created L1@3380920 N·95300 S·2MiB "\x00\x00w..jpg,v67042340":"\x00\x00w..jpg,v67137253"
07:24:33.121964 table@build created L1@3380921 N·95300 S·2MiB "\x00\x00w..jpg,v67137458":"\x00\x00w..jpg,v67232195"
07:30:37.162468 table@build created L1@3380922 N·95200 S·2MiB "\x00\x00w..jpg,v67232386":"\x00\x00w..jpg,v67327983"
07:36:41.937491 table@build created L1@3380923 N·95300 S·2MiB "\x00\x00w..jpg,v67327622":"\x00\x00x..jpg,v67422824"
07:42:46.359768 table@build created L1@3380924 N·95200 S·2MiB "\x00\x00x..jpg,v67423492":"\x00\x00x..jpg,v67518006"
07:48:51.961748 table@build created L1@3380925 N·95200 S·2MiB "\x00\x00x..jpg,v67518389":"\x00\x00x..jpg,v67613408"
07:54:55.551695 table@build created L1@3380926 N·95300 S·2MiB "\x00\x00x..jpg,v67613451":"\x00\x00x..jpg,v67708891"
08:00:59.880944 table@build created L1@3380927 N·95300 S·2MiB "\x00\x00x..jpg,v67709347":"\x00\x00x..jpg,v67804275"
08:07:03.667411 table@build created L1@3380928 N·95300 S·2MiB "\x00\x00x..jpg,v67803689":"\x00\x00x..jpg,v67899399"

while it keeps creating new .ldb files. Is there any estimate, or a way to check, of when it will be completed?

ijunaid8989 avatar Nov 28 '17 07:11 ijunaid8989
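
For reference, goleveldb exposes its internal compaction statistics through DB.GetProperty; a minimal sketch that polls them (path illustrative, and it must run in the same process that holds the DB open) could look like:

package main

import (
	"log"
	"time"

	"github.com/syndtr/goleveldb/leveldb"
)

func main() {
	db, err := leveldb.OpenFile("/data/leveldb-db1", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Print the per-level table counts and sizes every minute to watch the rebuild progress.
	for {
		stats, err := db.GetProperty("leveldb.stats")
		if err != nil {
			log.Fatal(err)
		}
		log.Println(stats)
		time.Sleep(time.Minute)
	}
}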