redis-rdb-tools

Parser performance not optimal ~1min for a 24MB file

Open sripathikrishnan opened this issue 12 years ago • 15 comments

Profiler output for a 24MB dump.rdb file

     44009161 function calls (44008966 primitive calls) in 205.628 CPU seconds

Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000  205.628  205.628 :1()
        1    0.000    0.000    0.000    0.000 :1(DecimalTuple)
       10    0.000    0.000    0.000    0.000 StringIO.py:119(read)
        1    0.000    0.000    0.000    0.000 StringIO.py:30()
       10    0.000    0.000    0.000    0.000 StringIO.py:38(_complain_ifclosed)
        1    0.000    0.000    0.000    0.000 StringIO.py:42(StringIO)
        2    0.000    0.000    0.000    0.000 StringIO.py:54(init)
        2    0.000    0.000    0.000    0.000 UserDict.py:17(getitem)
        1    0.000    0.000    0.000    0.000 UserDict.py:57(get)
        1    0.000    0.000    0.000    0.000 UserDict.py:69(contains)
        1    0.000    0.000    0.000    0.000 future.py:48()
        1    0.000    0.000    0.000    0.000 future.py:74(_Feature)
        7    0.000    0.000    0.000    0.000 future.py:75(init)
        1    0.046    0.046    0.297    0.297 init.py:1()
        1    0.000    0.000    0.000    0.000 init.py:49(normalize_encoding)
        1    0.000    0.000    0.013    0.013 init.py:71(search_function)
      9/5    0.000    0.000    0.000    0.000 abc.py:137(subclasscheck)
       37    0.000    0.000    0.000    0.000 abc.py:7(abstractmethod)
       19    0.001    0.000    0.003    0.000 abc.py:78(new)
       60    0.000    0.000    0.001    0.000 abc.py:81()
        5    0.000    0.000    0.000    0.000 abc.py:97(register)
        1    0.060    0.060    0.173    0.173 callbacks.py:1()
        1    0.000    0.000    0.000    0.000 callbacks.py:194(DiffCallback)
        1    0.020    0.020    0.033    0.033 callbacks.py:26(_floatconstants)
        1    0.000    0.000    0.000    0.000 callbacks.py:264(MemoryCallback)
        1    0.000    0.000    0.000    0.000 callbacks.py:269(init)
        1    0.000    0.000    0.000    0.000 callbacks.py:279(start_rdb)
        1    0.000    0.000    0.000    0.000 callbacks.py:282(start_database)
        1    0.000    0.000    0.000    0.000 callbacks.py:285(end_database)
        1    0.000    0.000    0.000    0.000 callbacks.py:288(end_rdb)
  1000001   24.914    0.000  110.527    0.000 callbacks.py:291(set)
        2    0.000    0.000    0.000    0.000 callbacks.py:299(start_hash)
        2    0.000    0.000    0.000    0.000 callbacks.py:323(start_set)
        6    0.000    0.000    0.000    0.000 callbacks.py:327(sadd)
        2    0.000    0.000    0.000    0.000 callbacks.py:332(end_set)
  1000003    3.987    0.000    9.397    0.000 callbacks.py:383(end_key)
  1000003    3.457    0.000    5.409    0.000 callbacks.py:388(newline)
  2000004   21.586    0.000   24.588    0.000 callbacks.py:391(sizeof_string)
  1000003    5.104    0.000   11.793    0.000 callbacks.py:409(top_level_object_overhead)
  1000003    1.560    0.000    1.560    0.000 callbacks.py:416(key_expiry_overhead)
  1000003    3.552    0.000    5.197    0.000 callbacks.py:433(hashtable_entry_overhead)
  1000003    9.298    0.000   19.548    0.000 callbacks.py:45(_encode_basestring_ascii)
  2000006    3.137    0.000    3.137    0.000 callbacks.py:454(sizeof_pointer)
  1000003    7.538    0.000   30.889    0.000 callbacks.py:72(_encode)
  1000003    3.676    0.000   34.565    0.000 callbacks.py:91(encode_key)
        1    0.000    0.000    0.000    0.000 callbacks.py:97(JSONCallback)
        1    0.000    0.000    0.000    0.000 codecs.py:77(new)
        1    0.004    0.004    0.004    0.004 collections.py:1()
        1    0.001    0.001    0.001    0.001 collections.py:13(namedtuple)
       34    0.000    0.000    0.000    0.000 collections.py:43()
        4    0.000    0.000    0.000    0.000 collections.py:60()
        4    0.000    0.000    0.000    0.000 collections.py:61()
        6    0.000    0.000    0.000    0.000 copy.py:100(_copy_immutable)
        6    0.000    0.000    0.000    0.000 copy.py:65(copy)
        1    0.002    0.002    0.068    0.068 decimal.py:116()
        1    0.000    0.000    0.000    0.000 decimal.py:158(DecimalException)
        1    0.000    0.000    0.000    0.000 decimal.py:181(Clamped)
        1    0.000    0.000    0.000    0.000 decimal.py:193(InvalidOperation)
        1    0.000    0.000    0.000    0.000 decimal.py:222(ConversionSyntax)
        1    0.000    0.000    0.000    0.000 decimal.py:232(DivisionByZero)
        1    0.000    0.000    0.000    0.000 decimal.py:248(DivisionImpossible)
        1    0.000    0.000    0.000    0.000 decimal.py:259(DivisionUndefined)
        1    0.000    0.000    0.000    0.000 decimal.py:270(Inexact)
        1    0.000    0.000    0.000    0.000 decimal.py:282(InvalidContext)
        1    0.000    0.000    0.000    0.000 decimal.py:296(Rounded)
        1    0.000    0.000    0.000    0.000 decimal.py:308(Subnormal)
        1    0.000    0.000    0.000    0.000 decimal.py:319(Overflow)
        1    0.000    0.000    0.000    0.000 decimal.py:357(Underflow)
        1    0.000    0.000    0.000    0.000 decimal.py:3611(_ContextManager)
        1    0.000    0.000    0.000    0.000 decimal.py:3626(Context)
        3    0.000    0.000    0.000    0.000 decimal.py:3645(init)
        1    0.000    0.000    0.000    0.000 decimal.py:4925(_WorkRep)
        1    0.000    0.000    0.000    0.000 decimal.py:503(Decimal)
        6    0.000    0.000    0.021    0.003 decimal.py:512(new)
        1    0.000    0.000    0.000    0.000 decimal.py:5158(_Log10Memoize)
        1    0.000    0.000    0.000    0.000 decimal.py:5162(init)
        8    0.000    0.000    0.000    0.000 genericpath.py:15(exists)
        3    0.000    0.000    0.000    0.000 gettext.py:130(_expand_lang)
        1    0.000    0.000    0.001    0.001 gettext.py:421(find)
        1    0.000    0.000    0.001    0.001 gettext.py:476(translation)
        1    0.000    0.000    0.001    0.001 gettext.py:542(dgettext)
        1    0.000    0.000    0.001    0.001 gettext.py:580(gettext)
        1    0.000    0.000    0.000    0.000 hex_codec.py:27(hex_decode)
        1    0.000    0.000    0.000    0.000 hex_codec.py:45(Codec)
        1    0.000    0.000    0.000    0.000 hex_codec.py:52(IncrementalEncoder)
        1    0.000    0.000    0.000    0.000 hex_codec.py:57(IncrementalDecoder)
        1    0.000    0.000    0.000    0.000 hex_codec.py:62(StreamWriter)
        1    0.000    0.000    0.000    0.000 hex_codec.py:65(StreamReader)
        1    0.000    0.000    0.000    0.000 hex_codec.py:70(getregentry)
        1    0.000    0.000    0.000    0.000 hex_codec.py:8()
        1    0.000    0.000    0.000    0.000 io.py:1030(BufferedWriter)
        1    0.000    0.000    0.000    0.000 io.py:1117(BufferedRWPair)
        1    0.000    0.000    0.000    0.000 io.py:1183(BufferedRandom)
        1    0.000    0.000    0.000    0.000 io.py:1247(TextIOBase)
        1    0.000    0.000    0.000    0.000 io.py:1295(IncrementalNewlineDecoder)
        1    0.000    0.000    0.000    0.000 io.py:1371(TextIOWrapper)
        1    0.000    0.000    0.000    0.000 io.py:1850(StringIO)
        1    0.000    0.000    0.000    0.000 io.py:267(_DocDescriptor)
        1    0.000    0.000    0.000    0.000 io.py:276(OpenWrapper)
        1    0.000    0.000    0.000    0.000 io.py:290(UnsupportedOperation)
        1    0.000    0.000    0.000    0.000 io.py:294(IOBase)
        1    0.028    0.028    0.036    0.036 io.py:35()
        1    0.000    0.000    0.000    0.000 io.py:566(RawIOBase)
        1    0.000    0.000    0.000    0.000 io.py:621(FileIO)
        1    0.000    0.000    0.000    0.000 io.py:643(BufferedIOBase)
        1    0.000    0.000    0.000    0.000 io.py:715(_BufferedIOMixin)
        1    0.000    0.000    0.000    0.000 io.py:72(BlockingIOError)
        1    0.000    0.000    0.000    0.000 io.py:792(_BytesIO)
        1    0.000    0.000    0.000    0.000 io.py:898(BytesIO)
        1    0.000    0.000    0.000    0.000 io.py:905(BufferedReader)
        1    0.000    0.000    0.000    0.000 keyword.py:11()
        3    0.000    0.000    0.000    0.000 locale.py:316(normalize)
        1    0.000    0.000    0.000    0.000 numbers.py:13(Number)
        1    0.000    0.000    0.000    0.000 numbers.py:169(Real)
        1    0.000    0.000    0.000    0.000 numbers.py:270(Rational)
        1    0.000    0.000    0.000    0.000 numbers.py:295(Integral)
        1    0.000    0.000    0.000    0.000 numbers.py:34(Complex)
        1    0.000    0.000    0.002    0.002 numbers.py:6()
        6    0.000    0.000    0.001    0.000 optparse.py:1007(add_option)
        1    0.000    0.000    0.001    0.001 optparse.py:1185(init)
        1    0.000    0.000    0.000    0.000 optparse.py:1237(_create_option_list)
        1    0.000    0.000    0.001    0.001 optparse.py:1242(_add_help_option)
        1    0.000    0.000    0.001    0.001 optparse.py:1252(_populate_option_list)
        1    0.000    0.000    0.000    0.000 optparse.py:1262(_init_parsing_state)
        1    0.000    0.000    0.000    0.000 optparse.py:1271(set_usage)
        1    0.000    0.000    0.000    0.000 optparse.py:1307(_get_all_options)
        1    0.000    0.000    0.000    0.000 optparse.py:1313(get_default_values)
        1    0.000    0.000    0.000    0.000 optparse.py:1356(_get_args)
        1    0.000    0.000    0.000    0.000 optparse.py:1362(parse_args)
        1    0.000    0.000    0.000    0.000 optparse.py:1401(check_values)
        1    0.000    0.000    0.000    0.000 optparse.py:1414(_process_args)
        2    0.000    0.000    0.000    0.000 optparse.py:1511(_process_short_opts)
        1    0.000    0.000    0.000    0.000 optparse.py:200(init)
        1    0.000    0.000    0.000    0.000 optparse.py:224(set_parser)
        1    0.000    0.000    0.000    0.000 optparse.py:365(init)
        6    0.000    0.000    0.001    0.000 optparse.py:560(init)
        6    0.000    0.000    0.000    0.000 optparse.py:579(_check_opt_strings)
        6    0.000    0.000    0.000    0.000 optparse.py:588(_set_opt_strings)
        6    0.000    0.000    0.000    0.000 optparse.py:609(_set_attrs)
        6    0.000    0.000    0.000    0.000 optparse.py:629(_check_action)
        6    0.000    0.000    0.000    0.000 optparse.py:635(_check_type)
        6    0.000    0.000    0.000    0.000 optparse.py:665(_check_choice)
        6    0.000    0.000    0.000    0.000 optparse.py:678(_check_dest)
        6    0.000    0.000    0.000    0.000 optparse.py:693(_check_const)
        6    0.000    0.000    0.000    0.000 optparse.py:699(_check_nargs)
        6    0.000    0.000    0.000    0.000 optparse.py:708(_check_callback)
        2    0.000    0.000    0.000    0.000 optparse.py:752(takes_value)
        2    0.000    0.000    0.000    0.000 optparse.py:764(check_value)
        2    0.000    0.000    0.000    0.000 optparse.py:771(convert_value)
        2    0.000    0.000    0.000    0.000 optparse.py:778(process)
        2    0.000    0.000    0.000    0.000 optparse.py:790(take_action)
        6    0.000    0.000    0.000    0.000 optparse.py:832(isbasestring)
        1    0.000    0.000    0.000    0.000 optparse.py:837(init)
        1    0.000    0.000    0.000    0.000 optparse.py:932(init)
        1    0.000    0.000    0.000    0.000 optparse.py:943(_create_option_mappings)
        1    0.000    0.000    0.000    0.000 optparse.py:959(set_conflict_handler)
        1    0.000    0.000    0.000    0.000 optparse.py:964(set_description)
        6    0.000    0.000    0.000    0.000 optparse.py:980(_check_conflict)
        1    0.042    0.042    0.078    0.078 parser.py:1()
        1    0.000    0.000    0.000    0.000 parser.py:239(RdbParser)
        1    0.000    0.000    0.000    0.000 parser.py:258(init)
        1   11.802   11.802  205.146  205.146 parser.py:267(parse)
  2000007   12.386    0.000   33.266    0.000 parser.py:312(read_length_with_encoding)
        1    0.000    0.000    0.000    0.000 parser.py:330(read_length)
  2000006   11.267    0.000   51.153    0.000 parser.py:333(read_string)
  1000003    7.220    0.000  141.652    0.000 parser.py:356(read_object)
        1    0.000    0.000    0.000    0.000 parser.py:42(RdbCallback)
        2    0.000    0.000    0.000    0.000 parser.py:466(read_intset)
        1    0.000    0.000    0.000    0.000 parser.py:602(verify_magic_string)
        1    0.000    0.000    0.000    0.000 parser.py:606(verify_version)
        1    0.000    0.000    0.000    0.000 parser.py:611(init_filter)
  2000006    9.695    0.000   14.738    0.000 parser.py:639(matches_filter)
  1000003    1.779    0.000    1.779    0.000 parser.py:649(get_logical_type)
  3000012   15.629    0.000   27.115    0.000 parser.py:710(read_unsigned_char)
        6    0.000    0.000    0.000    0.000 parser.py:716(read_unsigned_short)
        4    0.000    0.000    0.000    0.000 parser.py:722(read_unsigned_int)
        1    0.000    0.000    0.000    0.000 parser.py:739(DebugCallback)
        8    0.000    0.000    0.000    0.000 posixpath.py:59(join)
        1    0.027    0.027  205.613  205.613 rdb:2()
        1    0.001    0.001  205.288  205.288 rdb:8(main)
       10    0.000    0.000    0.055    0.005 re.py:188(compile)
       10    0.000    0.000    0.054    0.005 re.py:229(_compile)
       19    0.000    0.000    0.032    0.002 sre_compile.py:184(_compile_charset)
       19    0.001    0.000    0.031    0.002 sre_compile.py:213(_optimize_charset)
       75    0.000    0.000    0.000    0.000 sre_compile.py:24(_identityfunction)
        8    0.001    0.000    0.001    0.000 sre_compile.py:264(_mk_bitmap)
        2    0.006    0.003    0.009    0.004 sre_compile.py:307(_optimize_unicode)
       22    0.000    0.000    0.000    0.000 sre_compile.py:360(_simple)
       10    0.000    0.000    0.007    0.001 sre_compile.py:367(_compile_info)
    64/10    0.002    0.000    0.029    0.003 sre_compile.py:38(_compile)
       20    0.000    0.000    0.000    0.000 sre_compile.py:480(isstring)
       10    0.000    0.000    0.036    0.004 sre_compile.py:486(_code)
       10    0.000    0.000    0.054    0.005 sre_compile.py:501(compile)
        8    0.000    0.000    0.021    0.003 sre_compile.py:57(fixup)
      104    0.000    0.000    0.003    0.000 sre_parse.py:132(len)
      226    0.001    0.000    0.001    0.000 sre_parse.py:136(getitem)
       22    0.000    0.000    0.000    0.000 sre_parse.py:140(setitem)
       96    0.000    0.000    0.000    0.000 sre_parse.py:144(append)
    81/32    0.001    0.000    0.001    0.000 sre_parse.py:146(getwidth)
       10    0.000    0.000    0.000    0.000 sre_parse.py:184(init)
      945    0.004    0.000    0.006    0.000 sre_parse.py:188(next)
      209    0.001    0.000    0.001    0.000 sre_parse.py:201(match)
      857    0.002    0.000    0.008    0.000 sre_parse.py:207(get)
       69    0.000    0.000    0.000    0.000 sre_parse.py:216(isident)
       13    0.000    0.000    0.000    0.000 sre_parse.py:222(isname)
       12    0.000    0.000    0.000    0.000 sre_parse.py:231(_class_escape)
       14    0.000    0.000    0.000    0.000 sre_parse.py:263(_escape)
    33/10    0.000    0.000    0.018    0.002 sre_parse.py:307(_parse_sub)
    38/10    0.003    0.000    0.017    0.002 sre_parse.py:385(_parse)
       10    0.000    0.000    0.018    0.002 sre_parse.py:669(parse)
       10    0.000    0.000    0.000    0.000 sre_parse.py:73(__init)
       18    0.000    0.000    0.000    0.000 sre_parse.py:78(opengroup)
       18    0.000    0.000    0.000    0.000 sre_parse.py:89(closegroup)
       64    0.000    0.000    0.000    0.000 sre_parse.py:96(init)
        1    0.001    0.001    0.006    0.006 threading.py:1()
        2    0.000    0.000    0.000    0.000 threading.py:176(Condition)
        1    0.000    0.000    0.000    0.000 threading.py:179(_Condition)
        2    0.000    0.000    0.000    0.000 threading.py:181(init)
        1    0.000    0.000    0.000    0.000 threading.py:221(_is_owned)
        1    0.000    0.000    0.000    0.000 threading.py:272(notify)
        1    0.000    0.000    0.000    0.000 threading.py:290(notifyAll)
        1    0.000    0.000    0.000    0.000 threading.py:299(_Semaphore)
        1    0.000    0.000    0.000    0.000 threading.py:347(_BoundedSemaphore)
        1    0.000    0.000    0.000    0.000 threading.py:359(Event)
        1    0.000    0.000    0.000    0.000 threading.py:362(_Event)
        1    0.000    0.000    0.000    0.000 threading.py:366(init)
        1    0.000    0.000    0.000    0.000 threading.py:376(set)
        1    0.000    0.000    0.000    0.000 threading.py:414(Thread)
        1    0.000    0.000    0.000    0.000 threading.py:426(init)
        1    0.000    0.000    0.000    0.000 threading.py:510(_set_ident)
        1    0.000    0.000    0.000    0.000 threading.py:57(_Verbose)
        4    0.000    0.000    0.000    0.000 threading.py:59(init)
        1    0.000    0.000    0.000    0.000 threading.py:64(_note)
        1    0.000    0.000    0.000    0.000 threading.py:713(_Timer)
        1    0.000    0.000    0.000    0.000 threading.py:742(_MainThread)
        1    0.000    0.000    0.000    0.000 threading.py:744(init)
        1    0.000    0.000    0.000    0.000 threading.py:752(_set_daemon)
        1    0.000    0.000    0.000    0.000 threading.py:783(_DummyThread)
        1    0.000    0.000    0.000    0.000 threading.py:99(_RLock)
        1    0.000    0.000    0.000    0.000 traceback.py:1()
        1    0.000    0.000    0.001    0.001 warnings.py:45(filterwarnings)
        1    0.012    0.012    0.013    0.013 {import}
       10    0.000    0.000    0.000    0.000 {_sre.compile}
       35    0.021    0.001    0.021    0.001 {_sre.getlower}
  3000023    4.948    0.000    4.948    0.000 {_struct.unpack}
        3    0.000    0.000    0.000    0.000 {abs}
        4    0.000    0.000    0.000    0.000 {all}
        1    0.000    0.000    0.000    0.000 {binascii.a2b_hex}
       26    0.001    0.000    0.001    0.000 {built-in method new of type object at 0x82e5e0}
        3    0.000    0.000    0.000    0.000 {built-in method acquire}
       10    0.000    0.000    0.000    0.000 {built-in method group}
  1000006    3.285    0.000    3.285    0.000 {built-in method match}
        2    0.000    0.000    0.000    0.000 {built-in method release}
  1000003    2.831    0.000    2.831    0.000 {built-in method search}
  1000003    5.982    0.000    5.982    0.000 {built-in method sub}
       32    0.000    0.000    0.000    0.000 {chr}
        1    0.015    0.015  205.628  205.628 {execfile}
        6    0.000    0.000    0.000    0.000 {filter}
      400    0.001    0.000    0.001    0.000 {getattr}
        8    0.000    0.000    0.000    0.000 {globals}
        4    0.000    0.000    0.000    0.000 {hasattr}
  3000302    5.241    0.000    5.241    0.000 {isinstance}
    20/11    0.000    0.000    0.000    0.000 {issubclass}
2002286/2002258    3.008    0.000    3.008    0.000 {len}
        4    0.000    0.000    0.000    0.000 {locals}
        2    0.000    0.000    0.000    0.000 {map}
        7    0.000    0.000    0.000    0.000 {max}
        4    0.000    0.000    0.000    0.000 {method 'contains' of 'frozenset' objects}
        2    0.000    0.000    0.000    0.000 {method 'enter' of 'file' objects}
        9    0.000    0.000    0.000    0.000 {method 'subclasses' of 'type' objects}
        9    0.000    0.000    0.000    0.000 {method 'subclasshook' of 'object' objects}
       73    0.000    0.000    0.000    0.000 {method 'add' of 'set' objects}
  2000814    3.395    0.000    3.395    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'copy' of 'dict' objects}
        1    0.000    0.000    0.013    0.013 {method 'decode' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
       24    0.000    0.000    0.000    0.000 {method 'endswith' of 'str' objects}
        8    0.000    0.000    0.000    0.000 {method 'extend' of 'list' objects}
        9    0.000    0.000    0.000    0.000 {method 'find' of 'str' objects}
       85    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'insert' of 'list' objects}
       30    0.000    0.000    0.000    0.000 {method 'isalnum' of 'str' objects}
        4    0.000    0.000    0.000    0.000 {method 'isdigit' of 'str' objects}
       33    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
        3    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'keys' of 'dictproxy' objects}
        4    0.000    0.000    0.000    0.000 {method 'lower' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'lstrip' of 'str' objects}
        4    0.000    0.000    0.000    0.000 {method 'pop' of 'list' objects}
  5000020   13.233    0.000   13.233    0.000 {method 'read' of 'file' objects}
       18    0.000    0.000    0.000    0.000 {method 'remove' of 'list' objects}
        8    0.000    0.000    0.000    0.000 {method 'replace' of 'str' objects}
        3    0.000    0.000    0.000    0.000 {method 'reverse' of 'list' objects}
      544    0.002    0.000    0.002    0.000 {method 'setdefault' of 'dict' objects}
        2    0.000    0.000    0.000    0.000 {method 'setter' of 'property' objects}
        6    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
      156    0.000    0.000    0.000    0.000 {method 'startswith' of 'str' objects}
        3    0.000    0.000    0.000    0.000 {method 'strip' of 'str' objects}
        2    0.000    0.000    0.000    0.000 {method 'tolist' of 'array.array' objects}
        2    0.000    0.000    0.000    0.000 {method 'tostring' of 'array.array' objects}
        1    0.000    0.000    0.000    0.000 {method 'translate' of 'str' objects}
        8    0.000    0.000    0.000    0.000 {method 'upper' of 'str' objects}
  2000006    5.662    0.000    5.662    0.000 {method 'write' of 'file' objects}
      135    0.000    0.000    0.000    0.000 {min}
        2    0.139    0.069    0.139    0.069 {open}
       68    0.000    0.000    0.000    0.000 {ord}
        8    0.000    0.000    0.000    0.000 {posix.stat}
        9    0.000    0.000    0.000    0.000 {range}
        1    0.000    0.000    0.000    0.000 {repr}
      109    0.000    0.000    0.000    0.000 {setattr}
        1    0.000    0.000    0.000    0.000 {sys._getframe}
        3    0.000    0.000    0.000    0.000 {thread.allocate_lock}
        2    0.000    0.000    0.000    0.000 {thread.get_ident}
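
The listing above is ordered by standard name, which scatters the expensive entries (for example parser.py:267(parse) at 205.146 cumulative seconds) among hundreds of zero-cost ones. A minimal sketch of collecting a similar profile sorted by cumulative time, using only the standard library on Python 3 (the original run appears to be Python 2). The helper name profile_call and the stand-in workload busy_work are hypothetical and not part of rdbtools:

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, sort_key="cumulative"):
    """Run func(*args) under cProfile; return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # strip_dirs() shortens file paths; sort_stats() puts hot spots first;
    # print_stats(top) limits the report to the first `top` rows.
    stats.strip_dirs().sort_stats(sort_key).print_stats(top)
    return result, buffer.getvalue()


def busy_work(n):
    # Stand-in workload (assumption); swap in the real parse call when
    # profiling an actual dump file.
    return sum(len(str(i)) for i in range(n))


if __name__ == "__main__":
    _, report = profile_call(busy_work, 100000)
    print(report)
```

Passing sort_key="tottime" instead ranks functions by time spent in their own body, which is usually the quicker way to spot per-call overhead in functions invoked millions of times.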

sripathikrishnan, Mar 26 '12 01:03

Did some profiling and wrote a quick patch using https://github.com/teepark/python-lzf, resulting in a 2x performance boost.

yoav-steinberg, Apr 17 '12 14:04

@yoav-steinberg: Thanks for taking the time to investigate this issue!

I don't like adding a dependency to the project. Let me investigate whether there is a way to conditionally include the library, so that people who don't want to install the dependency can still use rdb-tools.

sripathikrishnan, Apr 22 '12 07:04

You could also consider including the C files from liblzf directly in redis-rdb-tools instead of adding a dependency. It is fairly common for liblzf files to be included inside a larger project (Redis itself does this!).

yoav-steinberg, Apr 22 '12 07:04

Simplistic patch here: https://github.com/jvtm/redis-rdb-tools/tree/lzf-speedup

Not creating a pull request just yet, I want to test this with real fresh dumps first. The related unit tests pass, but I didn't check if error reporting on invalid values behaves the same.

jvtm, May 15 '15 18:05

@sripathikrishnan any thoughts on including @jvtm's patch? It doesn't require python-lzf, but uses it if it's there.
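
The "use it if it's there" approach can be sketched as an optional import with a pure-Python fallback. This is an illustration under stated assumptions, not the patch itself: the names HAS_PYTHON_LZF, lzf_decompress_pure and lzf_decompress are chosen for this example, the fallback follows the standard LZF stream layout (literal runs and back-references), and the hasattr check is an extra precaution in case a different module named lzf is installed. The only external call used is lzf.decompress(data, expected_length) from python-lzf.

```python
try:
    import lzf  # C bindings from the python-lzf package, if installed
    # Guard against an unrelated module that happens to be named "lzf".
    HAS_PYTHON_LZF = hasattr(lzf, "decompress")
except ImportError:
    lzf = None
    HAS_PYTHON_LZF = False


def lzf_decompress_pure(compressed, expected_length):
    """Pure-Python LZF decompression (slow fallback path)."""
    out = bytearray()
    i = 0
    n = len(compressed)
    while i < n:
        ctrl = compressed[i]
        i += 1
        if ctrl < 32:
            # Literal run: copy the next ctrl + 1 bytes unchanged.
            run = ctrl + 1
            out += compressed[i:i + run]
            i += run
        else:
            # Back-reference: copy length + 2 bytes from earlier output.
            length = ctrl >> 5
            if length == 7:
                length += compressed[i]
                i += 1
            ref = len(out) - ((ctrl & 0x1F) << 8) - compressed[i] - 1
            i += 1
            if ref < 0:
                raise ValueError("invalid LZF back-reference")
            # Byte-by-byte copy on purpose: source and target may overlap.
            for _ in range(length + 2):
                out.append(out[ref])
                ref += 1
    if len(out) != expected_length:
        raise ValueError("decompressed length mismatch")
    return bytes(out)


def lzf_decompress(compressed, expected_length):
    """Use the C implementation when available, else the fallback."""
    if HAS_PYTHON_LZF:
        return lzf.decompress(compressed, expected_length)
    return lzf_decompress_pure(compressed, expected_length)
```

With this shape the package stays installable without a compiler, and users who want the speedup opt in by installing python-lzf.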

joshowen, Dec 30 '15 23:12

Bump. Any chance of getting this in? Parsing a 10 GB backup is brutal for me.

billcrook, Dec 20 '19 15:12

Wow, didn't even remember this one... Not working anymore on the project where this was required. Here's the exact tiny commit: https://github.com/jvtm/redis-rdb-tools/commit/fdd8134bed488462d0bfae449b542bb3d611f7d3 (failed to include this issue in commit message)

jvtm, Dec 20 '19 15:12

I dug around the code and noticed this commit introduced the lzf optimization.

billcrook, Dec 21 '19 15:12

@billcrook the current code only uses the lzf optimization if you have the native library installed. Maybe you just need to do pip install lzf?

Does the commit @jvtm mentioned change anything? It seems to me that it does the same thing the current version already does. Please let me know if I'm missing anything.

oranagra, Dec 22 '19 06:12

> @billcrook the current code only uses the lzf optimization if you have the native library installed. Maybe you just need to do pip install lzf?

You mean python-lzf, right?

> Does the commit @jvtm mentioned change anything? It seems to me that it does the same thing the current version already does. Please let me know if I'm missing anything.

You are correct. It seems to do the same check for the existence of the lzf module.

billcrook, Dec 23 '19 14:12

@billcrook no, not python-lzf; that's the Python re-implementation. The fast one, which we'd rather use, is just lzf, which provides Python bindings to the C implementation.

oranagra, Dec 23 '19 15:12

Are you sure about that? When I remove python-lzf and install lzf I get:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 13, in parse
  File "/Users/billcrook/dev/audiomack/data-pipeline/venv/lib/python3.6/site-packages/rdbtools/parser.py", line 461, in parse_fd
    self.read_object(f, data_type)
  File "/Users/billcrook/dev/audiomack/data-pipeline/venv/lib/python3.6/site-packages/rdbtools/parser.py", line 569, in read_object
    value = self.read_string(f)
  File "/Users/billcrook/dev/audiomack/data-pipeline/venv/lib/python3.6/site-packages/rdbtools/parser.py", line 508, in read_string
    val = self.lzf_decompress(f.read(clen), l)
  File "/Users/billcrook/dev/audiomack/data-pipeline/venv/lib/python3.6/site-packages/rdbtools/parser.py", line 1021, in lzf_decompress
    return lzf.decompress(compressed, expected_length)
AttributeError: module 'lzf' has no attribute 'decompress'
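
An error like this suggests that some module named lzf was importable but did not expose a decompress function. A small diagnostic sketch for checking where a module is loaded from and whether it offers a given callable, using only the standard library. The function name describe_module and the returned dictionary layout are made up for this example:

```python
import importlib
import importlib.util


def describe_module(name="lzf", attribute="decompress"):
    """Report whether `name` is importable, where it lives, and whether
    it exposes a callable called `attribute`."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"installed": False, "origin": None, "has_attribute": False}
    module = importlib.import_module(name)
    return {
        "installed": True,
        # Path of the file (or "built-in") the module was loaded from.
        "origin": spec.origin,
        "has_attribute": callable(getattr(module, attribute, None)),
    }


if __name__ == "__main__":
    print(describe_module("lzf"))
```

If has_attribute comes back False while installed is True, the origin path shows which installed package is shadowing the expected one.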

billcrook, Dec 23 '19 16:12

For reference: https://github.com/sripathikrishnan/redis-rdb-tools/pull/110#issue-160349318

billcrook, Dec 23 '19 19:12

@billcrook sorry, it seems I was wrong. python-lzf is the native one, and redis-rdb-tools makes no use of the lzf library. Maybe @galcohen-redislabs can provide some insight or spot a regression.

oranagra, Dec 24 '19 09:12

@billcrook Please provide some rough numbers on the RDB file: number of keys, average value size, and the time it takes to run rdb --command json on it.

galcohen-redislabs, Dec 24 '19 09:12