lambdaupb comments

Results: 33 comments by lambdaupb

Bytes 16-20 may be Active Weapon

Some files were copied into this repo: https://github.com/oaken-source/pyd2s/tree/master/docs

Check info from Corey H.

@psycho23 I am very interested in the material not being lost. zippyshare or whatever.

[MEMORY] Strings de-duplication in lexicons etc.

Starting the JVM with `-XX:+UseG1GC -XX:+UseStringDeduplication -Xlog:stringdedup*=debug` leads to the following debug output of the G1GC string deduplication:

```
[163.650s][info ][gc,stringdedup] Concurrent String Deduplication (163.650s)
[163.650s][info ][gc,stringdedup] Concurrent String Deduplication...
```

[MEMORY] Strings de-duplication in lexicons etc.

This is the overall figure with just a loaded pipeline: `tokenize,ssplit,pos,lemma,ner,depparse,coref,quote`. I think the issue description covers most of the duplicates. It should be possible to catch most of it doing some...

[MEMORY] Strings de-duplication in lexicons etc.

`Distsim.lexicon` and `NERFeatureFactory.lexicon` seem to be deserialized, so deduplication needs to be injected with a magic deserialize method.

```java
private void readObject(java.io.ObjectInputStream in) throws IOException, ClassNotFoundException {
    in.defaultReadObject();
    StringDedup.INST.dedupInplace(lexicon);...
```
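To make the `readObject` idea concrete, here is a minimal, self-contained sketch of the same pattern. It is not the project's code: `DedupLexicon`, its static pool, and `roundTrip` are hypothetical names made up for illustration, and the pool is only a simple stand-in for whatever `StringDedup` does in the comment above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical lexicon holder; names are illustrative, not taken from the project. */
class DedupLexicon implements Serializable {
    private static final long serialVersionUID = 1L;

    // Process-wide pool of canonical String instances (assumed stand-in for StringDedup).
    private static final Map<String, String> POOL = new HashMap<>();

    private List<String> lexicon;

    DedupLexicon(List<String> words) {
        this.lexicon = new ArrayList<>(words);
    }

    List<String> words() {
        return lexicon;
    }

    /** Returns the first instance ever seen that equals s. */
    static synchronized String canonical(String s) {
        String previous = POOL.putIfAbsent(s, s);
        return previous == null ? s : previous;
    }

    // Called by Java serialization while the object is being read back.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();                      // restore fields as usual
        lexicon.replaceAll(DedupLexicon::canonical); // swap in shared instances
    }

    /** Serializes and deserializes in memory, to exercise the hook. */
    static DedupLexicon roundTrip(DedupLexicon source) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(source);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (DedupLexicon) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Two lexicons read back separately would normally hold separate `String` objects for the same word; with the hook, both end up pointing at the one pooled instance. The pool itself keeps every string alive, so a real implementation would probably want to drop it once loading is finished.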

[MEMORY] Strings de-duplication in lexicons etc.

056c413b2468ce6937dfa6aeb4ae03235e5fa09a comes out at 3243MB, so an 82MB improvement. It's quick to measure though: just download https://visualvm.github.io/ and run the pipeline setup + sleep main.

[MEMORY] Strings de-duplication in lexicons etc.

Yeah, just the master branch in this repo at 056c413. I just put the models jar manually into the project in IntelliJ and start my main class in the IDE.

[MEMORY] Strings de-duplication in lexicons etc.

That should work great as well. But be sure to call `System.gc()` a few times.
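For a measurement without a profiler, something along these lines could work. It is only a sketch: `HeapProbe` and its methods are hypothetical names, and `System.gc()` is merely a request to the JVM, which is why it is issued several times before reading the heap numbers.

```java
/** Hypothetical helper for a rough "used heap after GC" reading. */
class HeapProbe {

    /** Requests a collection `rounds` times, then returns used heap in bytes. */
    static long usedHeapAfterGc(int rounds) {
        Runtime runtime = Runtime.getRuntime();
        for (int i = 0; i < rounds; i++) {
            System.gc(); // a hint only; the JVM may ignore it
            try {
                Thread.sleep(50); // give concurrent collectors a moment to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return runtime.totalMemory() - runtime.freeMemory();
    }

    /** Converts bytes to whole mebibytes for display. */
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }
}
```

Calling `HeapProbe.toMegabytes(HeapProbe.usedHeapAfterGc(5))` right after the pipeline has loaded gives a number that can be compared between two commits, though it will not necessarily match what VisualVM reports.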

[MEMORY] Strings de-duplication in lexicons etc.

Well, it accounts for around 300MB extra total, and the models are loaded sequentially. I think it would still achieve lower peak usage. Deduplicating the strings before serializing is probably...
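As a rough illustration of deduplicating before serializing: within a single `ObjectOutputStream`, an object that is written a second time is stored as a back-reference, so a list whose equal strings are the same instance serializes smaller and deserializes with those instances already shared. The sketch below uses hypothetical names (`PreSerializeDedup`, `dedup`, `serializedSize`) and is not taken from the project.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper showing the effect of deduplicating before writing. */
class PreSerializeDedup {

    /** Returns a copy of the list in which equal strings are the same instance. */
    static List<String> dedup(List<String> input) {
        Map<String, String> pool = new HashMap<>();
        List<String> output = new ArrayList<>(input.size());
        for (String s : input) {
            String previous = pool.putIfAbsent(s, s);
            output.add(previous == null ? s : previous);
        }
        return output;
    }

    /** Serializes the list in memory and returns the number of bytes written. */
    static int serializedSize(List<String> list) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(list));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }
}
```

Comparing `serializedSize(words)` with `serializedSize(dedup(words))` on a list containing many equal but distinct `String` objects shows the difference. Strings shared across separately serialized model files would still be duplicated after loading.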

[MEMORY] Strings de-duplication in lexicons etc.

It is very magical. I don't know how much Java serialization is used in this project in general, but replacing it with more boring solutions might be advisable in the...