How to: Feed Output of final dense layer to LSTM using the examples=placeholders as time series data?

Open cyberbeat opened this issue 4 years ago • 18 comments

I am trying to do feature matching on HTML document nodes, using features of these nodes (tag name, text, class, length, ...) as the placeholder (a struct). Currently the network consists of some FastText and LSTM layers, the final layer is a dense layer, and all nodes are processed independently of each other. The network lacks a time-series-like connection between the nodes of a document (the classification of a node may depend on the classification of the previous node).

Do I understand correctly that something like this would be possible in DL4J? https://deeplearning4j.konduit.ai/deeplearning4j/reference/recurrent-layers#inference-predictions-one-step-at-a-time So the recurrent state could be cleared before each document?

How would I do that with Dagli?

Btw, I noticed a strong performance (accuracy) drop when using FastText with multi-label classification instead of multiple FastText instances with single boolean classification.

cyberbeat avatar Aug 01 '21 09:08 cyberbeat

It's easy enough to pass the previous node's classification back into the model as an input feature; this will generally be less efficient than an RNN-based decoder-like apparatus (as your link points out) but the difference in inference cost may not matter to you unless you're processing a very large number of documents. If you are, DL4J does support "sequence to sequence" models and it's of course possible to create a new Dagli transformer to wrap such a model with some effort, but Dagli does not itself provide a layer-based API for RNN-based decoders.

Incidentally, dependencies among the labels are not essential, and indeed often counterproductive, in problems where you care about getting as many labels correct as possible rather than as many sequences correct as possible. E.g. a correct sequence matters a lot for, say, translation (producing readable, cohesive text), but not for, say, labeling HTML nodes if you only care about how many HTML nodes you label correctly. Consequently, you may be better off predicting node labels one at a time (while still using features from adjacent nodes as a contextual window) rather than trying to model dependencies among the labels.
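
As a rough, plain-Java illustration of the contextual-window idea (this is only a sketch; it assumes you can already compute a per-node numeric feature array however you like, and it needs java.util.List):

// Sketch only: build a flat feature vector for node i that also contains the features of its
// neighbors within a fixed radius, zero-padding at the document boundaries. The neighbors'
// labels are not involved at all.
static double[] windowFeatures(List<double[]> perNodeFeatures, int i, int radius) {
	int width = perNodeFeatures.get(0).length;
	double[] flat = new double[(2 * radius + 1) * width];
	for (int offset = -radius; offset <= radius; offset++) {
		int j = i + offset;
		if (j >= 0 && j < perNodeFeatures.size()) {
			System.arraycopy(perNodeFeatures.get(j), 0, flat, (offset + radius) * width, width);
		}
	}
	return flat;
}

You'd then feed that flat vector to the current node's classifier as ordinary features.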

With respect to FastText, the default behavior is multinomial prediction (predicted probabilities sum to 1) rather than multilabel (each label predicted with probability in [0, 1]); there is a setting to adjust inference to instead be multilabel, but it is a bit of a hack. If the labels are correlated (as they typically are), I'd expect (with the right choice of hyperparameters) a multinomial/multilabel FastText model to outperform a set of boolean FastText models for a given number of model parameters and examples.

jeffpasternack avatar Aug 01 '21 21:08 jeffpasternack

Thanks for your detailed answer. I have now tried adding some features of the previous node to the current one, but it is difficult to decide which information, and from how many adjacent nodes, I need to take. Also, the nice "struct" placeholder idea gets a bit polluted by this.

About the FastText problem: a) a first try with the same parameters, the only difference being multilabel=true (5 labels) for the second run:

Evaluation for Boolean Single Label

Highest accuracy = 0.9126995046780407
Decision threshold = 0.2756439447402954
True Positives = 1556.0, False Positives = 1229.0
False Negatives = 1309.0, True Negatives = 24978.0
Precision = 0.5587073608617594 @ Recall = 0.5431064572425829, F1 = 0.5507964601769911
Accuracy = 0.9126995046780407 (accuracy for baseline that predicts the mode label = 0.9014515685195377)

Highest F1-score = 0.5824330671676843
Decision threshold = 0.11171691864728928
True Positives = 1860.0, False Positives = 1662.0
False Negatives = 1005.0, True Negatives = 24545.0
Precision = 0.5281090289608177 @ Recall = 0.6492146596858639, F1 = 0.5824330671676843
Accuracy = 0.9082622454595487 (accuracy for baseline that predicts the mode label = 0.9014515685195377)

ROC AUC = 0.8280554713709946, Average Precision = 0.292946242465893

Evaluation for Multilabel (First Label)

Highest accuracy = 0.9014515685195377
Decision threshold = Infinity
True Positives = 0.0, False Positives = 0.0
False Negatives = 2865.0, True Negatives = 26207.0
Precision = 1.0 @ Recall = 0.0, F1 = 0.0
Accuracy = 0.9014515685195377 (accuracy for baseline that predicts the mode label = 0.9014515685195377)

Highest F1-score = NaN
Decision threshold = 0.9999876022338867
True Positives = 0.0, False Positives = 3.0
False Negatives = 2865.0, True Negatives = 26204.0
Precision = 0.0 @ Recall = 0.0, F1 = NaN
Accuracy = 0.901348376444689 (accuracy for baseline that predicts the mode label = 0.9014515685195377)

ROC AUC = 0.8280554713709946, Average Precision = 0.292946242465893

b) I now increased the embedding size by a factor of 5 for the multilabel case:

Highest accuracy = 0.9014515685195377
Decision threshold = Infinity
True Positives = 0.0, False Positives = 0.0
False Negatives = 2865.0, True Negatives = 26207.0
Precision = 1.0 @ Recall = 0.0, F1 = 0.0
Accuracy = 0.9014515685195377 (accuracy for baseline that predicts the mode label = 0.9014515685195377)

Highest F1-score = NaN
Decision threshold = 0.9999999403953552
True Positives = 0.0, False Positives = 1.0
False Negatives = 2865.0, True Negatives = 26206.0
Precision = 0.0 @ Recall = 0.0, F1 = NaN
Accuracy = 0.9014171711612549 (accuracy for baseline that predicts the mode label = 0.9014515685195377)

ROC AUC = 0.823402823713033, Average Precision = 0.290056676641253

Perhaps I made a mistake, so I am posting my code for this test case here. Perhaps the order of the labels somehow gets lost along the way? Label is an enum with 5 labels, and p.asLabels() produces a HashSet<Label>:

FastTextClassification<Label> fastTextClassificationText = new FastTextClassification<Label>()
		.withLabelsInput(p.asLabels())
		.withMultilabel(true)
		.withTokensInput(new PreProcessorText().withInput(p.asText()))
		.withEmbeddingLength(40)
		.withLossType(FastTextLoss.SOFTMAX)
		.withMaxWordNgramLength(3)
		.withEpochCount(50);
DAG1x1<Node, DiscreteDistribution<Label>> dag = DAG.withPlaceholder(p).withOutput(fastTextClassificationText);

MyIterable trainData = new MyIterable(offset, limit);
MyIterable testData = new MyIterable(0, offset);

DAG1x1.Prepared<Node, DiscreteDistribution<Label>> res = dag.prepare(trainData);

DAG1x1.Prepared.Result<DiscreteDistribution<Label>> predictedLabels = res.applyAll(testData);

Label[] labels = Label.values();

for (int i = 0; i < labels.length; i++) {
	final int labelIndex = i;
	System.out.println("----------- Evaluation " + labels[i] + " --------------");
	BinaryEvaluationResult eval = BinaryEvaluation.evaluate(
			Iterables.map(testData, x -> x._labels.contains(labels[labelIndex])),
			predictedLabels.lazyMap(x -> x.get(labels[labelIndex])));
	System.out.println(eval.getSummary() + "\n");
}

cyberbeat avatar Aug 02 '21 14:08 cyberbeat

One way to approach this would be to use collections or lists in the Struct to capture sets or sequences of contextual features; these can then be processed using list-wise transformations.

WRT your FastText code, I don't see any obvious issues, but it's very suspicious that the highest accuracy is achieved by adopting an infinite threshold (and classifying everything as negative). This could be because the learned model is just very, very bad, but I'd suggest poking around a bit: checking values in the debugger, perhaps evaluating on the training data to see if the accuracy is any higher (it should be), etc. I'd also check at least a few of the predicted multilabel distributions to make sure it's not doing something weird like setting all the probabilities the same or setting them to Inf or something else that might indicate a bug either in the data or (less likely) in Dagli's FastText implementation.
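
For example, something roughly like this (assuming the prediction result can be iterated directly; adjust to however you're reading it) would let you eyeball a few distributions:

// Sketch: print the first few predicted label distributions to sanity-check them
// (look for all-equal probabilities, Inf/NaN values, or one label always near 1.0).
int shown = 0;
for (DiscreteDistribution<Label> distribution : predictedLabels) {
	System.out.println(distribution);
	if (++shown >= 5) {
		break;
	}
}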

jeffpasternack avatar Aug 03 '21 07:08 jeffpasternack

How would you use list-wise transformations, for example for some feature like "font-size"? Would you suggest something like max-pooling?

For FastText I have now tried with 3 times more training examples and 1000 epochs. I get better results for multilabel (F1 score = 0.33), but still much worse than the boolean approach. I also experimented with different embedding sizes and with SOFTMAX vs. NEGATIVE SAMPLING (I get better results with lower embedding sizes and NEGATIVE SAMPLING).

Do you have examples where you got better results for multilabel in comparison with boolean?

cyberbeat avatar Aug 09 '21 19:08 cyberbeat

I meant list-wise transformations to process the features in the DAG (outside the neural network). Within the neural network, one option would be to model the context as a sequence--e.g. use bidirectional LSTM layers to encode the sequence of nodes up to the current one in the forward and backward directions. Alternatively, for a fixed-size context window, you don't need a (sequence) encoder, and you can just feed the features of the X nodes before and after your current node as you would any other features. So, for example, one of your features could be "font-size of the node preceding the current node", and another could be "font-size of the node three nodes after the current node", etc.

WRT FastText, it's odd that you're getting better results with negative sampling; lower embedding sizes might make sense if you have limited data. There may be a bug somewhere, but it's not something I can really diagnose on my end. It's also true that FastText is trained as a multinomial model, so multilabel inference is something of a hack; if your boolean models are outperforming multilabel, you could just stick with those--you're not losing anything in principle, assuming you have sufficient training data (the benefit of training a multinomial/multilabel FastText model over binary is primarily that the model can leverage commonalities among the labels, which becomes moot when data is abundant).

Unfortunately, I don't think I've ever directly compared multiple boolean FastText models with a multilabel model; my teams' use cases for FastText have been for multinomial problems.

jeffpasternack avatar Aug 10 '21 05:08 jeffpasternack

WRT FastText, I have now gained some training speedup by pre-training and serializing a list of DAGs containing only the FastText layers, so I can experiment with the later part of the chain more easily.
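
Roughly like this (simplified to a single DAG and plain Java serialization; fastTextDag stands for the FastText-only DAG, the file name is just an example, and the calls need the usual java.io imports and checked-exception handling):

// Prepare the FastText-only DAG once and write it to disk...
DAG1x1.Prepared<Node, DiscreteDistribution<Label>> prepared = fastTextDag.prepare(trainData);
try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("fasttext-dag.bin"))) {
	out.writeObject(prepared);
}

// ...and reload it in later experiments instead of retraining:
try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("fasttext-dag.bin"))) {
	DAG1x1.Prepared<Node, DiscreteDistribution<Label>> reloaded =
			(DAG1x1.Prepared<Node, DiscreteDistribution<Label>>) in.readObject();
	// use 'reloaded' as the first stage of the experiment chain
}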

WRT node context: after thinking hard about how to do that with Dagli, I came up with something like this:

@Struct
class Node {
  Node prev, next;
  Node[] context(){
    // ..collect static list of 5 previous and 5 next nodes
  }
..
}

And then using it like this:

DenseLayerInput<NNSplitVectorSequenceLayer>.Aggregated aggregated = new NNSplitVectorSequenceLayer().withInput().concatenating();

for (int i = 0; i < 10; i++) {
	ArrayElement<Node> node = new ArrayElement<Node>().withIndex(i).withInput(p.asContext());
	aggregated = aggregated.fromNumbers(new Node.TextLength().withInput(node));
        ..
}

NNSplitVectorSequenceLayer split = aggregated.done().withSplitSize(10);
NNLSTMLayer lstmLayer = new NNLSTMLayer()
	.withInput(split)
	.withBidirectionality(Bidirectionality.FORWARD_ONLY);

NNMaxPoolingLayer poolingLayer = new NNMaxPoolingLayer().withInput(lstmLayer);
..

But if I use NNLastVectorInSequenceLayer, an exception is thrown saying that DL4J does not support multiple inputs for that layer. So I tried NNMaxPoolingLayer, but here another exception is thrown:

java.io.FileNotFoundException: nd4j-op-def.pbtxt cannot be opened because it does not exist
	at org.nd4j.common.io.ClassPathResource.getInputStream(ClassPathResource.java:248)
	at org.nd4j.common.io.ClassPathResource.getInputStream(ClassPathResource.java:235)
	at org.nd4j.ir.OpDescriptorHolder.nd4jOpList(OpDescriptorHolder.java:86)
	at org.nd4j.ir.OpDescriptorHolder.<clinit>(OpDescriptorHolder.java:47)
	at org.nd4j.linalg.api.ops.DynamicCustomOp.propertiesForFunction(DynamicCustomOp.java:1069)
	at org.nd4j.autodiff.samediff.serde.FlatBuffersMapper.asFlatNode(FlatBuffersMapper.java:863)
	at org.nd4j.autodiff.samediff.serde.FlatBuffersMapper.cloneViaSerialize(FlatBuffersMapper.java:988)
	at org.nd4j.autodiff.samediff.SameDiff.invokeGraphOn(SameDiff.java:526)
	at org.nd4j.autodiff.samediff.SameDiff$1.define(SameDiff.java:4200)
	at org.nd4j.autodiff.samediff.SameDiff.defineFunction(SameDiff.java:3993)
	at org.nd4j.autodiff.samediff.SameDiff.defineFunction(SameDiff.java:3978)
	at org.nd4j.autodiff.samediff.SameDiff.createGradFunction(SameDiff.java:4189)
	at org.deeplearning4j.nn.layers.samediff.SameDiffGraphVertex.doBackward(SameDiffGraphVertex.java:166)
	at org.deeplearning4j.nn.graph.ComputationGraph.calcBackpropGradients(ComputationGraph.java:2772)
	at org.deeplearning4j.nn.graph.ComputationGraph.computeGradientAndScore(ComputationGraph.java:1381)
	at org.deeplearning4j.nn.graph.ComputationGraph.computeGradientAndScore(ComputationGraph.java:1341)
	at org.deeplearning4j.optimize.solvers.BaseOptimizer.gradientAndScore(BaseOptimizer.java:174)
	at org.deeplearning4j.optimize.solvers.StochasticGradientDescent.optimize(StochasticGradientDescent.java:61)
	at org.deeplearning4j.optimize.Solver.optimize(Solver.java:52)
	at org.deeplearning4j.nn.graph.ComputationGraph.fitHelper(ComputationGraph.java:1165)
	at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:1115)
	at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:1082)
	at com.linkedin.dagli.dl4j.NeuralNetwork$Preparer.finish(NeuralNetwork.java:254)
	at com.linkedin.dagli.dl4j.NeuralNetwork$Preparer.finish(NeuralNetwork.java:50)
	at com.linkedin.dagli.nn.AbstractNeuralNetwork$Preparer.finishUnsafe(AbstractNeuralNetwork.java:1232)
	at com.linkedin.dagli.dag.MultithreadedDAGExecutor$PreparationFinishTask.onRun(MultithreadedDAGExecutor.java:792)
	at com.linkedin.dagli.dag.MultithreadedDAGExecutor$Task.run(MultithreadedDAGExecutor.java:368)
	at com.linkedin.dagli.dag.MultithreadedDAGExecutor$Scheduler.lambda$schedule$4(MultithreadedDAGExecutor.java:329)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Exception in thread "main" java.lang.RuntimeException: MultithreadedDAGExecutor terminated execution because it encountered an unexpected exception in a worker thread: java.lang.RuntimeException: Error during neural network backpropagation calculation
	at com.linkedin.dagli.dag.MultithreadedDAGExecutor.executeUnsafe(MultithreadedDAGExecutor.java:1546)
	at com.linkedin.dagli.dag.MultithreadedDAGExecutor.prepareUnsafeImpl(MultithreadedDAGExecutor.java:1497)
	at com.linkedin.dagli.dag.LocalDAGExecutor.prepareUnsafeImpl(LocalDAGExecutor.java:71)
	at com.linkedin.dagli.dag.AbstractDAGExecutor.prepareUnsafe(AbstractDAGExecutor.java:99)
	at com.linkedin.dagli.dag.DAG1x1.prepare(DAG1x1.java:253)
..
Caused by: java.lang.RuntimeException: Error during neural network backpropagation calculation
	at org.deeplearning4j.nn.graph.ComputationGraph.calcBackpropGradients(ComputationGraph.java:2860)
	at org.deeplearning4j.nn.graph.ComputationGraph.computeGradientAndScore(ComputationGraph.java:1381)
	at org.deeplearning4j.nn.graph.ComputationGraph.computeGradientAndScore(ComputationGraph.java:1341)
	at org.deeplearning4j.optimize.solvers.BaseOptimizer.gradientAndScore(BaseOptimizer.java:174)
	at org.deeplearning4j.optimize.solvers.StochasticGradientDescent.optimize(StochasticGradientDescent.java:61)
	at org.deeplearning4j.optimize.Solver.optimize(Solver.java:52)
	at org.deeplearning4j.nn.graph.ComputationGraph.fitHelper(ComputationGraph.java:1165)
	at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:1115)
	at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:1082)
	at com.linkedin.dagli.dl4j.NeuralNetwork$Preparer.finish(NeuralNetwork.java:254)
	at com.linkedin.dagli.dl4j.NeuralNetwork$Preparer.finish(NeuralNetwork.java:50)
	at com.linkedin.dagli.nn.AbstractNeuralNetwork$Preparer.finishUnsafe(AbstractNeuralNetwork.java:1232)
	at com.linkedin.dagli.dag.MultithreadedDAGExecutor$PreparationFinishTask.onRun(MultithreadedDAGExecutor.java:792)
	at com.linkedin.dagli.dag.MultithreadedDAGExecutor$Task.run(MultithreadedDAGExecutor.java:368)
	at com.linkedin.dagli.dag.MultithreadedDAGExecutor$Scheduler.lambda$schedule$4(MultithreadedDAGExecutor.java:329)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ExceptionInInitializerError
	at org.nd4j.linalg.api.ops.DynamicCustomOp.propertiesForFunction(DynamicCustomOp.java:1069)
	at org.nd4j.autodiff.samediff.serde.FlatBuffersMapper.asFlatNode(FlatBuffersMapper.java:863)
	at org.nd4j.autodiff.samediff.serde.FlatBuffersMapper.cloneViaSerialize(FlatBuffersMapper.java:988)
	at org.nd4j.autodiff.samediff.SameDiff.invokeGraphOn(SameDiff.java:526)
	at org.nd4j.autodiff.samediff.SameDiff$1.define(SameDiff.java:4200)
	at org.nd4j.autodiff.samediff.SameDiff.defineFunction(SameDiff.java:3993)
	at org.nd4j.autodiff.samediff.SameDiff.defineFunction(SameDiff.java:3978)
	at org.nd4j.autodiff.samediff.SameDiff.createGradFunction(SameDiff.java:4189)
	at org.deeplearning4j.nn.layers.samediff.SameDiffGraphVertex.doBackward(SameDiffGraphVertex.java:166)
	at org.deeplearning4j.nn.graph.ComputationGraph.calcBackpropGradients(ComputationGraph.java:2772)
	... 17 more
Caused by: java.lang.NullPointerException
	at org.nd4j.ir.OpDescriptorHolder.<clinit>(OpDescriptorHolder.java:53)
	... 27 more

cyberbeat avatar Aug 24 '21 07:08 cyberbeat

I succeeded in running the training without exception by adding the required file "nd4j-op-def.pbtxt" to the classpath (is this a DL4J bug?).

But when trying to deserialize and do evaluation:

Exception in thread "main" java.lang.RuntimeException: Error deserializing JSON ComputationGraphConfiguration. Saved model JSON is not a valid ComputationGraphConfiguration
	at org.deeplearning4j.util.ModelSerializer.restoreComputationGraphHelper(ModelSerializer.java:560)
	at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:462)
	at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:592)
	at com.linkedin.dagli.dl4j.SerializableComputationGraph.readObject(SerializableComputationGraph.java:44)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at java.base/java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1175)
	at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2325)
	at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2196)
	at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1679)
	at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2464)
	at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2358)
	at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2196)
	at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1679)
	at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:493)
	at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:451)
	at java.base/java.util.ArrayList.readObject(ArrayList.java:929)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at java.base/java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1175)
	at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2325)
	at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2196)
	at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1679)
	at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:493)
	at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:451)
	at java.base/java.util.HashMap.readObject(HashMap.java:1460)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at java.base/java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1175)
	at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2325)
	at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2196)
	at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1679)
	at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2464)
	at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2358)
	at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2196)
	at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1679)
	at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2464)
	at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2358)
	at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2196)
	at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1679)
	at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:493)
	at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:451)
..
Caused by: java.lang.RuntimeException: org.nd4j.shade.jackson.databind.exc.InvalidDefinitionException: Cannot construct instance of `com.linkedin.dagli.dl4j.ReshapeMasklessVertex` (no Creators, like default constructor, exist): cannot deserialize from Object value (no delegate- or property-based Creator)
 at [Source: (String)"{
  "backpropType" : "Standard",
  "cacheMode" : "NONE",
  "dataType" : "FLOAT",
  "defaultConfiguration" : {
    "cacheMode" : "NONE",
    "dataType" : "FLOAT",
    "epochCount" : 0,
    "iterationCount" : 0,
    "layer" : null,
    "maxNumLineSearchIterations" : 5,
    "miniBatch" : true,
    "minimize" : true,
    "optimizationAlgo" : "STOCHASTIC_GRADIENT_DESCENT",
    "seed" : 1,
    "stepFunction" : null,
    "variables" : [ "NNLSTMLayer-0_W", "NNLSTMLayer-0_RW", "NNLSTMLayer-0_b", "NNDense"[truncated 13430 chars]; line: 49, column: 7] (through reference chain: org.deeplearning4j.nn.conf.ComputationGraphConfiguration["vertices"]->java.util.LinkedHashMap["NNSplitVectorSequenceLayer-0"])
	at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:196)
	at org.deeplearning4j.util.ModelSerializer.restoreComputationGraphHelper(ModelSerializer.java:547)
	... 50 more
Caused by: org.nd4j.shade.jackson.databind.exc.InvalidDefinitionException: Cannot construct instance of `com.linkedin.dagli.dl4j.ReshapeMasklessVertex` (no Creators, like default constructor, exist): cannot deserialize from Object value (no delegate- or property-based Creator)
 at [Source: (String)"{
  "backpropType" : "Standard",
  "cacheMode" : "NONE",
  "dataType" : "FLOAT",
  "defaultConfiguration" : {
    "cacheMode" : "NONE",
    "dataType" : "FLOAT",
    "epochCount" : 0,
    "iterationCount" : 0,
    "layer" : null,
    "maxNumLineSearchIterations" : 5,
    "miniBatch" : true,
    "minimize" : true,
    "optimizationAlgo" : "STOCHASTIC_GRADIENT_DESCENT",
    "seed" : 1,
    "stepFunction" : null,
    "variables" : [ "NNLSTMLayer-0_W", "NNLSTMLayer-0_RW", "NNLSTMLayer-0_b", "NNDense"[truncated 13430 chars]; line: 49, column: 7] (through reference chain: org.deeplearning4j.nn.conf.ComputationGraphConfiguration["vertices"]->java.util.LinkedHashMap["NNSplitVectorSequenceLayer-0"])
	at org.nd4j.shade.jackson.databind.exc.InvalidDefinitionException.from(InvalidDefinitionException.java:67)
	at org.nd4j.shade.jackson.databind.DeserializationContext.reportBadDefinition(DeserializationContext.java:1764)
	at org.nd4j.shade.jackson.databind.DatabindContext.reportBadDefinition(DatabindContext.java:400)
	at org.nd4j.shade.jackson.databind.DeserializationContext.handleMissingInstantiator(DeserializationContext.java:1209)
	at org.nd4j.shade.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1415)
	at org.nd4j.shade.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:362)
	at org.nd4j.shade.jackson.databind.deser.BeanDeserializer._deserializeOther(BeanDeserializer.java:230)
	at org.nd4j.shade.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:197)
	at org.nd4j.shade.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:137)
	at org.nd4j.shade.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:107)
	at org.nd4j.shade.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:263)
	at org.nd4j.shade.jackson.databind.deser.std.MapDeserializer._readAndBindStringKeyMap(MapDeserializer.java:611)
	at org.nd4j.shade.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:437)
	at org.nd4j.shade.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:32)
	at org.nd4j.shade.jackson.databind.deser.impl.MethodProperty.deserializeAndSet(MethodProperty.java:129)
	at org.nd4j.shade.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:324)
	at org.nd4j.shade.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:187)
	at org.deeplearning4j.nn.conf.serde.ComputationGraphConfigurationDeserializer.deserialize(ComputationGraphConfigurationDeserializer.java:61)
	at org.deeplearning4j.nn.conf.serde.ComputationGraphConfigurationDeserializer.deserialize(ComputationGraphConfigurationDeserializer.java:51)
	at org.nd4j.shade.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:322)
	at org.nd4j.shade.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4593)
	at org.nd4j.shade.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3548)
	at org.nd4j.shade.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3516)
	at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.fromJson(ComputationGraphConfiguration.java:167)
	... 51 more

cyberbeat avatar Aug 24 '21 08:08 cyberbeat

  1. WRT NNLastVectorInSequence, this may be a masking issue due to NNSplitVectorSequenceLayer not preserving the masking (the Javadoc for NNSplitVectorSequenceLayer mentions an exception as a possible result), although it's not entirely clear to me from your code excerpt whether this should be an issue (it looks like the sequences are fixed-size). However, you don't need NNSplitVectorSequenceLayer--you can provide your input vectors to NNLSTMLayer directly with, e.g. the withInputFromVectorSequence(...) method.
  2. Stepping back for a second, with a small, fixed-size window of radius 5 as you have here, I'd suggest considering passing the context into the network as a (flat) vector of features rather than a sequence of vectors (i.e. feeding them into a perceptron layer). Also, doing a pass across the entire window with the LSTM is not ideal because it gives a chance for the recurrent NN to "forget" the middle of the range closest to your node of interest; if you really wanted to go the LSTM route you'd actually want to have two LSTM layers separately covering the "before" nodes and the "after" nodes.
  3. The java.io.FileNotFoundException may be a DL4J bug, although it may ultimately be caused by the issue deserializing Dagli DL4J vertices (per the next item).
  4. The deserialization error is a bug in Dagli's DL4J vertex implementation. It looks like we failed to include default constructors to permit Java deserialization; I'm not sure why our unit tests didn't catch this (we'll have to follow up on that separately) but it's an easy fix. We'll try to push an update by the end of the day.

jeffpasternack avatar Aug 24 '21 20:08 jeffpasternack

  1. But NNLSTMLayer does not have these nice Aggregated methods (withInput().concatenating()...) like fromCategoricalValues, fromNumbers, ... (this is one of the strong points of Dagli). How could I feed such an aggregated input to its withInputFromVectorSequence method?
  2. Thanks, I'll try to do some more experiments, perhaps also with attention, but I'll start simple.
  3. and
  4. Thanks for the update. "nd4j-op-def.pbtxt" is still an issue. Now, when trying to deserialize and run inference, I get:
Exception in thread "main" java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: java.lang.IllegalStateException: Requested output variable null does not exist in SameDiff instance
	at com.linkedin.dagli.dag.FastPreparedDAGExecutor.executeUnsafeImpl(FastPreparedDAGExecutor.java:164)
	at com.linkedin.dagli.dag.FastPreparedDAGExecutor.applyUnsafeImpl(FastPreparedDAGExecutor.java:116)
	at com.linkedin.dagli.dag.LocalDAGExecutor.applyUnsafeImpl(LocalDAGExecutor.java:77)
	at com.linkedin.dagli.dag.AbstractDAGExecutor.applyUnsafe(AbstractDAGExecutor.java:77)
	at com.linkedin.dagli.dag.DAG1x1$Prepared.applyAll(DAG1x1.java:438)
..
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: java.lang.IllegalStateException: Requested output variable null does not exist in SameDiff instance
	at java.base/java.util.concurrent.ForkJoinTask.get(ForkJoinTask.java:1006)
	at com.linkedin.dagli.dag.FastPreparedDAGExecutor.executeUnsafeImpl(FastPreparedDAGExecutor.java:157)
	... 7 more
Caused by: java.lang.IllegalStateException: java.lang.IllegalStateException: Requested output variable null does not exist in SameDiff instance
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
	at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:600)
	... 9 more
Caused by: java.lang.IllegalStateException: Requested output variable null does not exist in SameDiff instance
	at org.nd4j.common.base.Preconditions.throwStateEx(Preconditions.java:638)
	at org.nd4j.common.base.Preconditions.checkState(Preconditions.java:301)
	at org.nd4j.autodiff.samediff.internal.AbstractSession.output(AbstractSession.java:136)
	at org.nd4j.autodiff.samediff.SameDiff.directExecHelper(SameDiff.java:2583)
	at org.nd4j.autodiff.samediff.SameDiff.batchOutputHelper(SameDiff.java:2551)
	at org.nd4j.autodiff.samediff.SameDiff.output(SameDiff.java:2526)
	at org.nd4j.autodiff.samediff.config.BatchOutputConfig.output(BatchOutputConfig.java:142)
	at org.nd4j.autodiff.samediff.config.BatchOutputConfig.exec(BatchOutputConfig.java:135)
	at org.nd4j.autodiff.samediff.config.BatchOutputConfig.outputSingle(BatchOutputConfig.java:161)
	at org.nd4j.autodiff.samediff.SameDiff.outputSingle(SameDiff.java:2490)
	at org.deeplearning4j.nn.layers.samediff.SameDiffGraphVertex.doForward(SameDiffGraphVertex.java:136)
	at org.deeplearning4j.nn.graph.ComputationGraph.outputOfLayersDetached(ComputationGraph.java:2438)
	at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1870)
	at com.linkedin.dagli.dl4j.AbstractCustomNeuralNetwork$AbstractPrepared.apply(AbstractCustomNeuralNetwork.java:922)
	at com.linkedin.dagli.dl4j.AbstractCustomNeuralNetwork$AbstractPrepared.apply(AbstractCustomNeuralNetwork.java:911)
	at com.linkedin.dagli.dl4j.AbstractCustomNeuralNetwork$AbstractPrepared.applyAll(AbstractCustomNeuralNetwork.java:960)
	at com.linkedin.dagli.dl4j.AbstractCustomNeuralNetwork$AbstractPrepared.applyAll(AbstractCustomNeuralNetwork.java:798)
	at com.linkedin.dagli.transformer.AbstractPreparedStatefulTransformerDynamic$InternalAPI.applyAllUnsafe(AbstractPreparedStatefulTransformerDynamic.java:90)
	at com.linkedin.dagli.dag.FastPreparedDAGExecutor.apply(FastPreparedDAGExecutor.java:219)
	at com.linkedin.dagli.dag.FastPreparedDAGExecutor.executeUnsafeImplThread(FastPreparedDAGExecutor.java:193)
	at com.linkedin.dagli.dag.FastPreparedDAGExecutor.lambda$executeUnsafeImpl$6(FastPreparedDAGExecutor.java:144)
	at java.base/java.util.concurrent.ForkJoinTask$AdaptedCallable.exec(ForkJoinTask.java:1448)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)

cyberbeat avatar Aug 26 '21 20:08 cyberbeat

  1. withInputFromVectorSequence(...) can be passed anything that produces something of type Iterable<? extends Vector>, e.g. a list of Vectors (the Dagli math kind of vector, not the old-school Java collection type of vector). You can get something to produce this using existing transformers to create vectors with your per-node features and then use, e.g., a VariadicList transformer to combine the vectors into a list. Or you could create a custom transformer that constructs the list of vectors through whatever arbitrary featurization mechanism you'd want to code (see the sketch after this list).
  2. FWIW, if you're feeding a context window directly to a perceptron layer as input features you won't be pooling and thus you probably won't benefit from attention.
  3. Unfortunately this is one of those exceptions that requires a debugger to understand. It could be a DL4J bug, or possibly a Dagli bug. If you can provide a working code example that we can run to replicate the exception we can investigate further, but otherwise you might have to remove whatever part of your NN seems to be offending DL4J.
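
To make item 1 concrete, here's a minimal sketch (the Constant inputs are just stand-ins for your real per-node feature producers):

// Build one Dagli vector per context position (in practice from real features, not constants)...
DenseVectorFromNumbers positionVector = new DenseVectorFromNumbers()
		.withInputs(new Constant<Double>(0.0), new Constant<Double>(1.0));
// ...combine the per-position vectors into a single vector-sequence input...
VariadicList<DenseVector> contextSequence =
		new VariadicList<DenseVector>(Arrays.asList(positionVector, positionVector, positionVector));
// ...and feed that sequence to the LSTM directly, with no NNSplitVectorSequenceLayer involved.
NNLSTMLayer lstmLayer = new NNLSTMLayer()
		.withInputFromVectorSequence(contextSequence)
		.withBidirectionality(Bidirectionality.FORWARD_ONLY);

As noted, the list of vectors could equally be produced by your own custom transformer rather than VariadicList.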

jeffpasternack avatar Aug 27 '21 07:08 jeffpasternack

I have a simple test here, did I do something wrong?

@Struct("Test")
abstract class TestStruct {
	double _number = 0;
	boolean _label;
}

List<Test> examples = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9).stream().map(x -> new Test().withNumber(x)).collect(Collectors.toList());

Test.Placeholder p = new Test.Placeholder();

DenseLayerInput<NNSplitVectorSequenceLayer>.Aggregated aggregated = new NNSplitVectorSequenceLayer().withInput().concatenating();

aggregated = aggregated.fromNumbers(p.asNumber());
aggregated = aggregated.fromNumbers(p.asNumber());
aggregated = aggregated.fromNumbers(p.asNumber());
aggregated = aggregated.fromNumbers(p.asNumber());

NNSplitVectorSequenceLayer split = aggregated.done().withSplitSize(2);

NNLSTMLayer lstmLayer = new NNLSTMLayer()
		.withInput(split)
		.withBidirectionality(Bidirectionality.FORWARD_ONLY);

NNLastVectorInSequenceLayer poolingLayer = new NNLastVectorInSequenceLayer().withInput(lstmLayer);

NNDenseLayer denseLayers = new NNDenseLayer()
		.withInput()
		.concatenating()
		.fromLayers(poolingLayer)
		.done();

NNClassification<Boolean> labelClassification = new NNClassification<Boolean>()
		.withFeaturesInput(denseLayers)
		.withBinaryLabelInput(p.asLabel());

NeuralNetwork neuralNetwork = new NeuralNetwork()
		.withLossLayers(labelClassification)
		.withMaxEpochs(5);

DAG1x1<Test, DiscreteDistribution<Boolean>> dag = DAG.withPlaceholder(p).withOutput(neuralNetwork.asLayerOutput(labelClassification));
DAG1x1.Prepared<Test, DiscreteDistribution<Boolean>> prepared = dag.prepare(examples);

DAG1x1.Prepared.Result<DiscreteDistribution<Boolean>> result = prepared.applyAll(examples);

with Exception:

Exception in thread "main" java.lang.IllegalStateException: Topographical sort found a different number of layers than expected; either the neural network is malformed or there is a bug in Dagli
	at com.linkedin.dagli.nn.AbstractNeuralNetwork.topographicSort(AbstractNeuralNetwork.java:914)
	at com.linkedin.dagli.nn.AbstractNeuralNetwork.setLossLayers(AbstractNeuralNetwork.java:737)
	at com.linkedin.dagli.nn.AbstractNeuralNetwork.lambda$withLossLayers$22(AbstractNeuralNetwork.java:825)
	at com.linkedin.dagli.util.cloneable.AbstractCloneable.clone(AbstractCloneable.java:49)
	at com.linkedin.dagli.nn.AbstractNeuralNetwork.withLossLayers(AbstractNeuralNetwork.java:825)
...

I find this class "AbstractFeatureVectorInput" so useful; it would be great if all layers could use it in some way, so I would not need to search for (or invent) suitable transformers.

cyberbeat avatar Aug 30 '21 18:08 cyberbeat

Another test I don't understand:

List<Test> examples = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9).stream().map(x -> new Test().withNumber(x)).collect(Collectors.toList());

Test.Placeholder p = new Test.Placeholder();

List<DenseVectorFromNumbers> list = new ArrayList<>();
for (int i = 0; i < 10; i++) {
	DenseVectorFromNumbers vec = new DenseVectorFromNumbers().withInputs(new Constant<Double>(0.0), new Constant<Double>(1.0));
	list.add(vec);
}

VariadicList<DenseVector> v = new VariadicList<DenseVector>(list);

NNLSTMLayer lstmLayer = new NNLSTMLayer()
		.withInputFromVectorSequence(v)
		.withBidirectionality(Bidirectionality.FORWARD_ONLY);

NNLastVectorInSequenceLayer poolingLayer = new NNLastVectorInSequenceLayer().withInput(lstmLayer);

NNDenseLayer denseLayers = new NNDenseLayer()
		.withInput()
		.concatenating()
		.fromLayers(poolingLayer)
		.done();

NNClassification<Boolean> labelClassification = new NNClassification<Boolean>()
		.withFeaturesInput(denseLayers)
		.withBinaryLabelInput(p.asLabel());

NeuralNetwork neuralNetwork = new NeuralNetwork()
		.withLossLayers(labelClassification)
		.withMaxEpochs(5);

results in

Exception in thread "main" java.lang.RuntimeException: MultithreadedDAGExecutor terminated execution because it encountered an unexpected exception in a worker thread: org.deeplearning4j.exception.DL4JInvalidInputException: Received input with size(1) = 10 (input array shape = [9, 10, 2]); input.size(1) must match layer nIn size (nIn = 2)
	at com.linkedin.dagli.dag.MultithreadedDAGExecutor.executeUnsafe(MultithreadedDAGExecutor.java:1546)
	at com.linkedin.dagli.dag.MultithreadedDAGExecutor.prepareUnsafeImpl(MultithreadedDAGExecutor.java:1497)
	at com.linkedin.dagli.dag.LocalDAGExecutor.prepareUnsafeImpl(LocalDAGExecutor.java:71)
	at com.linkedin.dagli.dag.AbstractDAGExecutor.prepareUnsafe(AbstractDAGExecutor.java:99)
	at com.linkedin.dagli.dag.DAG1x1.prepare(DAG1x1.java:253)
...
Caused by: org.deeplearning4j.exception.DL4JInvalidInputException: Received input with size(1) = 10 (input array shape = [9, 10, 2]); input.size(1) must match layer nIn size (nIn = 2)
	at org.deeplearning4j.nn.layers.recurrent.LSTMHelpers.activateHelper(LSTMHelpers.java:171)
	at org.deeplearning4j.nn.layers.recurrent.LSTM.activateHelper(LSTM.java:145)
	at org.deeplearning4j.nn.layers.recurrent.LSTM.activate(LSTM.java:115)
	at org.deeplearning4j.nn.graph.vertex.impl.LayerVertex.doForward(LayerVertex.java:110)
	at org.deeplearning4j.nn.graph.ComputationGraph.ffToLayerActivationsInWS(ComputationGraph.java:2135)
	at org.deeplearning4j.nn.graph.ComputationGraph.computeGradientAndScore(ComputationGraph.java:1372)
	at org.deeplearning4j.nn.graph.ComputationGraph.computeGradientAndScore(ComputationGraph.java:1341)
	at org.deeplearning4j.optimize.solvers.BaseOptimizer.gradientAndScore(BaseOptimizer.java:174)
	at org.deeplearning4j.optimize.solvers.StochasticGradientDescent.optimize(StochasticGradientDescent.java:61)
	at org.deeplearning4j.optimize.Solver.optimize(Solver.java:52)
	at org.deeplearning4j.nn.graph.ComputationGraph.fitHelper(ComputationGraph.java:1165)
	at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:1115)
	at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:1082)
	at com.linkedin.dagli.dl4j.NeuralNetwork$Preparer.finish(NeuralNetwork.java:254)
	at com.linkedin.dagli.dl4j.NeuralNetwork$Preparer.finish(NeuralNetwork.java:50)
	at com.linkedin.dagli.nn.AbstractNeuralNetwork$Preparer.finishUnsafe(AbstractNeuralNetwork.java:1232)
	at com.linkedin.dagli.dag.MultithreadedDAGExecutor$PreparationFinishTask.onRun(MultithreadedDAGExecutor.java:792)
	at com.linkedin.dagli.dag.MultithreadedDAGExecutor$Task.run(MultithreadedDAGExecutor.java:368)
	at com.linkedin.dagli.dag.MultithreadedDAGExecutor$Scheduler.lambda$schedule$4(MultithreadedDAGExecutor.java:329)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

cyberbeat avatar Aug 30 '21 21:08 cyberbeat

The shape for the LSTM seems to be wrong? The timestep should be at position 3, but Dagli puts it at position 2?

cyberbeat avatar Aug 31 '21 20:08 cyberbeat

I tried to patch NetworkBuilderLayerVisitor like this:

	@Override
	public Void visit(NNLSTMLayer visited) {
		// the number of units is not always the same as the output width:
		long[] outputShape = _dynamicConfigs.get(visited).getOutputShape();
		long[] inputShape = _dynamicConfigs.get(visited.internalAPI().getInputLayer()).getOutputShape();
		_graphBuilder
				.addVertex(_layerNames.get(visited) + "-Reshaped", new ReshapeMasklessVertex(
						new long[] { -1, inputShape[1], inputShape[0] }), getParentNames(visited));
		int outputWidth = Math.toIntExact(outputShape[1]);
		int units = (visited.getBidirectionality() == Bidirectionality.CONCATENATED) ? outputWidth / 2 : outputWidth;

		LSTM layer = new LSTM.Builder()
				.activation(visited.getActivation().accept(ACTIVATION_CONVERTER))
				.gateActivationFunction(visited.getRecurrentActivation().accept(ACTIVATION_CONVERTER))
				.units(units)
				.nIn(inputShape[1])
				.dropOut(dropoutValue(visited.getDropoutProbability()))
				.build();

		_graphBuilder
				.addLayer(_layerNames.get(visited),
						visited.getBidirectionality() == Bidirectionality.FORWARD_ONLY ? layer
								: new Bidirectional(getBidirectionalityMode(visited.getBidirectionality()), layer),
						_layerNames.get(visited) + "-Reshaped");

		return null;
	}

This seems to work (training and inference in the same run), but after serializing and deserializing I get some DL4J exceptions.

cyberbeat avatar Sep 18 '21 21:09 cyberbeat

I fixed serialization of Dagli's ReshapeMasklessVertex by marking _newShape as a @JsonProperty:

 @JsonProperty("newShape")
  private final long[] _newShape;

Might this be a bug that should be fixed?

cyberbeat avatar Sep 28 '21 19:09 cyberbeat

WRT ReshapeMasklessVertex, this should be fixed in the most recent (late August) Dagli JAR with the addition of a private constructor for the Java deserializer to use. Were you still seeing serialization errors using this most recent version?

WRT your aggregated.fromNumbers(...) example, thank you for providing this. I was able to (eventually) determine that the exception you were seeing was indeed a bug; in short, you're creating a corner case where multiple layers of the NN are identical (in the Java equals(...) sense), which was not originally anticipated. I have a solution in mind, but it will need to be tested before we ship it. In the meantime, your example will work if you write the following (instead of using the concatenating builder):

NNSplitVectorSequenceLayer split = new NNSplitVectorSequenceLayer()
		.withSplitSize(2)
		.withInput()
		.fromNumbers(p.asNumber(), p.asNumber(), p.asNumber(), p.asNumber());

WRT your LSTM example with the shape/dimension exception, it does appear to be an issue with timestep and vector size being switched; I'll have to investigate further (thank you also for the suggested code patch).

jeffpasternack avatar Oct 01 '21 08:10 jeffpasternack

About serializing: I used beta8. I think the problem is that private fields may not be serialized by default?

About aggregation: thanks for your investigation. I would like to add other types besides numbers, so the concatenating builder is really useful. As I already wrote, this is a really great Dagli tool for handling data at a high level.

You can read more here: https://blog.konduit.ai/2020/05/14/deeplearning4j-1-0-0-beta7-released/ (see the section about "NWC"; the default was always "NCW").

If I understand correctly, Dagli submits the wrong shape (NWC)? So perhaps that may be the reason that your Shakespeare char-LSTM example doesn't deliver good results? As many other DL4J layers also handle time series data as NCW, I would try to use the NCW shape.

I know this report is getting quite long, but the following also belongs to the LSTM topic somehow:

I cannot use NNLastVectorInSequenceLayer when aggregating:

Dagli's DL4J network adapter does not currently support NNLastVectorInSequenceLayers that have an ancestor with multiple input layers; found such a layer... (Exception from NetworkBuilderLayerVisitor)

Can something be done about that? Is this a problem for every such case?

cyberbeat avatar Oct 01 '21 15:10 cyberbeat

Serializing: standard Java serialization does serialize private fields. I'm unable to replicate any issue, but I'm also testing atop several bug-fix commits not in beta8--I can't see how they could be related, but it's not impossible. I suggest trying again once we ship beta9 in the next day or so (which will also include the fixes mentioned below.)

Aggregation: this is now fixed in a recent commit. I still strongly recommend using the fromNumbers(...) alternative syntax I suggested, since it's more concise and will work with NNLastVectorInSequenceLayer. Incidentally, we strongly recommend not using NNSplitVectorSequenceLayer if you can avoid it--instead, just feed the LSTM layer a list of vectors as an input. NNSplitVectorSequenceLayer has a non-obvious mapping from elements in the original vector to the elements in the vector sequences because it's implemented with a reshape operation.

LSTM: the underlying issue was that inputted vector sequences did indeed have erroneously swapped timestep and element index dimensions. This has been fixed in the most recent commit. The Shakespeare char-LSTM example actually has decent results (given that it's a hard problem)--it's not affected by this bug because it has a different architecture (using a sequence embedding table).

NNLastVectorInSequenceLayer: unfortunately, the way the DL4J implementation works is that it pulls masking information from one of the inputs to the NN. Having multiple input layer ancestors makes it impossible (without additional information) to identify the right input to use (or none may be right). The good news is that situations where you would really need multiple NN inputs feeding into a NNLastVectorInSequenceLayer are very rare. E.g. if you don't use a NNSplitVectorSequenceLayer with an in-NN concatenation of multiple vectors, this won't affect you (instead, as mentioned above, just create the sequence of vectors outside the NN and feed in that vector sequence as an input.)

jeffpasternack avatar Oct 05 '21 03:10 jeffpasternack