Purpose of method "foreach" in class CallFormula
Hi, I'm trying to get some deeper insight into the code of sempre. At the moment, I'm looking at the simple Java logical forms and their execution by the class JavaExecutor. The class CallFormula features the method "forEach". I can't see its exact purpose. Intuitively, I think it manages the recursive decomposition of a more complex CallFormula, and I guess the Boolean returned by func says whether the base case of the recursion is reached. But again, I don't get exactly how. Also, it seems that this forEach method is not even used in JavaExecutor's methods execute and processFormula, which apparently are responsible for the execution. Can someone help me understand the logic behind all this please? Thanks!
Hi! The parent abstract class `Formula` defines a few abstract methods: `toLispTree`, `forEach`, `map`, and `mapToList`. These are utility functions for traversing `Formula` objects, which are tree structures. The method `forEach` in particular can be used to perform an operation recursively on the tree.

However, it looks like the `forEach` method is never used by anyone. The more general methods `map` and `mapToList` have been used a few times.
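To make the traversal idea concrete, here is a minimal sketch of how such a callback-based traversal could be used, e.g., to count `CallFormula` nodes in a formula tree. It assumes `forEach` takes a callback that is invoked on every node and whose Boolean return value controls whether to stop descending into that node's children (I'm assuming `false` means "keep recursing"; check the actual contract in `Formula.java`):

```java
import edu.stanford.nlp.sempre.CallFormula;
import edu.stanford.nlp.sempre.Formula;

public class FormulaTraversalSketch {
  // Count how many CallFormula nodes appear anywhere in the formula tree.
  public static int countCallFormulas(Formula root) {
    final int[] count = {0};
    root.forEach(formula -> {
      if (formula instanceof CallFormula)
        count[0]++;
      return false;  // assumption: false means "keep descending into children"
    });
    return count[0];
  }
}
```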
Thanks for the explanations. However, I can't see yet where `map` and `mapToList` are used. My IDE (IntelliJ) actually says they aren't.

My approach to understanding the code now is going through JavaExecutor's `processFormula` with an example CallFormula. I basically do get how the formula is recursively processed here, but it's unclear to me which information is carried in the `Evaluation` field of the `Response` object that is returned by that method. Any hints? Thanks!
The Executor can use the `stats` field (an `Evaluation` object) to store arbitrary statistics about the execution (e.g., how much time it takes to execute). See `freebase/SparqlExecutor.java` (Line 253) for an example. I think JavaExecutor does not use this field. The statistics will be gathered across all executed formulas (`Parser.java` Line 356) and reported at the end of the run.
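As an illustration only, an executor could record timing information roughly like this. This is a sketch, not verified against the code: in particular, `Evaluation.add(String, double)` and the way the result is attached to the `Response` are assumptions.

```java
import fig.basic.Evaluation;

public class ExecutionTimingSketch {
  // Rough sketch: time some execution work and record it in an Evaluation object.
  // An Executor would attach this Evaluation to its Response (the stats field),
  // and the Parser aggregates the stats across all executed formulas.
  static Evaluation timeExecution(Runnable executionWork) {
    Evaluation stats = new Evaluation();
    long start = System.currentTimeMillis();
    executionWork.run();  // stands in for the actual formula execution
    stats.add("executionTimeMs", System.currentTimeMillis() - start);  // assumed: Evaluation.add(String, double)
    return stats;
  }
}
```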
I found `mapToList` used in niche places like `freebase/LexiconFn.java` (Line 265) and `overnight/OvernightFeatureComputer.java` (Line 142). I think it's safe to ignore these utility methods for now.
Thanks again. I'm at the chapter Batch Learning of the tutorial now, trying to understand the corresponding parts of the sempre code. The classes `FeatureExtractor` and `Rule` refer to rules being "anchored" or "floating". In your paper "Macro Grammars and Holistic Triggering for Efficient Semantic Parsing" you say:

> "..., in which logical forms are either triggered by specific phrases (anchored) or can be triggered in any context (floating)."

Does this mean an anchored rule is applicable only if a (sub)span of tokens matches the phrase of the rule's definition exactly (and not just an rhs category), and any rule without that property is floating? Thanks
The anchored vs floating distinction is only applicable to the FloatingParser. Other parsers (e.g., BeamParser) assume that all rules are anchored.
- A parser constructs `Derivation` objects representing complete or partial parses. The `Rule`s in the grammar specify how larger Derivations can be constructed from smaller ones.
- Each Derivation has a category string `cat` (e.g., $TOKEN, $Entity, or $ROOT) and a `formula` (a `Formula` object).
- Each Derivation can be either anchored or floating. An anchored Derivation is associated with a token span (specified by `start` and `end`). A floating Derivation is not bound to a span (`start` and `end` are set to -1) but instead has a `size` (explained below).
- At the beginning of the parsing process, initial anchored Derivations are constructed from the input query. These are the "special categories" $TOKEN, $LEMMA_TOKEN, $PHRASE, and $LEMMA_PHRASE specified in the tutorial. For example, the query "state containing san francisco" would produce a Derivation with category = $PHRASE, formula = ValueFormula("san francisco"), start = 2, and end = 4 (among many other Derivations).
- An anchored rule uses anchored Derivations and query strings as sources to produce an anchored Derivation. The sources must correspond to consecutive spans in the query. For example, `(rule $foo (biggest $bar) (...) (anchored 1))` can only be applied if $bar is an anchored Derivation whose anchoring span appears right after the query token "biggest". If the $bar Derivation is anchored to span [i, j], and the token i-1 is "biggest", the newly constructed $foo would be anchored to span [i-1, j].
- Anchored rules construct anchored Derivations with spans of increasing size, so eventually the process will end when all Derivations for span [0, query_length] are constructed. The parsing process ends here for non-FloatingParser, and the parser returns Derivations with category $ROOT.
- For FloatingParser, all anchored Derivations are converted into floating Derivations of size 0. Floating rules then construct floating Derivations from floating Derivations. The sources no longer have to be consecutive spans, and the rule cannot specify raw query strings as sources. For example, `(rule $foo ($bar $baz) (...))` combines any floating Derivations $bar and $baz into a $foo. Note that the (...) part describes how the formula of the $foo Derivation is built from the children's formulas. If this construction fails (e.g., when the formulas do not type check), the parent Derivation is not constructed.
- Floating rules can combine any floating Derivations from anywhere forever, so we need an extra stopping criterion: size. The size field of a floating Derivation counts the number of rules used to construct the Derivation. The parsing process stops at a specified Derivation size.
Thanks for your elaborate explanation!
Now I'm trying to track from the `main` method how sempre is set up for interaction.

Apparently the fig/... supporting packages are used pretty heavily, but these are not part of sempre and I'm not sure which parts of them are relevant in the particular context of sempre. The `main` method indirectly calls the static method `Execution.runWithObjArray(String[] args, Object[] objects)`, with `objects` containing `("Main", new Main(), Master.getOptionsParser())`. This method starts with the call `init(args, objects)`, which for its part calls the `register` method of the `OptionsParser` class. Here, the interface `OptionSet` is used to get the annotations of the fields of the objects in the array `objects`, but it seems that in sempre this interface (or a class implementing it) is never used. Is this correct, and if so, can I neglect this `register` method in the process of understanding how the setup of sempre for interaction is implemented? Thanks again.
- In practice, the fields in `Main` that are annotated with `@Option` will be registered. The `Option` annotation (from `fig.basic`) and `OptionSet` don't interact, so I'm not sure why `register` successfully parses these `Option` fields either.
- Note that in `Master.getOptionsParser`, the `OptionsParser` also registers options from the classes specified in the `module-classes.txt` file (generated during ant compilation). AFAIK this was a hack so that the options of irrelevant packages do not have to be parsed (e.g., the `tables.*` classes will not be processed if the `tables` ant target was unused).
- In this method, the `@Option`-annotated fields under the `opts` object (defined by `public static Options opts = new Options();` in most classes) will be registered.
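For reference, the usual pattern in sempre classes looks roughly like the following sketch. The class, the option field, and the gloss text are made up for illustration; only the `@Option`/`opts` pattern itself is taken from the description above.

```java
import fig.basic.Option;

public class MyComponent {
  public static class Options {
    @Option(gloss = "Maximum number of candidates to keep")  // hypothetical option, for illustration
    public int maxCandidates = 100;
  }
  public static Options opts = new Options();

  public void run() {
    // After OptionsParser registration, this value can be overridden from the
    // command line, e.g., -MyComponent.maxCandidates 50
    System.out.println("maxCandidates = " + opts.maxCandidates);
  }
}
```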
Thanks again. The background of all my questions is that I would like to create a plugin for an existing NLP Java application that enhances it with sempre functionality. That requires some deeper understanding of the code. As I see it, the real entry point to sempre is not the `main` method but the Ruby script `run` in combination with `execrunner`. I am completely new to Ruby and I wonder if a complete understanding of these scripts is necessary for my goal. Are they basically "just" setting up the environment to pass parameters to the Java code from the command line (so that understanding the Java code would probably be sufficient for me)? I have established that `run` basically defines the available modes and then adds them to the global variable `$modes`, and maps the modes to their functionality in `modesMap`, based on the information stored in `$modes`.

The final command `run!(sel(:mode, modesMap))` most likely leads to the environment being set up according to the chosen mode, though I have failed to track that in detail so far. So, the question is: do I need to understand the Ruby scripts in detail? (Learning Ruby is certainly worthwhile, but time pressure forces me to set priorities.) Thanks!
You might not need to understand the Ruby script in detail. The Ruby script is just a tool for creating a Java command (or multiple commands) to run. For any Ruby command in README or TUTORIAL, you can add `-n` to print the Java command and exit. That Java command can be invoked directly without using the Ruby script.
Thanks again. That will save me some time. Now I'm back at the Batch Learning chapter of the tutorial. There it says:

> The rule feature domain tells the feature extractor to increment the feature each time the grammar rule is applied in the derivation.

Could you please tell me if I understand the following points correctly:

- "Derivation" here does not refer to a single rule application but to the whole derivation tree for each of the two possible formulas.
- The dimension of the resulting feature vector equals the number of rules in the grammar.
- Each of its components corresponds to one of the rules and shows how frequently that rule was used in the represented derivation tree.

I figured out that the method `extractRuleFeatures` in class `FeatureExtractor` must play a main role here, but I don't see yet how it counts all rule applications in the derivation tree. Thanks again!
- Yes, "Derivation" refers to the whole derivation tree. The
rule
features of a derivation tree are the rules used when building that tree. (the feature names = the rules serialized as strings). - Yes, the dimension is the number of rules, if only
rule
features are present. The feature vector is implemented as a Map-like object (from feature names to feature values), so the size can be increased when new features are discovered. - Yes, for a rule feature R, the feature value is the frequency of the rule R in the derivation tree.
Regarding rule computation: while one could freshly compute the rule
features for each new derivation, the code does it recursively: to compute the rule
features for derivation D, combine the rule features of the child derivations of D, then add the rule R applied when constructing D. The localFeatureVector
field of D only stores this final rule feature R. During scoring or gradient updates, the features will be gathered recursively.
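To illustrate the recursive gathering described above, here is a minimal sketch (not the actual SEMPRE implementation; it assumes a derivation exposes its children and a map of its local feature counts):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FeatureGatherSketch {
  // Hypothetical minimal view of a derivation node.
  static class Deriv {
    Map<String, Double> localFeatures = new HashMap<>();  // e.g., {"rule :: $ROOT -> $City" -> 1.0}
    List<Deriv> children = new ArrayList<>();
  }

  // Accumulate the local features of d and all of its descendants into counts.
  static void gatherFeatures(Deriv d, Map<String, Double> counts) {
    for (Map.Entry<String, Double> e : d.localFeatures.entrySet())
      counts.merge(e.getKey(), e.getValue(), Double::sum);
    for (Deriv child : d.children)
      gatherFeatures(child, counts);
  }
}
```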
Thanks again, ppasupat. While I get the idea, I can't seem to find where this recursive gathering of features and calculating of scores is actually located in the code.

- The class `FeatureExtractor` contains the method `extractLocal`, which apparently calls the respective extraction method for each activated feature. The comment here says "This function is called on every sub-Derivation, ...", but where does that happen? I found that `extractLocal` is used by the `ParserState` class and its subclasses within the method `featurizeAndScoreDerivation`, but I don't see the recursion that takes the whole derivation tree into account.
- The method `computeScore` in the class `Derivation` does recursively calculate all scores, but why is it used only by the `ReinforcementParser`?

I would like to gain some more insight here. Thanks!
Background
The method `parse` in `Parser` is responsible for parsing an utterance. It calls `ParserState state = newParserState(params, ex, computeExpectedCounts)` and then `state.infer()` to construct the Derivations. These two methods should be overridden in subclasses of `Parser`. The `state.infer()` call should populate the `predDerivations` field of `state` with the final Derivations for the example.

(Note: `computeExpectedCounts` means whether to compute the information necessary for gradient updates. The flag is turned on during training and off during evaluation.)
Let's use `BeamParser` as an example. Ignore the `coarse{State,Prune}` stuff (and whatever happens when `BeamParserState.mode == bool`), which is only for speeding up parsing.

The `newParserState` method constructs a new `BeamParserState` object (which is in the same file `BeamParser.java`). The `infer` method of `BeamParserState` constructs Derivations over utterance spans of increasing sizes using the `build` method, and then calls `setPredDerivations` (defined in the parent class `ChartParserState`) to populate the `predDerivations` field.
Tracing the method calls from `build` leads to the `applyRule(start, end, rule, children)` method, which constructs Derivations from the given children using the given grammar rule. (Ignore the `mode == bool` section.) The produced derivations `newDeriv` are featurized and scored by `featurizeAndScoreDerivation` (defined in `ParserState`). This method calls `extractLocal` to extract features and then `computeScoreLocal` to compute the score.
Feature extraction and scoring
Feature extraction and scoring are actually not done recursively (sorry for my confusion earlier). Rather, it uses dynamic programming. Let's say a derivation A has one child B, A has local features {a1, a2}, and the total score of all features in B (including the features of its descendants) is already computed and stored in B.score. Then the total score of A, A.score, can be computed as A.score = params.getWeight(a1) + params.getWeight(a2) + B.score.
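A minimal sketch of this bottom-up scoring follows; the field and method names are hypothetical stand-ins for the real `Derivation`, `Params`, and feature-vector classes.

```java
import java.util.List;
import java.util.Map;

public class ScoreSketch {
  // Hypothetical minimal view of a derivation whose children are already scored.
  static class Deriv {
    Map<String, Double> localFeatures;  // features introduced by this node only
    List<Deriv> children;               // children already have their score field set
    double score;
  }

  // score(A) = weighted local features of A + sum of the children's (already computed) scores.
  static void computeScoreLocal(Deriv d, Map<String, Double> weights) {
    double s = 0;
    for (Map.Entry<String, Double> e : d.localFeatures.entrySet())
      s += weights.getOrDefault(e.getKey(), 0.0) * e.getValue();
    for (Deriv child : d.children)
      s += child.score;
    d.score = s;
  }
}
```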
The `extractLocal` method (in `FeatureExtractor`) extracts features only for the current derivation and not the children. For example, the "rule" feature (in `extractRuleFeatures`) will define only one feature for the rule used to build the derivation. The rules used for building the children should have already been extracted when the children were constructed (and `featurizeAndScoreDerivation` was called on them).

The `computeScoreLocal` method (in `Derivation`) computes `score` (total score of all features) = total score of the local features + sum of the `score` of the children. Like the features, the `score` field of the children should already have been populated when the children were constructed.
In `ReinforcementParser`, there is no guarantee that the scores of children derivations have been populated before the parent is constructed (I think), so `computeScore` is used instead of `computeScoreLocal` to recursively score the children.
Note that this dynamic programming method only works for linear models (which SEMPRE uses). A model that considers interactions between features would not work.
Thank you! With your explanations and further studying of the code, I now believe to have understood the following:

- The field `chart` of the class `(Beam)ChartParserState` keeps track of the derivations detected so far. It does so by maintaining a two-dimensional map. The two indices stand for `start` and `end` of a (sub)span. The key is the name of a (target) category. The value is a list of those derivations that have been found for that particular span with that particular category.
- The `infer` method of `BeamParserState` first finds the "primitive" derivations, which are based only on tokens and phrases. To achieve this, `gatherTokenAndPhraseDerivations` is called for each subspan of the given span. The results are added to `chart`.
- `infer` then calls `build` on subspans of increasing length, respectively for all possible start indices within the span, from left to right. For each of these subspans, and in their thus determined order, `build` calls `applyCatUnaryRules` and `applyNonCatUnaryRules`. These try to apply all available rules to the respective subspan. Every time a rule is applicable to existing derivations in `chart` (which is, in the case of `applyCatUnaryRules`, because its `rhs` matches the category these derivations are mapped to in `chart` for the respective subspan), `applyRule` is called. This method executes the rule that was found applicable, creates a new derivation, and puts it in `chart`. (OK, for `applyNonCatUnaryRules` it is a bit more complex, but I think I got the principle of how the most important methods interact...) So for each subspan `build` is called on, the derivations created on previous subspans can potentially be used to apply rules on.
- `applyRule` also extracts the features of the newly added derivation. It then computes the "total" score of that derivation as the sum of its weighted local features and the scores of its children.
- Finally, `infer` calls `setPredDerivations`. This puts all derivations that cover the whole original span and have category `$ROOT` (and so qualify to be returned to the user) into the field `predDerivations` of `BeamParserState`.

Please correct me if I'm wrong.
Yes, these are all correct. Different parsers might construct logical forms in different orders (e.g., the second stage of FloatingParser does not care about `start` and `end`; the ReinforcementParser learns to pick which partial Derivation to expand on; etc.), but the general ideas about applying rules, scoring, and then collecting `predDerivations` should be the same.
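If it helps to picture the chart bookkeeping described above, it behaves roughly like the structure below. This is only a sketch; the actual field in `ChartParserState` may be declared and indexed differently, and `Object` stands in for `Derivation`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ChartSketch {
  // chart[start][end] maps a category name (e.g., "$City") to the list of
  // derivations found for the span [start, end) with that category.
  @SuppressWarnings({"unchecked", "rawtypes"})
  static Map<String, List<Object>>[][] newChart(int numTokens) {
    Map<String, List<Object>>[][] chart = new Map[numTokens][numTokens + 1];
    for (int start = 0; start < numTokens; start++)
      for (int end = start + 1; end <= numTokens; end++)
        chart[start][end] = new HashMap<>();
    return chart;
  }

  // applyRule-style bookkeeping: register a newly built derivation under its category.
  static void addToChart(Map<String, List<Object>>[][] chart,
                         int start, int end, String cat, Object deriv) {
    chart[start][end].computeIfAbsent(cat, k -> new ArrayList<>()).add(deriv);
  }
}
```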
Thanks. Next thing will be to investigate the implementation of the math behind it, the actual calculation of parameters and probabilities.
So I'm trying to get to the core of the math behind the learning process. I've brushed up my knowledge of Stochastic Gradient Descent, but have problems mapping it to the code. The main method here is apparently `learn` in the `Learner` class. It calls `processExamples`, which uses `parseExample`, `checkGradient` and `updateWeights` (which calls `update` in class `Params`).

The names suggest that the actual updating happens in the last two mentioned. But in the `checkGradient` method there is the line

`perturbedParams.getWeights().put(feature, perturbedParams.getWeight(feature) + eps);`

which looks like the weights are updated here (but with the constant `eps` and not with a calculated SGD step). Also, what part of the SGD algorithm do the fields `objectiveValue` and `expectedCounts` of the `ParserState` class correspond to, which are used by `checkGradient`? Could you please provide some hints about the ML implementation? Thanks
Objective function and gradient
- A derivation has score `score(deriv) = sum_{feature f} params.weights[f] * deriv.feature_count[f]`
  - This is computed by `Derivation.computeScore(params)`
- Define `p(deriv) = exp[score(deriv)] / sum_{deriv'} exp[score(deriv')]`
- The objective function is the log-likelihood of the derivations with correct denotations
  - The likelihood is `sum_{d with correct deno} p(d) = sum_{d with correct deno} exp[score(d)] / sum_{all derivs d} exp[score(d)]`
  - The log-likelihood is `log(sum_{d with correct deno} exp[score(d)]) - log(sum_{all derivs d} exp[score(d)])`
- The "expected count" of f is the derivative of the objective function with respect to `params.weights[f]`
  - This involves a bit of algebra (spelled out below).
- To maximize the objective function, we add (expected count of f * learning rate) to the weight of f
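For completeness, the algebra works out to the standard log-linear gradient (my derivation, using the notation above with $s(d) = \sum_f w_f\,\phi_f(d)$ for the score, $\phi_f(d)$ for the count of feature $f$ in derivation $d$, and $C$ for the set of derivations with correct denotations):

$$\frac{\partial}{\partial w_f}\Big[\log\sum_{d\in C} e^{s(d)} - \log\sum_{d} e^{s(d)}\Big] = \mathbb{E}_{p(d \mid d\in C)}[\phi_f(d)] - \mathbb{E}_{p(d)}[\phi_f(d)]$$

That is, the expected feature count under the distribution restricted to correct derivations minus the expected feature count under the full distribution; this difference is what gets accumulated as the expected count of $f$.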
How all this is done in the code
- Computing expected counts is done in `ParserState.computeExpectedCounts`
- Now in `Learner.processExamples`:
  - `computeExpectedCounts` (boolean) indicates whether the weights should be updated
  - The update is done in batches: the `expectedCounts` in the ParserState are gathered in the `counts` map until a certain number of examples (specified by `batchSize`) is processed. Then `updateWeights(counts)` is called on the accumulated expected counts (after which `counts` is reset).
  - `updateWeights(counts)` essentially calls `params.update(counts)`. Ignoring L1-regularization, the function just does `MapUtils.incr(weights, f, stepSize * g);`. This means adding (expected count of f * learning rate) to the weight of f.
- You can ignore the `checkGradient` part. It is used to check if the gradient update is correct. It does not actually update the weights.
A question about how to tell a `FeatureExtractor` which features it is supposed to use: I understand these features are specified on the command line using `-FeatureExtractor.featureDomains ...`. But where in the code are these actually added to the HashSet `featureDomains` in the class `Options` of the `FeatureExtractor`?
Is it safe to say that, for parsing relatively simple utterances into logical forms that are meant to be executed by `JavaExecutor`, mostly the `BeamParser` class and the `Rule` feature are relevant?
Using the BeamParser should be sufficient for the parser. For the features, the `rule` features alone might be insufficient (since they only count how many times each rule is used, regardless of the input utterance). The geo880 mode (for the Geoquery dataset) lists the following features:

`rule opCount constant whType span lemmaAndBinaries denotation lexAlign joinPos skipPos`

(Line 1093 in the `run` script). This might be a good set of features to try (though some of them are not guaranteed to work correctly depending on the grammar and executor used).
Thanks for the hint. I was asking because I'd like to offer the user a reasonable selection of features to choose from via checkboxes in my GUI. I'd also like to add a short description of each feature to the GUI.

The "constant" feature seems to be located in the class `ConstantFn` directly. Am I right that the code segment

`res.addFeature("constant", ex.phraseString(c.getStart(), c.getEnd()) + " --- " + formula.toString());`

registers an application of `ConstantFn` per subspan? Also, which part of the code actually triggers that registration? It seems that there is no method in `FeatureExtractor` taking care of the "constant" feature.
Feature firing in SEMPRE is pretty complicated due to legacy reasons.
- This docstring of `FeatureExtractor` gives an overview of the 3 ways features can be defined.
- Feature domains can be turned on or off based on the flag `FeatureExtractor.featureDomains`.
- Features defined in a `SemanticFn` (like the example you gave for `ConstantFn`) will be defined once the `SemanticFn` is called (i.e., when building the Derivation).
  - In the case of `ConstantFn`, the feature is defined on every subspan the corresponding rule fires on. For example, `(rule $TypeNP (meeting) (ConstantFn en.meeting))` detects the word "meeting" and creates a Derivation of category `$TypeNP` with formula `en.meeting`. The `res.addFeature(...)` line you mentioned will register a "constant" feature during the creation of the Derivation (`res`).
- Other features will be defined when `ParserState.featurizeAndScoreDerivation(deriv)` is called. The `FeatureExtractor` goes through all features defined locally in the `FeatureExtractor` plus any registered `FeatureComputer` classes.
I'm back at investigating the features. `whType` uses Part-of-Speech tagging. As far as I managed to trace it, the method `preprocess` in the `Example` class populates the field `posTags` in the field `languageInfo` by applying a `LanguageAnalyzer`. I'd like to know how it is determined which POS tags are actually added here, and whether it is possible to change (add/remove) POS tags manually. Thanks!
The types of POS tags depend on the `LanguageAnalyzer` used (specified by the flag `--LanguageAnalyzer.languageAnalyzer`). The available choices are `SimpleAnalyzer` (which is a dummy analyzer that only recognizes numbers) and `corenlp.CoreNLPAnalyzer` (which runs CoreNLP).

The POS tags returned by CoreNLP are based on the model used, with the default using Penn Treebank tags. The class CoreNLPAnalyzer postprocesses some of the tags (e.g., marking auxiliary verbs), and more postprocessing could be hacked in there.
Thanks. I will think over whether I need this feature. It's probably more relevant for database queries than for the cases `JavaExecutor` handles.
Shouldn't it be possible to map Strings to categories, e.g.
(rule $City (string "hamburg") (IdentityFn))
(rule $City (string "stuttgart") (IdentityFn))
(rule $City (string "munic") (IdentityFn))
(rule $ROOT ($City) (IdentityFn))
as a part/base of a more complex grammar?
In my application, using these rules, these city names aren't recognized. Is the grammar itself wrong? Otherwise it would mean my application doesn't handle the grammar file correctly. Thanks
Could you try changing the rules to look like:
(rule $City (hamburg) (IdentityFn))
I had tried this; I then get:
ERROR: Composition failed: rule = $City -> hamburg (IdentityFn), children = []
java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-374" java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException
at edu.stanford.nlp.sempre.BeamParserState.applyRule(BeamParser.java:165)
at edu.stanford.nlp.sempre.BeamParserState.applyNonCatUnaryRules(BeamParser.java:224)
at edu.stanford.nlp.sempre.BeamParserState.applyNonCatUnaryRules(BeamParser.java:231)
at edu.stanford.nlp.sempre.BeamParserState.build(BeamParser.java:123)
at edu.stanford.nlp.sempre.BeamParserState.infer(BeamParser.java:98)
at edu.stanford.nlp.sempre.Parser.parse(Parser.java:170)....
(rest of the error message is referring to classes in my application)
What apparently works is:
(rule $City (hamburg) (ConstantFn (string hamburg)))
but that feels quite cumbersome.
The current grammar looks like this:
(rule $Dir (from) (ConstantFn (lambda x (call + (string "place of departure") (string :)(var x)))))
(rule $From ($Dir $City) (JoinFn forward))
(rule $Dir (to) (ConstantFn (lambda x (call + (string "arrival location") (string :)(var x)))))
(rule $To ($Dir $City) (JoinFn forward))
(rule $Part (by) (ConstantFn (lambda x (call + (string "transportation") (string :)(var x)))))
(rule $By ($Part $Transport) (JoinFn forward))
(rule $Transport (plane) (ConstantFn (string plane)))
(rule $Transport (train) (ConstantFn (string train)))
(rule $City (hamburg) (ConstantFn (string hamburg)))
(rule $City (stuttgart) (ConstantFn (string stuttgart)))
(rule $City (frankfurt) (ConstantFn (string frankfurt)))
(rule $City (munic) (ConstantFn (string munic)))
(rule $City (hannover) (ConstantFn (string hannover)))
(rule $ROOT ($From) (IdentityFn))
(rule $ROOT ($To) (IdentityFn))
(rule $ROOT ($By) (IdentityFn))
and manages to parse e.g. "from munic" into `(string "place of departure:munic")` (though for some reason it returns the same derivation tree twice here), but I haven't found a way to parse a combination of the categories $From, $To and $By.
Adding the rule
(rule $ROOT ($From $By) (ConcatFn ""))
and entering "from munic by train" returned `(string nullnull)`. Is there a way? Thanks!