Purpose of method "foreach" in class CallFormula
Hi, I'm trying to get some deeper insight into the code of sempre. At the moment, I'm looking at the simple Java logical forms and their execution by the class JavaExecutor. The class CallFormula features the method "forEach". I can't see its exact purpose. Intuitively, I think it manages the recursive decomposition of a more complex CallFormula, and I guess the Boolean returned by func says whether the base case of the recursion is reached. But again, I don't get exactly how. Also, it seems that this forEach method is not even used in JavaExecutor's methods execute and processFormula, which apparently are responsible for the execution. Can someone help me understand the logic behind all this please? Thanks!
Hi! The parent abstract class `Formula` defines a few abstract methods: `toLispTree`, `forEach`, `map`, and `mapToList`. These are utility functions for traversing `Formula` objects, which are tree structures. The method `forEach` in particular can be used to perform an operation recursively on the tree.

However, it looks like the `forEach` method is never used by anyone. The more general methods `map` and `mapToList` have been used a few times.
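To make the traversal idea concrete, here is a minimal sketch of how such a callback-based traversal could be used, e.g., to count `CallFormula` nodes in a formula tree. It assumes `forEach` takes a callback that is invoked on every node and whose Boolean return value controls whether to stop descending into that node's children (I'm assuming `false` means "keep recursing"; check the actual contract in `Formula.java`):

```java
import edu.stanford.nlp.sempre.CallFormula;
import edu.stanford.nlp.sempre.Formula;

public class FormulaTraversalSketch {
  // Count how many CallFormula nodes appear anywhere in the formula tree.
  public static int countCallFormulas(Formula root) {
    final int[] count = {0};
    root.forEach(formula -> {
      if (formula instanceof CallFormula)
        count[0]++;
      return false;  // assumption: false means "keep descending into children"
    });
    return count[0];
  }
}
```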
Thanks for the explanations. However, I can't see yet where `map` and `mapToList` are used. My IDE (IntelliJ) actually says they aren't.

My approach to understanding the code now is going through JavaExecutor's `processFormula` with an example CallFormula. I basically do get how the formula is recursively processed here, but it's unclear to me which information is carried in the `Evaluation` field of the `Response` object that is returned by that method. Any hints? Thanks!
The Executor can use the `stats` field (an `Evaluation` object) to store arbitrary statistics about the execution (e.g., how much time it takes to execute). See `freebase/SparqlExecutor.java` (Line 253) for an example. I think JavaExecutor does not use this field. The statistics will be gathered across all executed formulas (`Parser.java` Line 356) and reported at the end of the run.
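As an illustration only, an executor could record timing information roughly like this. This is a sketch, not verified against the code: in particular, `Evaluation.add(String, double)` and the way the result is attached to the `Response` are assumptions.

```java
import fig.basic.Evaluation;

public class ExecutionTimingSketch {
  // Rough sketch: time some execution work and record it in an Evaluation object.
  // An Executor would attach this Evaluation to its Response (the stats field),
  // and the Parser aggregates the stats across all executed formulas.
  static Evaluation timeExecution(Runnable executionWork) {
    Evaluation stats = new Evaluation();
    long start = System.currentTimeMillis();
    executionWork.run();  // stands in for the actual formula execution
    stats.add("executionTimeMs", System.currentTimeMillis() - start);  // assumed: Evaluation.add(String, double)
    return stats;
  }
}
```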
I found `mapToList` used in niche places like `freebase/LexiconFn.java` (Line 265) and `overnight/OvernightFeatureComputer.java` (Line 142). I think it's safe to ignore these utility methods for now.
Thanks again. I'm at the chapter Batch Learning of the tutorial now, trying to understand the corresponding parts of the sempre code. The classes `FeatureExtractor` and `Rule` refer to rules being "anchored" or "floating". In your paper "Macro Grammars and Holistic Triggering for Efficient Semantic Parsing" you say:

> "..., in which logical forms are either triggered by specific phrases (anchored) or can be triggered in any context (floating)."

Does this mean an anchored rule is applicable only if a (sub)span of tokens matches the phrase of the rule's definition exactly (and not just an rhs category), and any rule without that property is floating? Thanks
The anchored vs floating distinction is only applicable to the FloatingParser. Other parsers (e.g., BeamParser) assume that all rules are anchored.
- A parser constructs `Derivation` objects representing complete or partial parses. The `Rule`s in the grammar specify how larger Derivations can be constructed from smaller ones.
- Each Derivation has a category string `cat` (e.g., $TOKEN, $Entity, or $ROOT) and a `formula` (a `Formula` object).
- Each Derivation can be either anchored or floating. An anchored Derivation is associated with a token span (specified by `start` and `end`). A floating Derivation is not bound to a span (`start` and `end` are set to -1) but instead has a `size` (explained below).
- At the beginning of the parsing process, initial anchored Derivations are constructed from the input query. These are the "special categories" $TOKEN, $LEMMA_TOKEN, $PHRASE, and $LEMMA_PHRASE specified in the tutorial. For example, the query "state containing san francisco" would produce a Derivation with category = $PHRASE, formula = ValueFormula("san francisco"), start = 2, and end = 4 (among many other Derivations).
- An anchored rule uses anchored Derivations and query strings as sources to produce an anchored Derivation. The sources must correspond to consecutive spans in the query. For example, `(rule $foo (biggest $bar) (...) (anchored 1))` can only be applied if $bar is an anchored Derivation whose anchoring span appears right after the query token "biggest". If the $bar Derivation is anchored to span [i, j], and the token i-1 is "biggest", the newly constructed $foo would be anchored to span [i-1, j].
- Anchored rules construct anchored Derivations with spans of increasing size, so eventually the process will end when all Derivations for span [0, query_length] are constructed. The parsing process ends here for non-FloatingParser, and the parser returns Derivations with category $ROOT.
- For FloatingParser, all anchored Derivations are converted into floating Derivations of size 0. Floating rules then construct floating Derivations from floating Derivations. The sources no longer have to be consecutive spans, and the rule cannot specify raw query strings as sources. For example, `(rule $foo ($bar $baz) (...))` combines any floating Derivations $bar and $baz into a $foo. Note that the (...) part describes how the formula of the $foo Derivation is built from the children's formulas. If this construction fails (e.g., when the formulas do not type check), the parent Derivation is not constructed.
- Floating rules can combine any floating Derivations from anywhere forever, so we need an extra stopping criterion: size. The size field of a floating Derivation counts the number of rules used to construct the Derivation. The parsing process stops at a specified Derivation size.
Thanks for your elaborate explanation!
Now I'm trying to track from the `main` method how sempre is set up for interaction.

Apparently the fig/... supporting packages are used pretty heavily, but these are not part of sempre and I'm not sure which parts of them are relevant in the particular context of sempre. The `main` method indirectly calls the static method `Execution.runWithObjArray(String[] args, Object[] objects)`, with `objects` containing `("Main", new Main(), Master.getOptionsParser())`. This method starts with the call `init(args, objects)`, which for its part calls the `register` method of the `OptionsParser` class. Here, the interface `OptionSet` is used to get the annotations of the fields of the objects in the array `objects`, but it seems that in sempre this interface (or a class implementing it) is never used. Is this correct, and if so, can I neglect this `register` method in the process of understanding how the setup of sempre for interaction is implemented? Thanks again.
- In practice, the fields in `Main` that are annotated with `@Option` will be registered. The `Option` annotation (from `fig.basic`) and `OptionSet` don't interact, so I'm not sure why `register` successfully parses these `Option` fields either.
- Note that in `Master.getOptionsParser`, the `OptionsParser` also registers options from the classes specified in the `module-classes.txt` file (generated during ant compilation). AFAIK this was a hack so that the options of irrelevant packages do not have to be parsed (e.g., the `tables.*` classes will not be processed if the `tables` ant target was unused).
- In this method, the `@Option`-annotated fields under the `opts` object (defined by `public static Options opts = new Options();` in most classes) will be registered.
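For reference, the usual pattern in sempre classes looks roughly like the following sketch. The class, the option field, and the gloss text are made up for illustration; only the `@Option`/`opts` pattern itself is taken from the description above.

```java
import fig.basic.Option;

public class MyComponent {
  public static class Options {
    @Option(gloss = "Maximum number of candidates to keep")  // hypothetical option, for illustration
    public int maxCandidates = 100;
  }
  public static Options opts = new Options();

  public void run() {
    // After OptionsParser registration, this value can be overridden from the
    // command line, e.g., -MyComponent.maxCandidates 50
    System.out.println("maxCandidates = " + opts.maxCandidates);
  }
}
```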
Thanks again. The background of all my questions is that I would like to create a plugin for an existing NLP Java application that enhances it with sempre functionality. That requires some deeper understanding of the code. As I see it, the real entry point to sempre is not the `main` method but the Ruby script `run` in combination with `execrunner`. I am completely new to Ruby and I wonder if a complete understanding of these scripts is necessary for my goal. Are they basically "just" setting up the environment to pass parameters to the Java code from the command line (so that understanding the Java code would probably be sufficient for me)? I have established that `run` basically defines the available modes and then adds them to the global variable `$modes`, and maps the modes to their functionality in `modesMap`, based on the information stored in `$modes`.

The final command `run!(sel(:mode, modesMap))` most likely leads to the environment being set up according to the chosen mode, though I have failed to track that in detail so far. So, the question is: do I need to understand the Ruby scripts in detail? (Learning Ruby is certainly worthwhile, but time pressure forces me to set priorities.) Thanks!
You might not need to understand the Ruby script in detail. The Ruby script is just a tool for creating a Java command (or multiple commands) to run. For any Ruby command in README or TUTORIAL, you can add `-n` to print the Java command and exit. That Java command can be invoked directly without using the Ruby script.
Thanks again. That will save me some time. Now I'm back at the Batch Learning chapter of the tutorial. There it says:

> The rule feature domain tells the feature extractor to increment the feature each time the grammar rule is applied in the derivation.

Could you please tell me if I understand the following points correctly:

- "Derivation" here does not refer to a single rule application but to the whole derivation tree for each of the two possible formulas.
- The dimension of the resulting feature vector equals the number of rules in the grammar.
- Each of its components corresponds to one of the rules and shows how frequently that rule was used in the represented derivation tree.

I figured out that the method `extractRuleFeatures` in class `FeatureExtractor` must play a main role here, but I don't see yet how it counts all rule applications in the derivation tree. Thanks again!
- Yes, "Derivation" refers to the whole derivation tree. The
rule
features of a derivation tree are the rules used when building that tree. (the feature names = the rules serialized as strings). - Yes, the dimension is the number of rules, if only
rule
features are present. The feature vector is implemented as a Map-like object (from feature names to feature values), so the size can be increased when new features are discovered. - Yes, for a rule feature R, the feature value is the frequency of the rule R in the derivation tree.
Regarding rule computation: while one could freshly compute the rule
features for each new derivation, the code does it recursively: to compute the rule
features for derivation D, combine the rule features of the child derivations of D, then add the rule R applied when constructing D. The localFeatureVector
field of D only stores this final rule feature R. During scoring or gradient updates, the features will be gathered recursively.
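To illustrate the recursive gathering described above, here is a minimal sketch (not the actual SEMPRE implementation; it assumes a derivation exposes its children and a map of its local feature counts):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FeatureGatherSketch {
  // Hypothetical minimal view of a derivation node.
  static class Deriv {
    Map<String, Double> localFeatures = new HashMap<>();  // e.g., {"rule :: $ROOT -> $City" -> 1.0}
    List<Deriv> children = new ArrayList<>();
  }

  // Accumulate the local features of d and all of its descendants into counts.
  static void gatherFeatures(Deriv d, Map<String, Double> counts) {
    for (Map.Entry<String, Double> e : d.localFeatures.entrySet())
      counts.merge(e.getKey(), e.getValue(), Double::sum);
    for (Deriv child : d.children)
      gatherFeatures(child, counts);
  }
}
```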
Thanks again, ppasupat. While I get the idea, I can't seem to find where this recursive gathering of features and calculating of scores is actually located in the code.

- The class `FeatureExtractor` contains the method `extractLocal`, which apparently calls the respective extraction method for each activated feature. The comment here says "This function is called on every sub-Derivation, ...", but where does that happen? I found that `extractLocal` is used by the `ParserState` class and its subclasses within the method `featurizeAndScoreDerivation`, but I don't see the recursion that takes the whole derivation tree into account.
- The method `computeScore` in the class `Derivation` does recursively calculate all scores, but why is it used only by the `ReinforcementParser`?

I would like to gain some more insight here. Thanks!
Background
The method `parse` in `Parser` is responsible for parsing an utterance. It calls `ParserState state = newParserState(params, ex, computeExpectedCounts)` and then `state.infer()` to construct the Derivations. These two methods should be overridden in subclasses of `Parser`. The `state.infer()` call should populate the `predDerivations` field of `state` with the final Derivations for the example.

(Note: `computeExpectedCounts` means whether to compute the information necessary for gradient updates. The flag is turned on during training and off during evaluation.)
Let's use `BeamParser` as an example. Ignore the `coarse{State,Prune}` stuff (and whatever happens when `BeamParserState.mode == bool`), which is only for speeding up parsing.

The `newParserState` method constructs a new `BeamParserState` object (which is in the same file `BeamParser.java`). The `infer` method of `BeamParserState` constructs Derivations over utterance spans of increasing sizes using the `build` method, and then calls `setPredDerivations` (defined in the parent class `ChartParserState`) to populate the `predDerivations` field.
Tracing the method calls from `build` leads to the `applyRule(start, end, rule, children)` method, which constructs Derivations from the given children using the given grammar rule. (Ignore the `mode == bool` section.) The produced derivations `newDeriv` are featurized and scored by `featurizeAndScoreDerivation` (defined in `ParserState`). This method calls `extractLocal` to extract features and then `computeScoreLocal` to compute the score.
Feature extraction and scoring
Feature extraction and scoring are actually not done recursively (sorry for my confusion earlier). Rather, it uses dynamic programming. Let's say a derivation A has one child B, A has local features {a1, a2}, and the total score of all features in B (including the features of its descendants) is already computed and stored in B.score. Then the total score of A, A.score, can be computed as A.score = params.getWeight(a1) + params.getWeight(a2) + B.score.
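A minimal sketch of this bottom-up scoring follows; the field and method names are hypothetical stand-ins for the real `Derivation`, `Params`, and feature-vector classes.

```java
import java.util.List;
import java.util.Map;

public class ScoreSketch {
  // Hypothetical minimal view of a derivation whose children are already scored.
  static class Deriv {
    Map<String, Double> localFeatures;  // features introduced by this node only
    List<Deriv> children;               // children already have their score field set
    double score;
  }

  // score(A) = weighted local features of A + sum of the children's (already computed) scores.
  static void computeScoreLocal(Deriv d, Map<String, Double> weights) {
    double s = 0;
    for (Map.Entry<String, Double> e : d.localFeatures.entrySet())
      s += weights.getOrDefault(e.getKey(), 0.0) * e.getValue();
    for (Deriv child : d.children)
      s += child.score;
    d.score = s;
  }
}
```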
The `extractLocal` method (in `FeatureExtractor`) extracts features only for the current derivation and not the children. For example, the "rule" feature (in `extractRuleFeatures`) will define only one feature for the rule used to build the derivation. The rules used for building the children should have already been extracted when the children were constructed (and `featurizeAndScoreDerivation` was called on them).

The `computeScoreLocal` method (in `Derivation`) computes `score` (total score of all features) = total score of the local features + sum of the `score` of the children. Like the features, the `score` field of the children should already have been populated when the children were constructed.
In `ReinforcementParser`, there is no guarantee that the scores of children derivations have been populated before the parent is constructed (I think), so `computeScore` is used instead of `computeScoreLocal` to recursively score the children.
Note that this dynamic programming method only works for linear models (which SEMPRE uses). A model that considers interactions between features would not work.
Thank you! With your explanations and further studying of the code, I now believe to have understood the following:

- The field `chart` of the class `(Beam)ChartParserState` keeps track of the derivations detected so far. It does so by maintaining a two-dimensional map. The two indices stand for `start` and `end` of a (sub)span. The key is the name of a (target) category. The value is a list of those derivations that have been found for that particular span with that particular category.
- The `infer` method of `BeamParserState` first finds the "primitive" derivations, which are based only on tokens and phrases. To achieve this, `gatherTokenAndPhraseDerivations` is called for each subspan of the given span. The results are added to `chart`.
- `infer` then calls `build` on subspans of increasing length, respectively for all possible start indices within the span, from left to right. For each of these subspans, and in their thus determined order, `build` calls `applyCatUnaryRules` and `applyNonCatUnaryRules`. These try to apply all available rules to the respective subspan. Every time a rule is applicable to existing derivations in `chart` (which is, in the case of `applyCatUnaryRules`, because its `rhs` matches the category these derivations are mapped to in `chart` for the respective subspan), `applyRule` is called. This method executes the rule that was found applicable, creates a new derivation, and puts it in `chart`. (OK, for `applyNonCatUnaryRules` it is a bit more complex, but I think I got the principle of how the most important methods interact...) So for each subspan `build` is called on, the derivations created on previous subspans can potentially be used to apply rules on.
- `applyRule` also extracts the features of the newly added derivation. It then computes the "total" score of that derivation as the sum of its weighted local features and the scores of its children.
- Finally, `infer` calls `setPredDerivations`. This puts all derivations that cover the whole original span and have category `$ROOT` (and so qualify to be returned to the user) into the field `predDerivations` of `BeamParserState`.

Please correct me if I'm wrong.
Yes, these are all correct. Different parsers might construct logical forms in different orders (e.g., the second stage of FloatingParser does not care about `start` and `end`; the ReinforcementParser learns to pick which partial Derivation to expand on; etc.), but the general ideas about applying rules, scoring, and then collecting `predDerivations` should be the same.
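If it helps to picture the chart bookkeeping described above, it behaves roughly like the structure below. This is only a sketch; the actual field in `ChartParserState` may be declared and indexed differently, and `Object` stands in for `Derivation`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ChartSketch {
  // chart[start][end] maps a category name (e.g., "$City") to the list of
  // derivations found for the span [start, end) with that category.
  @SuppressWarnings({"unchecked", "rawtypes"})
  static Map<String, List<Object>>[][] newChart(int numTokens) {
    Map<String, List<Object>>[][] chart = new Map[numTokens][numTokens + 1];
    for (int start = 0; start < numTokens; start++)
      for (int end = start + 1; end <= numTokens; end++)
        chart[start][end] = new HashMap<>();
    return chart;
  }

  // applyRule-style bookkeeping: register a newly built derivation under its category.
  static void addToChart(Map<String, List<Object>>[][] chart,
                         int start, int end, String cat, Object deriv) {
    chart[start][end].computeIfAbsent(cat, k -> new ArrayList<>()).add(deriv);
  }
}
```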
Thanks. Next thing will be to investigate the implementation of the math behind it, the actual calculation of parameters and probabilities.
So I'm trying to get to the core of the math behind the learning process. I've brushed up my knowledge of Stochastic Gradient Descent, but have problems mapping it to the code. The main method here is apparently `learn` in the `Learner` class. It calls `processExamples`, which uses `parseExample`, `checkGradient` and `updateWeights` (which calls `update` in class `Params`).

The names suggest that the actual updating happens in the last two mentioned. But in the `checkGradient` method there is the line

`perturbedParams.getWeights().put(feature, perturbedParams.getWeight(feature) + eps);`

which looks like the weights are updated here (but with the constant `eps` and not with a calculated SGD step). Also, what part of the SGD algorithm do the fields `objectiveValue` and `expectedCounts` of the `ParserState` class correspond to, which are used by `checkGradient`? Could you please provide some hints about the ML implementation? Thanks
Objective function and gradient
- A derivation has score `score(deriv) = sum_{feature f} params.weights[f] * deriv.feature_count[f]`
  - This is computed by `Derivation.computeScore(params)`
- Define `p(deriv) = exp[score(deriv)] / sum_{deriv'} exp[score(deriv')]`
- The objective function is the log-likelihood of the derivations with correct denotations
  - The likelihood is `sum_{d with correct deno} p(d) = sum_{d with correct deno} exp[score(d)] / sum_{all derivs d} exp[score(d)]`
  - The log-likelihood is `log(sum_{d with correct deno} exp[score(d)]) - log(sum_{all derivs d} exp[score(d)])`
- The "expected count" of f is the derivative of the objective function with respect to `params.weights[f]`
  - This involves a bit of algebra (spelled out below).
- To maximize the objective function, we add (expected count of f * learning rate) to the weight of f
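For completeness, the algebra works out to the standard log-linear gradient (my derivation, using the notation above with $s(d) = \sum_f w_f\,\phi_f(d)$ for the score, $\phi_f(d)$ for the count of feature $f$ in derivation $d$, and $C$ for the set of derivations with correct denotations):

$$\frac{\partial}{\partial w_f}\Big[\log\sum_{d\in C} e^{s(d)} - \log\sum_{d} e^{s(d)}\Big] = \mathbb{E}_{p(d \mid d\in C)}[\phi_f(d)] - \mathbb{E}_{p(d)}[\phi_f(d)]$$

That is, the expected feature count under the distribution restricted to correct derivations minus the expected feature count under the full distribution; this difference is what gets accumulated as the expected count of $f$.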
How all this is done in the code
- Computing expected counts is done in `ParserState.computeExpectedCounts`
- Now in `Learner.processExamples`:
  - `computeExpectedCounts` (boolean) indicates whether the weights should be updated
  - The update is done in batches: the `expectedCounts` in the ParserState are gathered in the `counts` map until a certain number of examples (specified by `batchSize`) is processed. Then `updateWeights(counts)` is called on the accumulated expected counts (after which `counts` is reset).
  - `updateWeights(counts)` essentially calls `params.update(counts)`. Ignoring L1-regularization, the function just does `MapUtils.incr(weights, f, stepSize * g);`. This means adding (expected count of f * learning rate) to the weight of f.
- You can ignore the `checkGradient` part. It is used to check if the gradient update is correct. It does not actually update the weights.
A question about how to tell a `FeatureExtractor` which features it is supposed to use: I understand these features are specified on the command line using `-FeatureExtractor.featureDomains ...`. But where in the code are these actually added to the HashSet `featureDomains` in the class `Options` of the `FeatureExtractor`?
Is it safe to say that, for parsing relatively simple utterances into logical forms that are meant to be executed by `JavaExecutor`, mostly the `BeamParser` class and the `Rule` feature are relevant?
Using the BeamParser should be sufficient for the parser. For the features, the `rule` features alone might be insufficient (since they only count how many times each rule is used, regardless of the input utterance). The geo880 mode (for the Geoquery dataset) lists the following features:

`rule opCount constant whType span lemmaAndBinaries denotation lexAlign joinPos skipPos`

(Line 1093 in the `run` script). This might be a good set of features to try (though some of them are not guaranteed to work correctly depending on the grammar and executor used).
Thanks for the hint. I was asking because I'd like to offer the user a reasonable selection of features to choose from via checkboxes in my GUI. I'd also like to add a short description of each feature to the GUI.

The "constant" feature seems to be located in the class `ConstantFn` directly. Am I right that the code segment

`res.addFeature("constant", ex.phraseString(c.getStart(), c.getEnd()) + " --- " + formula.toString());`

registers an application of `ConstantFn` per subspan? Also, which part of the code actually triggers that registration? It seems that there is no method in `FeatureExtractor` taking care of the "constant" feature.
Feature firing in SEMPRE is pretty complicated due to legacy reasons.
- This docstring of `FeatureExtractor` gives an overview of the 3 ways features can be defined.
- Feature domains can be turned on or off based on the flag `FeatureExtractor.featureDomains`.
- Features defined in a `SemanticFn` (like the example you gave for `ConstantFn`) will be defined once the `SemanticFn` is called (i.e., when building the Derivation).
  - In the case of `ConstantFn`, the feature is defined on every subspan the corresponding rule fires on. For example, `(rule $TypeNP (meeting) (ConstantFn en.meeting))` detects the word "meeting" and creates a Derivation of category `$TypeNP` with formula `en.meeting`. The `res.addFeature(...)` line you mentioned will register a "constant" feature during the creation of the Derivation (`res`).
- Other features will be defined when `ParserState.featurizeAndScoreDerivation(deriv)` is called. The `FeatureExtractor` goes through all features defined locally in the `FeatureExtractor` plus any registered `FeatureComputer` classes.
I'm back at investigating the features. `whType` uses Part-of-Speech tagging. As far as I managed to trace it, the method `preprocess` in the `Example` class populates the field `posTags` in the field `languageInfo` by applying a `LanguageAnalyzer`. I'd like to know how it is determined which POS tags are actually added here, and whether it is possible to change (add/remove) POS tags manually. Thanks!
The types of POS tags depend on the `LanguageAnalyzer` used (specified by the flag `--LanguageAnalyzer.languageAnalyzer`). The available choices are `SimpleAnalyzer` (which is a dummy analyzer that only recognizes numbers) and `corenlp.CoreNLPAnalyzer` (which runs CoreNLP).

The POS tags returned by CoreNLP are based on the model used, with the default using Penn Treebank tags. The class CoreNLPAnalyzer postprocesses some of the tags (e.g., marking auxiliary verbs), and more postprocessing could be hacked in there.
Thanks. I will think over whether I need this feature. It's probably more relevant for database queries than for the cases `JavaExecutor` handles.
Shouldn't it be possible to map Strings to categories, e.g.
(rule $City (string "hamburg") (IdentityFn))
(rule $City (string "stuttgart") (IdentityFn))
(rule $City (string "munic") (IdentityFn))
(rule $ROOT ($City) (IdentityFn))
as a part/base of a more complex grammar?
In my application, using these rules, these city names aren't recognized. Is the grammar itself wrong? Otherwise it would mean my application doesn't handle the grammar file correctly. Thanks
Could you try changing the rules to look like:
(rule $City (hamburg) (IdentityFn))
I had tried this; I then get:
ERROR: Composition failed: rule = $City -> hamburg (IdentityFn), children = []
java.lang.ArrayIndexOutOfBoundsException
Exception in thread "Thread-374" java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException
at edu.stanford.nlp.sempre.BeamParserState.applyRule(BeamParser.java:165)
at edu.stanford.nlp.sempre.BeamParserState.applyNonCatUnaryRules(BeamParser.java:224)
at edu.stanford.nlp.sempre.BeamParserState.applyNonCatUnaryRules(BeamParser.java:231)
at edu.stanford.nlp.sempre.BeamParserState.build(BeamParser.java:123)
at edu.stanford.nlp.sempre.BeamParserState.infer(BeamParser.java:98)
at edu.stanford.nlp.sempre.Parser.parse(Parser.java:170)....
(rest of the error message is referring to classes in my application)
What apparently works is:
(rule $City (hamburg) (ConstantFn (string hamburg)))
but that feels quite cumbersome.
The current grammar looks like this:
(rule $Dir (from) (ConstantFn (lambda x (call + (string "place of departure") (string :)(var x)))))
(rule $From ($Dir $City) (JoinFn forward))
(rule $Dir (to) (ConstantFn (lambda x (call + (string "arrival location") (string :)(var x)))))
(rule $To ($Dir $City) (JoinFn forward))
(rule $Part (by) (ConstantFn (lambda x (call + (string "transportation") (string :)(var x)))))
(rule $By ($Part $Transport) (JoinFn forward))
(rule $Transport (plane) (ConstantFn (string plane)))
(rule $Transport (train) (ConstantFn (string train)))
(rule $City (hamburg) (ConstantFn (string hamburg)))
(rule $City (stuttgart) (ConstantFn (string stuttgart)))
(rule $City (frankfurt) (ConstantFn (string frankfurt)))
(rule $City (munic) (ConstantFn (string munic)))
(rule $City (hannover) (ConstantFn (string hannover)))
(rule $ROOT ($From) (IdentityFn))
(rule $ROOT ($To) (IdentityFn))
(rule $ROOT ($By) (IdentityFn))
and manages to parse e.g. "from munic" into `(string "place of departure:munic")` (though for some reason it returns the same derivation tree twice here), but I haven't found a way to parse a combination of the categories $From, $To and $By.
Adding the rule
(rule $ROOT ($From $By) (ConcatFn ""))
and entering "from munic by train" returned `(string nullnull)`. Is there a way? Thanks!