Church.jl
Reify trace information
Great project!
It seems that you are currently storing some data about the current state (execution trace) in the global variables samples, dets, getindexes, and condition. Have you considered using a more local trace object instead that groups this information and anything else that may be part of the state?
This would make the design more modular, and would help with use cases such as copying/serializing the state (e.g., to resume sampling later on), and running multiple different models concurrently within a single file.
Thanks!
Interesting. Putting the global state in an object would be pretty straightforward. However, I'd worry that trying to assign different variables/functions to different State objects would:
1. Make a mess of the syntax.
2. Provide lots of opportunities for bugs (what if you try to connect two variables that are part of different models?)
Moreover, serialization is always going to be really hard. The problem is that you don't just want the object graph, you want the variable definitions too (e.g. a = normal()); otherwise you can't actually access any variables. Maybe objects could take an optional argument that gives them a name (i.e. x = normal(; name=:x)), together with a deserialize macro that looks for named objects and creates variables with those names.
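The naming idea above could be sketched roughly as follows. This is only an illustration under assumptions: Node, this stand-in normal, and bind_named! are hypothetical names rather than Church.jl's API, the code targets Julia 1.x, and a plain function using Core.eval stands in for the proposed macro.

```julia
# Hypothetical sketch, not Church.jl's API: nodes carry an optional name,
# and a helper rebinds named nodes to variables after deserialization.
using Serialization

struct Node
    name::Union{Symbol,Nothing}
    value::Float64
end

# Stand-in for a distribution constructor such as normal(; name=:x).
normal(; name=nothing) = Node(name, randn())

# Bind every named node to a global variable of the same name in `mod`.
# Unnamed nodes are skipped, so they stay reachable only via the graph.
function bind_named!(mod::Module, nodes)
    for n in nodes
        n.name === nothing && continue
        Core.eval(mod, Expr(:(=), n.name, n))
    end
    return mod
end

nodes = [normal(; name=:x), normal(), normal(; name=:y)]

# Round-trip through an in-memory buffer, then restore the bindings.
io = IOBuffer()
serialize(io, nodes)
seekstart(io)
restored = deserialize(io)

sandbox = Module(:Restored)   # a scratch module, to avoid clobbering Main
bind_named!(sandbox, restored)
```

After the round trip, the names x and y exist as variables in the scratch module, while the unnamed node is only reachable through the restored collection. This is the access problem described above.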
However, it would be nice to get rid of samples, dets and getindexes. Their only purpose is to give a reference to every object for the Church.jl gc step. Is there some way to extract all defined objects from the Julia gc? samplers is a bit tougher to remove though: you can't distribute samplers through the node graph easily, because there is no one-to-one mapping between a sample and a sampler. Moreover, you need fast access to every sampler, so an array of references seems like the obvious data structure.
So maybe samplers should be the global state. You could add a model field as an additional argument to a probability distribution that, if left blank, defaults to some global value? Then serialize on samplers would pull in the whole model and hence work as expected (though you would need the definitions). Furthermore, this would allow you to compare different samplers for the same model, without having to actually duplicate the variables.
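A minimal sketch of the defaulting model argument, assuming Julia 1.x. Model, DEFAULT_MODEL, Sample, Sampler, and this version of normal are all hypothetical stand-ins chosen for illustration, not the library's actual definitions.

```julia
# Hypothetical sketch: every name here is an illustrative stand-in,
# not one of Church.jl's actual definitions.
struct Sample
    value::Float64
end

# Placeholder sampler that simply remembers which sample it targets.
struct Sampler
    target::Sample
end

# A model is just the flat array of sampler references, giving fast
# access to every sampler without walking the node graph.
struct Model
    samplers::Vector{Sampler}
end
Model() = Model(Sampler[])

const DEFAULT_MODEL = Model()

# The constructor registers a sampler with the given model, falling
# back to the global default when `model` is left blank.
function normal(mu=0.0, sigma=1.0; model::Model=DEFAULT_MODEL)
    s = Sample(mu + sigma * randn())
    push!(model.samplers, Sampler(s))
    return s
end

a = normal()                     # registered with DEFAULT_MODEL
m = Model()
b = normal(1.0, 2.0; model=m)    # registered with the explicit model
```

Because each sampler holds a reference to its sample, serializing a model's samplers array would also reach the samples they point to, which is the "pull in the whole model" effect described above.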
(In reply to Andreas Stuhlmüller's comment of Tue, Jan 28, 2014 at 6:53 PM, https://github.com/LaurenceA/Church.jl/issues/2)
I'm not sure the syntax has to change for a simple version of the proposal. You could still have a global pointer to the current state (including samplers), except that this information is now grouped under a new Trace type. In contrast to the current setup, this pointer could be non-constant, so that it can be set to a user-provided trace container. This way, it is possible to continue using your library as it is now (with a single default state), but the user has more control over the state object and can copy it, switch it out, etc. if necessary (at some risk of introducing bugs, as you point out).
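One way this could look, as a sketch only and assuming Julia 1.x: the field names mirror the globals mentioned in this thread, but their element types are placeholders, and CURRENT_TRACE, current_trace, with_trace, and record_sample! are hypothetical names.

```julia
# Hypothetical sketch: field names follow the globals named in this
# thread; the element types are placeholders, not Church.jl's own.
struct Trace
    samples::Vector{Any}
    dets::Vector{Any}
    getindexes::Vector{Any}
    samplers::Vector{Any}
    condition::Vector{Any}
end
Trace() = Trace(Any[], Any[], Any[], Any[], Any[])

# Global "pointer" to the active trace: the Ref itself is const, but
# its contents can be reassigned to a user-provided trace.
const CURRENT_TRACE = Ref(Trace())

current_trace() = CURRENT_TRACE[]

# Run `f` with `t` as the active trace, restoring the previous one
# afterwards even if `f` throws.
function with_trace(f, t::Trace)
    old = CURRENT_TRACE[]
    CURRENT_TRACE[] = t
    try
        return f()
    finally
        CURRENT_TRACE[] = old
    end
end

# Library code only ever talks to the current trace.
record_sample!(x) = push!(current_trace().samples, x)

record_sample!(0.3)                     # lands in the default trace
snapshot = deepcopy(current_trace())    # copy of the state, e.g. to resume later

other = Trace()
with_trace(other) do
    record_sample!(1.5)                 # lands in `other`
end
```

Existing user code that never mentions a trace keeps working against the default one, while with_trace and deepcopy give the extra control (switching and copying) described above.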
"Then serialize on samplers would pull in the whole model and hence work as expected (though you would need the definitions). Furthermore, this would allow you to compare different samplers for the same model, without having to actually duplicate the variables."
I don't quite understand the idea behind this yet, but this sounds great!
(On a related note, it would be very useful if the code came with a reference that explains how each technical term, such as sample, sampler, model, and det, is used in the context of this project.)
Where should the reference information go? Comments inline? In a REFERENCE file?
(In reply to Andreas Stuhlmüller's comment of Wed, Jan 29, 2014 at 5:28 AM, https://github.com/LaurenceA/Church.jl/issues/2#issuecomment-33557747)
Any of these would be fine. I'd probably put it inline if it is just a sentence, and in doc/reference.md if it is more.