nengo-1.4 icon indicating copy to clipboard operation
nengo-1.4 copied to clipboard

Concern regarding thread syncrhonization and question regarding protocol for raising these concerns

Open JeffKramer opened this issue 12 years ago • 5 comments

Sorry, I'm new to Git and I submitted the issue before entering the content.

I didn't find an FAQ on the appropriate way to raise these concerns, and the dev mailing list looked inactive, so I'm using Git. Please let me know if there is a better way to do this.

I grabbed a copy of the nengo source a little while ago and started to review the simulation package.

I'm concerned that NodeThread.runProjections doesn't properly synchronize the projection.origin and projection.termination values before assigning them from the origin to the termination. From my quick review of the source, it looks like a projection origin can be set on one thread and retrieved via NodeThread.runProjections on another thread. Given that is correct, then from what little I understand of the Java Memory Model, something needs to be done to synchronize access to the referenced objects so that the termination gets the correct and current value from the origin.

For example, BasicOrigin could make myValues volatile and the Origin interface documented so that developers of implementing classes provide similar guarantees.

Concurrency In Practice by Brian Goetz and Tim Peierls explains it far better than me. I think Chapters 2 and 16 would be the best place to start.

JeffKramer avatar Dec 11 '12 00:12 JeffKramer

If I'm understanding you correctly, I think we do have such a synchronization measure in place. If you look at the function NodeThreadPool.step, we always make sure that all the nodes in the network have finished stepping before beginning stepping any of the projections. This ensure that the origins always have the correct values in them before we move the values along the projections from the origins to the terminations.

Admittedly there's probably a more correct way to do this, and I'm definitely open to suggestions.

e2crawfo avatar Dec 13 '12 18:12 e2crawfo

Hi Eric:

Thanks for your reply.

First, was that the correct forum to raise the issue?

Second, you do make sure that the nodes have finished stepping, but Java doesn't guarantee that an assignment to a variable one one thread is made available to another thread unless thread synchronization is, for lack of a better term, requested.

I think one of the best examples, because it is counter intuitive, is from Concurrency in Practice, Chapter 3, I hope they would consider this fair use.

public class NoVisibility { private static boolean ready; private static int number;

private static class ReaderThread extends Thread { 
    public void run() { 
        while (!ready) 
            Thread.yield(); 
        System.out.println(number); 
    } 
} 

public static void main(String[] args) { 
    new ReaderThread().start(); 
    number = 42; 
    ready = true; 
} 

}

There is a chance that the above thread may never stop or that the output would be zero. It depends on the JVM, processor, OS, optimization and other things, better explained in the book.

In the above case, making the variables volatile, as in:

private volatile static boolean ready; private volatile static int number;

instructs the Java compiler to generate code that causes appropriate actions (e.g. the cache on on CPU core to be flushed to memory before an updated value can be read) to be taken so that memory access across threads is intuitive and if one thread sets a value 'before' another thread reads it, it will be guaranteed that the read thread will see that value.

I didn't see any use of volatile, synchronized methods, or the higher level constructs like AtomicReference in the methods that implemented the Origin interface. So I suspect that such measures were not taken. If I'm write, the fact that your code makes sure it sets the projection origin before it is assigned to the projections terminator doesn't force Java to make sure that the current value is used (if the origin was set on another thread, the memory location (or cache) used by the current thread might not have been updated, and in rare cases, may never be updated).

I have some suggestions on how to deal with it, but I think it best to first confirm that I've read the code correctly and that the above makes sense (and if you were already aware of the above example, I hope I haven't come across as condescending) to you. If you aren't familiar with the above concepts, get a copy of that book. It was written by some of the guys that worked on the concurrency libraries for Java and they provide information on how the Java memory model works with respect to threading with recommendations on the tools to make multi-threaded code safer and easier to write.

Jeff

On 2012-12-13, at 1:18 PM, Eric Crawford wrote:

If I'm understanding you correctly, I think we do have such a synchronization measure in place. If you look at the function NodeThreadPool.step, we always make sure that all the nodes in the network have finished stepping before beginning stepping any of the projections. This ensure that the origins always have the correct values in them before we move the values along the projections from the origins to the terminations.

Admittedly there's probably a more correct way to do this, and I'm definitely open to suggestions.

— Reply to this email directly or view it on GitHub.

JeffKramer avatar Dec 14 '12 00:12 JeffKramer

Hi Eric:

I just reviewed BasicOrigin to see what could be done.

I focused on the myValues member as it was the first item I noticed that appeared to be set on one thread and accessed from another.

There will be less chances of synchronization issues if the classes that implement InstantaneousOutput are immutable. I reviewed them and they all appear to be immutable, which is good. I'd document the interface to indicate that all implementations live up to that contract. I'd also use "final" modifies on the member variables of the implementing classes so that the compiler/IDE will be able to enforce it.

Since myValues is set and accessed from different threads, it should be made volatile.

The next problem is in BasicOrigin.setValues(float, float, float[]). It accesses various non volatile members when creating the new value for myValues. myNoise and myNoises are both instances of Noise. I looked at some of the classes that implement Noise and at least one of them is not immutable, e.g. NoiseImplPDF. If their members are assigned on different threads than the setValues call, there could be synchronization issues. I didn't review all of the references to myValue(s) so I don't know if there is a problem.

That's where I stopped reviewing BasicOrigin.

You'd need to perform such an inventory across the code base to determine which members are shared across threads and then make them thread safe. Where possible, it is best to make them immutable, using the final modifier gives the compiler hints so that those members can be accessed across threads without concern. If they can't be made immutable then you need to use the various java keywords and libraries to properly synchronize access (e.g. volatile, AtomicReference, synchronized access to methods); the exact solution depends on each member and how it is used it can be a complex task.

The problem with the above is that hairiness of properly sharing mutable data across threads is exposed to your model developers - I'm guessing that the intent is for various modellers to create their own implementation of Node, Noise, Origin, etc.. With the current architecture, each of those implementations will need to be made thread safe. It's going to be difficult as you can't enforce any rules on the interface, you have to rely upon developers to be skilled and diligent enough to implement the interface correctly.

One way to reduce the design overhead, with respect to thread safety, of an application like this is to make the model objects thread local and have them pass shared information via messages on queues. This separates concerns as model developers can focus on their model while thread safety and the issues related to mult-thread access to shared data is handled elsewhere. My guess is that a message passing design is actually similar to what you are trying to model - various nodes sending messages to other nodes over projections (BUT, I know so little of your model that my guess might be way off). There are various frameworks to support such a design - see Akka, Actor - although you can also implement it on your own in Java. With such a framework, your Nodes will get their new inputs in a thread safe manner, can manage their state on their own thread without concerns for thread safety, and share new data with other Nodes in a thread safe way. The developers of the Node (or Origin, etc.) implementations don't have to think about threading issues.

As an aside, the Akka library is also supposed to support distributing your computation across multiple instances of the application on multiple cpus. If you are planning on scaling out the app and haven't figured out how to scale, you may want to look at Akka. Note that I have only looked at Akka, I haven't done anything more involved than "Hello World", so I can't vouch for it.

I may have opened a can of worms as my last two paragraphs raise the issue of changing the architecture for a project I know nothing about. I suspect it might not be appropriate for this project , but I think that would be the best way to remove most thread safety concerns.

Jeff

On 2012-12-13, at 1:18 PM, Eric Crawford wrote:

If I'm understanding you correctly, I think we do have such a synchronization measure in place. If you look at the function NodeThreadPool.step, we always make sure that all the nodes in the network have finished stepping before beginning stepping any of the projections. This ensure that the origins always have the correct values in them before we move the values along the projections from the origins to the terminations.

Admittedly there's probably a more correct way to do this, and I'm definitely open to suggestions.

— Reply to this email directly or view it on GitHub.

JeffKramer avatar Dec 16 '12 16:12 JeffKramer

Hi Jeff,

This is all super helpful, the multithreading has never worked as well as we'd like it to and these are good places to start. Our concerns are typically that its not as fast as expected, but it'd be nice to nail down the correctness too.

I like the idea of using the message passing framework. As you guessed, I think we really would like to avoid all the overhead of making sure all the models conform. We actually do most of our modelling by subclassing the Java classes in Jython, which complicates the issue even more, and the message passing idea seems like the right way to do it. Tomorrow morning I'll have a look at Akka and if its whats we need.

By the way, you have it right that at an abstract level our models are just nodes passing messages to other nodes, though its "implemented" by neurons sending action potentials to other neurons. :)

Thanks for pointing all this out, I'll let you know how it goes.

Eric

e2crawfo avatar Dec 17 '12 03:12 e2crawfo

I suspect it will get a little bit slower with thread synchronization, but if this simulator can be run in a distributed message passing app you might get around that by throwing more commodity hardware at the problem :).

This project is interesting, let me know if I can help out.

I'd be especially interested in getting involved if you guys decide message passing is the way to go.

On 2012-12-16, at 10:41 PM, Eric Crawford wrote:

Hi Jeff,

This is all super helpful, the multithreading has never worked as well as we'd like it to and these are good places to start. Our concerns are typically that its not as fast as expected, but it'd be nice to nail down the correctness too.

I like the idea of using the message passing framework. As you guessed, I think we really would like to avoid all the overhead of making sure all the models conform. We actually do most of our modelling by subclassing the Java classes in Jython, which complicates the issue even more, and the message passing idea seems like the right way to do it. Tomorrow morning I'll have a look at Akka and if its whats we need.

By the way, you have it right that at an abstract level our models are just nodes passing messages to other nodes, though its "implemented" by neurons sending action potentials to other neurons. :)

Thanks for pointing all this out, I'll let you know how it goes.

Eric

— Reply to this email directly or view it on GitHub.

JeffKramer avatar Dec 17 '12 23:12 JeffKramer