gauge-dotnet
gauge-dotnet copied to clipboard
ScenarioDataStore broken in version 0.6.0 - race condition.
@sriv When adding values to data store and then retrieving the value Gauge returns null intermittently. Seems to be race condition.
Related to previous PR to handle async methods: https://github.com/getgauge/gauge-dotnet/issues/199
Example to recreate problem can be found here: https://github.com/jensakejohansson/gauge-async
When I execute the scenario it seems to fail in about 2/3 of executions. It's just 2 steps. One that adds a value to the store and the next step tries to retrieve the stored value again.
thanks for reporting, I will investigate. the async change was big and I was surprised that nothing broke. The randomness explains why the tests didnt catch it.
I think you should consider rolling back the release 0.6.0 if possible. People who runs test executions in a pipeline might just install the latest plugin version by default and then suddenly their tests starts failing due to a not so clear reason. I just got a mail about that... 😬
Done, 0.6.0 is marked as a pre-release and the release metadata has been reverted from gauge-repository.
Apologies for the mess.
I did some investigation. Thanks to your sample project I could replicate the issue. I believe it is because the datastores use a ThreadLocal<ConcurrentDictionary>
to hold the data, and threadlocal+async do not go well together.
I will try changing the Gauge.CSharp.Lib project to use something else instead of a ThreadLocal<ConcurrentDictionary>
and check.
I did some analysis - and my suspicion was right. Gauge.CSharp.Lib provides the types for DataStores which internally uses a ThreadLocal<ConcurrentDictionary>
The context here is that the initial releases of gauge supported parallel execution by bringing up as many runner processes as the number of streams. This had some limitation in terms of the footprint used and more importantly not very container friendly.
Subsequently, gauge supported multithreaded parallel execution for languages that have good multi threading support. However there was a bit of backward compatibility to be honoured. In a multi process execution, each stream had its own datastore instantiated. In a multi threaded environment, there was one datastore shared across streams causing contention and race conditions. As a result, to mimic the multi process datastore setup, it was decided to make the datastores threadlocal.
In the async world, the threadlocal does not help because there is no guarantee that tasks will run on the same thread because of the design of async. The reasonable replacement for this would be AsyncLocal
which ensures isolation between the async runcontexts. But this may not help here too much.
So the choice is - break backward compatibility and have single datastores per process or not have the async support.
I am investigating if there is a way to do both, but I am not very hopeful given the fundamental difference between async paradigm and a sticky thread execution.
I see 2 options for this:
- Introduce a separate datastore API for Async
- Ditch the threadlocal and let the runtime control the async read/writes.
Happy to hear suggestions.
/cc @getgauge/core (if anyone has any ideas on this)
Ouch! I see the problem.
Is it a correct understanding that option 2 above would solve the async problem and parallelization could be still be done, but less efficient by spawning processes instead of threads - potentially breaking some sort of backward compatibility in Gauge?
If so, then to me personally that would be the best option since async support (and a clean/simple to use API) is more important to us than efficient parallelization. We primarily run end-2-end user interface based test scenarios in Gauge, seldom in parallel at all, and if so, only at a very limited scale where some extra overhead would be totally acceptable. The lack of async support however is a major thing since many 3rd-party libraries/tools expose async APIs.
However, this is our specific use case, who am I to say what is best for all users?...
I would agree with above. I think option 2, having async work would be beneficial.
@sriv I have to come back to this issue again. Would you have the possibility to implement a solution for this so async is supported, even if it means compromises and it's not an 'optimal' solution? We really need a way forward here and any help would be much appreciated!
Best regards,
apologies @jensakejohansson - work has kept me very busy. I will try to see if I can spend some time over the weekend. Appreciate your patience.
Hello @sriv - I came across this issue and I noticed that there have been no updates since Jul 2. Do you know if it is still in the plans to implement option#2?
@sriv Sorry for nagging, but I guess you haven't have the possibility to address this?
It's hard for me to argue for using Gauge in new projects if it cannot handle async libraries. We'd like provide a solution ourselves, but I think it will be difficult due to our limited knowledge of all the technical details and history / potential backward compatibilities etc. related to this.
However, if you feel you are unable to address this I guess we will give it try and do our best (when we find time). If so, any guidance of what changes needs to be done and other pieces of advice are welcome.
Best regards,
@sriv I looked over the PR you added to the library, and I have some concerns about the solution proposed. While it might "work", I think it would have a very negative impact on running gauge multi-threaded. I am not sure if this is a problem, maybe most people do not run Gauge multi-threaded.
I have been working on a different solution to the async/await problem which would work similar to how the .NET gauge solution works now, but is not backwards compatible between the library and plugin (meaning you would have to upgrade both at the same time); however, it would just work without any new settings or changes to client's tests.
@mpekurny There were issues reported with multithreading and I had to rollback the release. So no surprises there :).
I think I am ok to let go of backward compatibility in this particular case for mainly the reason that no one has spoken strongly about it in this scenario and I would imagine that the refactoring to accomodate the new changes would not be drastic.
Would you be open to raise a PR for the components?
@jensakejohansson - I was thinking about a way to handle the backward and raised a PR for Lib. But I am interested in @mpekurny's idea.
@sriv @jensakejohansson @chadlwilson
Would you be open to raise a PR for the components?
Yes, when it's ready. With other work commitments, I am probably another week or so before I would be ready to raise PRs for the changes (it would require changes in both the plugin and library). If you could hold off merging in your changes until then?
@sriv @jensakejohansson
I am doing a little bit more testing but I am just about ready to push up the code and create PRs. I expect to have PRs sometime Monday. Thanks for your patience.
Marty
@mpekurny Great! I'm looking forward to it! 👍