AhoCorasick.Net
AhoCorasick.Net copied to clipboard
List to Array Failing in Benchmark Tests
Using the following code:
List<string> uniqueWords = new();
static string? parsedTermsComplete;
public void AttemptOne()
{
var wordArray = uniqueWords.ToArray();
var keyWords = new AhoCorasickTree(wordArray);
var keywordsPositions = keyWords.Search(parsedTermsComplete).ToList();
// var result = keyWords.Contains(parsedTermsComplete!); - alternative still fails.
}
I get the following error:
System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.ArgumentException: should contain keywords at AhoCorasick.Net.AhoCorasickTree..ctor(String[] keywords) at RegexTesting.Program.Attempts.AttemptOne() in C:\Demo\RegexTesting\RegexTesting\Program.cs:line 1058 at BenchmarkDotNet.Autogenerated.Runnable_0.WorkloadActionNoUnroll(Int64 invokeCount) in C:\Demo\RegexTesting\RegexTesting\bin\Release\net6.0\b224fdac-806c-4a65-a8ce-8efc0ea02b10\b224fdac-806c-4a65-a8ce-8efc0ea02b10.notcs:line 318 at BenchmarkDotNet.Engines.Engine.RunIteration(IterationData data) at BenchmarkDotNet.Engines.EngineFactory.Jit(Engine engine, Int32 jitIndex, Int32 invokeCount, Int32 unrollFactor) at BenchmarkDotNet.Engines.EngineFactory.CreateReadyToRun(EngineParameters engineParameters) at BenchmarkDotNet.Autogenerated.Runnable_0.Run(IHost host, String benchmarkName) in C:\Demo\RegexTesting\RegexTesting\bin\Release\net6.0\b224fdac-806c-4a65-a8ce-8efc0ea02b10\b224fdac-806c-4a65-a8ce-8efc0ea02b10.notcs:line 175 --- End of inner exception stack trace --- at System.RuntimeMethodHandle.InvokeMethod(Object target, Span`1& arguments, Signature sig, Boolean constructor, Boolean wrapExceptions) at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture) at System.Reflection.MethodBase.Invoke(Object obj, Object[] parameters) at BenchmarkDotNet.Autogenerated.UniqueProgramName.AfterAssemblyLoadingAttached(String[] args) in C:\Demo\RegexTesting\RegexTesting\bin\Release\net6.0\b224fdac-806c-4a65-a8ce-8efc0ea02b10\b224fdac-806c-4a65-a8ce-8efc0ea02b10.notcs:line 58
The following code works in the same benchmark suite - so it isn't that the list is empty....
public void AttemptTwo()
{
var wordArray = uniqueWords.ToArray();
int i = uniqueWords.Count - 1;
foreach (var item in wordArray)
{
var keyWords = new AhoCorasickTree(new[] { item });
if (keyWords.Contains(parsedTermsComplete))
{
uniqueWords.RemoveAt(i);
}
i--;
}
}
Not sure if I have misunderstood the implementation, but wanted to raise it as I thought that this would work.
For implementation, parsedTermsComplete should contain several copies of every single word in the uniqueWords List - as that is how the parsed terms were created - I was looking to find a way to QC that every single word in uniqueWords did in fact exist in the parsedTermsComplete.
Don't get me wrong, the implementation that works is super fast, 3 times faster than any other implementation of the test - just wondered why I can't use a List to Array.
Just as an addition - I took this out of the benchmark and put it into the actual project.
Worked just fine. Just fails in Benchmarks.
Hi @Kaotic3,
Thank you for the issue. It throws the exception because the constructor gets an empty array of keywords. See here https://github.com/alexandrnikitin/AhoCorasick.Net/blob/0fead5e8f842b4a330282a5fb1929136a426e7cf/src/AhoCorasick.Net/AhoCorasickTree.cs#L14 I assume uniqueWords hence wordArray is empty. How do you add items to the list?
Hi @alexandrnikitin thanks for the response.
These two methods appear exactly like this in the benchmark test:
[MemoryDiagnoser]
public class Attempts
{
[Benchmark]
public void AttemptOne()
{
var wordArray = uniqueWords.ToArray();
var keyWords = new AhoCorasickTree(wordArray);
var resultTest = keyWords.Contains(parsedTermsComplete!);
}
[Benchmark]
public void AttemptTwo()
{
var wordArray = uniqueWords.ToArray();
int i = uniqueWords.Count - 1;
foreach (var item in wordArray)
{
var keyWords = new AhoCorasickTree(new[] { item });
if (keyWords.Contains(parsedTermsComplete))
{
uniqueWords.RemoveAt(i);
}
i--;
}
}
}
I am not sure why it would work in AttemptTwo but not in AttemptOne if there was a problem with the list?
Also to add, as noted above - in the Program.cs main method, I placed AttemptOne in a class and then sent the results there instead of running the Benchmark -
var result = TermsParserChecks.FullContent(parsedTermsComplete, uniqueWords);
//var result = BenchmarkRunner.Run<Attempts>();
That runs fine and provides the expected result. I don't think the list is empty, but it does appear that it is being seen as empty when being run in a Benchmark?
Yes, it looks like the uniqueWords list is empty and not populated. Please populate it in the benchmark constructor or Setup method. See Benchmark.NET docs. It doesn't throw in the AttemptTwo method because it doesn't execute logic inside foreach block (because the list is empty).
Ok it seems that the second one will run while empty...and the benchmark is not taking the list with any items in it at all. The list definitely has items but doesn't have any in the benchmark - I will need to investigate that one.
Sorry @alexandrnikitin - didn't mean to waste your time!
Yes, it looks like the list is populated outside of the benchmark. No problem. Feel free to ask any question.