lucenenet

Investigate how we can load external assemblies when running Lucene.Net.Benchmarks on the command line

Open NightOwl888 opened this issue 5 years ago • 6 comments

The benchmarks project was designed to be able to load user-defined projects to run. In Java, this could be done with a single string to identify the types to load; however, .NET requires a reference to the actual assembly in order to read the types from it.

We currently have it set up to read all types from all assemblies that are referenced, but this causes the Lucene.Net.Tests.Benchmark.ByTask.Tasks.Alt::TestWithoutAlt() test to fail because in Java the types were supposed to be loaded on demand. So, we need to investigate the best way to load types from external assemblies in .NET so that the lucene-cli tool can run benchmarks against them.

The part that has been altered to allow assemblies to be "automatically" discovered is:

// Loads all assemblies in current referenced project (this was not in the original Lucene source)
IEnumerable<string> referencedAssemblies = AssemblyUtils.GetReferencedAssemblies().Select(a => a.GetName().Name);
result.Add(dfltPkg);

if (alts == null)
{
	result.UnionWith(referencedAssemblies);
	return result.ToArray();
}

foreach (string alt in alts.Split(',').TrimEnd())
{
	result.Add(alt);
}
result.UnionWith(referencedAssemblies);

Equivalent in Lucene 4.8.1

The problem is that when running as a separate process (lucene-cli), the end user has no way to reference assemblies, and therefore cannot change what is loaded by the tool.

I am no expert on Java, but from what I gather there is a convention-based and extensible "class path" that can be interacted with by end users regardless of whether it is inside or outside of the .jar package. I think the way it works is that by simply dropping an external .class file (similar to a .NET Type) in the same directory as an internal class, the JVM will load it, but it is also possible to inject a custom "class loader" to load from alternate locations or to add additional class paths on the command line.

It is also possible in Java to either reference a .jar file like a DLL or execute it like an EXE. For example, if a main() method exists in any class, it can be executed directly on the .jar file like:

java -cp lucene-core.jar org.apache.lucene.index.IndexUpgrader [-delete-prior-commits] [-verbose] indexDir

or 

java -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex pathToIndex [-fix] [-verbose] [-segment X] [-segment Y]

Since DLLs cannot be executed directly in .NET in the same way, we have a gap between the two platforms. The lucene-cli tool was created as a wrapper process to execute any one of the main() methods that were included in Lucene to fill this gap. This wrapper executable has revealed yet another gap, since the end user should be able to supply their own classes to benchmark and currently there is no way to do so.

So, the essence of this task is to do the following:

  • Create a similar convention-based and/or command-line based way for end users to be able to run any of the benchmark commands against their own assemblies/code
  • Prefer to utilize the .NET platform's nearest counterpart and convention rather than invent a custom one, where possible
  • Utilize the Lucene.Net.Benchmarks string-based configuration for the user to be able to specify which type to load

IMO, we don't necessarily have to have as many options as were available in Java; we simply need to provide an option, which we are currently lacking.
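One possible shape for the last two bullets, as a minimal sketch rather than a proposal for the actual implementation: the user supplies an assembly-qualified type name in the benchmark configuration, and the tool probes a plug-in directory for the matching DLL. The `ExternalTypeLoader` class, the `pluginDirectory` parameter, and the "one DLL per simple assembly name" convention are all hypothetical; only `Type.GetType`, `AssemblyLoadContext.Default.LoadFromAssemblyPath`, and `Assembly.Load` are real framework APIs.

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Runtime.Loader;

// Hypothetical helper, not part of Lucene.NET.
public static class ExternalTypeLoader
{
    // Resolves a string such as "MyCompany.Benchmarks.MyTask, MyCompany.Benchmarks".
    // The assembly part is looked up in pluginDirectory first (assumed convention),
    // then through the normal runtime probing.
    public static Type LoadType(string assemblyQualifiedName, string pluginDirectory)
    {
        return Type.GetType(
            assemblyQualifiedName,
            assemblyName => ResolveAssembly(assemblyName, pluginDirectory),
            null,   // use the default type lookup inside the resolved assembly
            true);  // throw if the type cannot be found
    }

    private static Assembly ResolveAssembly(AssemblyName assemblyName, string pluginDirectory)
    {
        string candidate = Path.Combine(pluginDirectory, assemblyName.Name + ".dll");
        if (File.Exists(candidate))
        {
            // LoadFromAssemblyPath requires an absolute path.
            return AssemblyLoadContext.Default.LoadFromAssemblyPath(Path.GetFullPath(candidate));
        }
        // Fall back to assemblies the runtime can already find.
        return Assembly.Load(assemblyName);
    }
}
```

With something like this, types are only loaded when a configuration string names them, which is closer to the on-demand behavior the Java version relied on than scanning every referenced assembly.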

NightOwl888 avatar Jul 07 '20 09:07 NightOwl888

Loading types from non-referenced assemblies is fairly simple in .NET. If they reside in a different location (folder-wise), one has to implement some assembly resolution handling, but I have done that many times in the past.
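As a rough illustration of that kind of resolution handling, assuming the external assemblies live in a single extra folder and are named after their simple assembly name: the `ProbingResolver` class and `extraDirectory` parameter are hypothetical, while `AppDomain.AssemblyResolve` and `Assembly.LoadFrom` are the real APIs involved.

```csharp
using System;
using System.IO;
using System.Reflection;

// Hypothetical helper for illustration only.
public static class ProbingResolver
{
    // Registers a fallback that is consulted only after the runtime's
    // normal probing has failed to find an assembly.
    public static void Register(string extraDirectory)
    {
        AppDomain.CurrentDomain.AssemblyResolve += (sender, args) =>
        {
            // args.Name is the full display name; keep only the simple name.
            string simpleName = new AssemblyName(args.Name).Name;
            string candidate = Path.Combine(extraDirectory, simpleName + ".dll");

            // Returning null tells the runtime this handler could not resolve it.
            return File.Exists(candidate) ? Assembly.LoadFrom(candidate) : null;
        };
    }
}
```

Calling `ProbingResolver.Register(someFolder)` once at startup would be enough for later `Assembly.Load` or `Type.GetType` calls to find DLLs in that folder.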

It only really gets complicated if we begin to talk about having the loaded assemblies isolated. This is often done to allow for loading and then unloading them again. However, since this is a command-line tool, that sounds irrelevant.

But I think the issue lacks context, such as examples or references to documentation of how the Java version works. Also, how do we envision this should work?

jeme avatar Oct 06 '20 10:10 jeme

For netcore this is really easy, and you can unload them. In .NET Framework this is more difficult: you cannot unload them unless you create and destroy custom AppDomains at runtime, which is possible, but you cannot flow data between the domains unless you use string serialization or remoting (all ugly). In netcore you just use AssemblyLoadContext; there are samples here: https://github.com/dotnet/samples/tree/master/core/tutorials/Unloading (https://github.com/dotnet/samples/blob/master/core/tutorials/Unloading/Host/Program.cs)
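A minimal sketch of the AssemblyLoadContext approach on .NET Core 3.0 or later, loosely following the pattern in those samples. The `PluginLoadContext` name and the `pluginPath` parameter are placeholders of mine; `AssemblyLoadContext`, `AssemblyDependencyResolver`, `LoadFromAssemblyPath`, and `Unload` are real APIs.

```csharp
using System;
using System.Reflection;
using System.Runtime.Loader;

// Hypothetical plug-in context; collectible so that it can be unloaded.
public sealed class PluginLoadContext : AssemblyLoadContext
{
    private readonly AssemblyDependencyResolver _resolver;

    // pluginPath: absolute path to the plug-in's main DLL (assumed input).
    public PluginLoadContext(string pluginPath)
        : base(isCollectible: true)
    {
        // Resolves the plug-in's dependencies relative to pluginPath,
        // using its .deps.json file when one is present.
        _resolver = new AssemblyDependencyResolver(pluginPath);
    }

    protected override Assembly Load(AssemblyName assemblyName)
    {
        string path = _resolver.ResolveAssemblyToPath(assemblyName);

        // Returning null defers to the default context, so assemblies the
        // host already provides are shared instead of loaded a second time.
        return path != null ? LoadFromAssemblyPath(path) : null;
    }
}
```

Usage would be roughly `var context = new PluginLoadContext(path); Assembly plugin = context.LoadFromAssemblyPath(path);`, followed by `context.Unload()` once nothing references the loaded types any more. One caveat to keep in mind: if the plug-in folder ships its own copy of an assembly the host also uses, the two copies produce distinct types, so shared contracts are best left for the default context to resolve.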

But the way benchmarkdotnet works underneath is that, for the execution, it dynamically creates a netcore project and compiles it with the references that you tell it to use; it then runs the benchmarks against the compiled .exe output. So by using benchmarkdotnet you are sort of already loading external assemblies. It's been a while since I looked, but you can control how benchmarkdotnet builds its program.

Shazwazza avatar Oct 06 '20 11:10 Shazwazza

@jeme

Thanks, you were correct in that the original issue was lacking some context, and I have added more information to better explain the task.

The Lucene.Net.Benchmark project was designed to work either as a library that users can extend or as an executable that can just be run. The issue crops up in the latter case where we need some sort of a "plug in" architecture so the end user can supply their own assembly to run it against. Java has a native feature to do this, but .NET does not.

@Shazwazza

Although I think that we should look into leveraging BenchmarkDotNet for Lucene.Net.Benchmark at some point, the current incarnation is just a line-by-line port from Java. The Lucene.Net.Benchmark project uses a DSL to control the configuration of a benchmark, including strings that are meant for loading external types.

Since the commands are essentially run-once I don't believe there will be any issues with "unloading" to worry about.

NightOwl888 avatar Oct 06 '20 14:10 NightOwl888

Oh yes, my bad. I was confused with the benchmarkdotnet project(s) that we have; this one is different.

> Since the commands are essentially run-once I don't believe there will be any issues with "unloading" to worry about.

If it's for netcore 3 then AssemblyLoadContext is still the way to do it, whether you unload or not. This is a nice post about it: https://codetherapist.com/blog/netcore3-plugin-system/ If it's not netcore 3 then you can use Assembly.Load(name) if you want it loaded correctly (with fusion), but then the assembly needs to be in your probing paths (i.e. /bin). Otherwise you can load with Assembly.LoadFrom(filename) or Assembly.Load(bytes), but if you do that, the assembly will not be loaded in the same context. This is all different depending on the platform you are running on. In .NET Framework this is all super ugly and you need to know about the 3 load contexts (Default, Load-From, No Context); see https://docs.microsoft.com/en-us/dotnet/framework/deployment/best-practices-for-assembly-loading Basically, dealing with anything but the Default context is a pain and you will almost always need an AppDomain.AssemblyResolve event, but you might get success with LoadFrom.

netcore has fixed all this nonsense :) so it depends on what it needs to run on.

Shazwazza avatar Oct 06 '20 14:10 Shazwazza

Actually, come to think of it, that brings up another potential gap that wasn't previously considered. The lucene-cli tool is targeted at .NET Core 3.1 only. This may be an issue if the end user needs to load .NET Framework assemblies into its context in order to benchmark the types within them.

Potential solutions/workarounds:

  • Don't support .NET Framework in the CLI; require .NET Framework users either to compile their DLL as .NET Standard in order to benchmark in .NET Core 3.1, or to use the DLL and build their own wrapper CLI for .NET Framework
  • Create a separate version of the tool for .NET Framework (possibly even move the benchmark commands to a separate tool)

I know that in early versions of .NET Core, it was possible to load .NET Framework assemblies with certain conditions/limitations, which could also potentially be explored.

NightOwl888 avatar Oct 06 '20 15:10 NightOwl888

> Don't support .NET Framework in the CLI,

That would be my vote. I just don't think it's worth spending a whole lot of time on .NET Framework compatibility. If the main project supports it then I think that's enough.

> I know that in early versions of .NET Core, it was possible to load .NET Framework assemblies with certain conditions/limitations, which could also potentially be explored.

Yep, we were exploiting that in our own builds and it sort of still works in netcore 3; however, in netcore 3 official support for it has been entirely dropped. If you drop a DLL into /bin it will 'work', but I think it really depends on what's in the DLL.

Shazwazza avatar Oct 07 '20 00:10 Shazwazza