[WIP]: Add Type decompilation services
Adds assembly decompilation to get an object representing the package or assembly in question. Uses Mono.Cecil to read the IL code and ILSpy's decompiler to convert the IL into C# code, so the assemblies are never loaded into the host process.
Decompilation
- Types (names of types)
- Methods, Constructors (return type, name)
- Properties, Fields, Events (return type, name)
- C# declaration of above elements, e.g.:
public void MyMethod(string somevalue);
This is by no means complete, but more information can be extracted as needed.
Source code
I've created a little interface to post-populate the decompiled assembly with source code. The idea is to have multiple ways of fetching the sources, in some preferred order, for example:
- Embedded sources in pdb (possible in portable pdb's? EmbedAllSources?)
- Embedded sources in Nupkg archive (src/..)
- SourceLink sources from the internet (read sourcelink json from pdb, somehow link types to C# files, fetch those files ..)
- Decompiled sources from IL code <-- I've made this one
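To make the "preferred order" idea concrete, here is a minimal sketch of how such providers could be chained. The names `ISourceCodeProvider`, `EmptySourceProvider`, `DecompiledSourceProvider` and `SourceResolver` are hypothetical and are not the interface from this PR; the two providers are stand-ins that only illustrate the fallback behaviour.

```csharp
using System.Collections.Generic;

// Hypothetical interface: each provider knows one way of locating source code.
public interface ISourceCodeProvider
{
    // Returns true and sets 'source' when this provider can supply code for the type.
    bool TryGetSource(string typeFullName, out string source);
}

// Stand-in for a provider that finds nothing (e.g. no embedded sources in the pdb).
public sealed class EmptySourceProvider : ISourceCodeProvider
{
    public bool TryGetSource(string typeFullName, out string source)
    {
        source = null;
        return false;
    }
}

// Stand-in for the decompiler-backed provider, which acts as the last resort.
public sealed class DecompiledSourceProvider : ISourceCodeProvider
{
    public bool TryGetSource(string typeFullName, out string source)
    {
        source = "// decompiled from IL: " + typeFullName;
        return true;
    }
}

public static class SourceResolver
{
    // Tries the providers in the given order and returns the first hit, or null.
    public static string Resolve(IEnumerable<ISourceCodeProvider> providers, string typeFullName)
    {
        foreach (var provider in providers)
        {
            if (provider.TryGetSource(typeFullName, out var source))
            {
                return source;
            }
        }
        return null;
    }
}
```

Passing the providers as an ordered sequence keeps the preference order a configuration concern rather than something baked into each provider.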
Usage, right now
I've not created any code to link it to BaGet yet. But right now, you could run the decompiler like this:
using (var reader = new PackageArchiveReader(@".\TMDbLib.1.3.2-alpha.symbols.nupkg"))
{
    var res = new NugetDecompilerService(new AssemblyDecompilerService()).AnalyzePackage(reader);
    // use res
}
Part of #140
So. In order to obtain signatures for members and types (like public class SomeClass), I upgraded the CSharp Decompiler project to the new prerelease, version 4.x.
Doing so cut out Cecil for decompilation, and boom, major speedup. For practically the same code, NEST took 5+ minutes on the old versions, while it now takes a few seconds. So. That's cool :)
So, @loic-sharma, I've got the decompilation in place. There's a lot to do with regards to pulling out documentation and whatnot, but the core is in place.
What would a type-searching-source-code-something service look like?
Following ISearchService, I imagine:
- Task IndexAsync(PackageId id, AnalysisNugetAssembly code) -- indexes details for a specific package
- Task<AnalysisNugetAssembly> GetAsync(PackageId id) -- fetches a single package's code by id
- Task<AnalysisType> GetAsync(PackageId id, NugetFramework targetFramework, string type) -- fetches a single package's code, for a single type, by id+framework+type
- Task<IReadOnlyList<CodeSearchResult>> FindAsync(string query, List<NugetFramework> allowedFrameworks = null) -- finds "something", be it entire assemblies or types/members.
And CodeSearchResult would be:
- PackageId
- TargetFramework
- ResultType -- Type, Member
Perhaps do this as inherited types?
- TypeSearchResult : CodeSearchResult
  - Type -- e.g. MyBaseClass
  - Display -- Some C# string, like public abstract class MyBaseClass<T> : IDisposable where T : IOtherInterface
- MemberSearchResult : CodeSearchResult
  - Type -- e.g. MyBaseClass
  - MemberType -- Constructor, Method, Property, Field, Event
  - Display -- Some C# string, like public void DoWork();
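Written out as C#, the proposal might look roughly like the sketch below. The interface name `ICodeSearchService` is made up for illustration, and `PackageId`, `NugetFramework`, `AnalysisNugetAssembly` and `AnalysisType` are empty placeholders standing in for the real types, whose shape is not shown here. With the inherited result types, the `ResultType` discriminator becomes redundant, so this sketch leaves it out.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Placeholders for types named in this thread; the real definitions are assumed, not shown.
public sealed class PackageId { }
public sealed class NugetFramework { }
public sealed class AnalysisNugetAssembly { }
public sealed class AnalysisType { }

public enum MemberType { Constructor, Method, Property, Field, Event }

// Common part of every search hit.
public abstract class CodeSearchResult
{
    public PackageId PackageId { get; set; }
    public NugetFramework TargetFramework { get; set; }
}

public sealed class TypeSearchResult : CodeSearchResult
{
    public string Type { get; set; }     // e.g. MyBaseClass
    public string Display { get; set; }  // C# declaration string
}

public sealed class MemberSearchResult : CodeSearchResult
{
    public string Type { get; set; }           // declaring type, e.g. MyBaseClass
    public MemberType MemberType { get; set; }
    public string Display { get; set; }        // e.g. public void DoWork();
}

// Hypothetical interface name; the four members follow the list above.
public interface ICodeSearchService
{
    Task IndexAsync(PackageId id, AnalysisNugetAssembly code);
    Task<AnalysisNugetAssembly> GetAsync(PackageId id);
    Task<AnalysisType> GetAsync(PackageId id, NugetFramework targetFramework, string type);
    Task<IReadOnlyList<CodeSearchResult>> FindAsync(string query, List<NugetFramework> allowedFrameworks = null);
}
```

A caller of `FindAsync` would then tell the two kinds of hit apart with a type check, for example `if (hit is MemberSearchResult member) { ... }`.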
@loic-sharma I was wondering. I'm going through the binaries in the nupkg using ReferencedItems(). What's the correct way? ... do we risk getting native binaries this way?
I've added a DB integration, and merged more code. There is a slight flaw in the dependencies, seems a lot of stuff (implementations+interfaces) is in the Core project... So to not mess up too much, I've merged in my Decompiler project with the Core project.
Saved data to the Sqlite DB:

Todos for DB:
- Figure out relations. The EF Core SQLite provider cannot add foreign keys in a migration (SQLite itself can't add one to an existing table), so for now I've simply removed the AddForeignKey statements in the migration. These'll probably have to be redone.
- Create migration for MSSQL
- It seems there's a change in the model. PackageKey (fake property) on PackageDependency has changed from an int? to int.
This is looking great! Some comments:
- The database migration is a little scary - once we add that in, we can't go back without breaking users that have migrated. Could we merge the decompilation piece separately from the storage pieces?
- Let's keep the decompilation separate from BaGet.Core. Can we keep this in a separate project named BaGet.Decompilation or something? I'm not sure how Entity Framework's models will fit in, but we can figure this out later!
I'd like to prototype running Roslyn analyzers on decompiled sources to find bugs in NuGet packages (incorrect async usage, etc...). This would be a great addition to NuGet Package Explorer's package analysis feature, so we should aim to make this as reusable as possible. Thanks for the fantastic work! :)
As I mentioned, I tried the BaGet.Decompilation route, but quickly ran into the problem with the entities. So to not duplicate all those (one set for DB stuff and one for Decompilation), I chose this route.
However. It may be that there should be a DB entity and a not-db-entity. So that the DB entity can include relationships needed for the DB, while the Decompilation entity can include only decompilation-related stuff. Then it's a non-issue to move out decompilation (again :)); at the cost of mapping between these two entity types.
How'd that sound?
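A rough sketch of what that split could look like, just to illustrate the mapping cost. `DecompiledType`, `TypeEntity` and `TypeMapper` are hypothetical names, and the properties are assumptions rather than the actual model in this PR.

```csharp
// Hypothetical decompilation-side object: carries no database concerns.
public sealed class DecompiledType
{
    public string FullName { get; set; }
    public string Display { get; set; }
}

// Hypothetical database entity: same data plus the key and relationship columns.
public sealed class TypeEntity
{
    public int Key { get; set; }
    public int PackageKey { get; set; }
    public string FullName { get; set; }
    public string Display { get; set; }
}

public static class TypeMapper
{
    // Copies the decompiled data into an entity owned by the given package row.
    public static TypeEntity ToEntity(DecompiledType type, int packageKey)
    {
        return new TypeEntity
        {
            PackageKey = packageKey,
            FullName = type.FullName,
            Display = type.Display
        };
    }

    // Drops the database-only columns when going back to the decompilation model.
    public static DecompiledType ToModel(TypeEntity entity)
    {
        return new DecompiledType
        {
            FullName = entity.FullName,
            Display = entity.Display
        };
    }
}
```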
Also. There is a slight space issue. Having uploaded one NEST assembly (3,400 public types), the SQLite DB grew to 24 MB... Soo.. that's a lot of space within a short timespan.
The mapping between decompilation objects and database entities sounds good to me. I’m wondering if maybe we should only store type and method information in the database? Would that reduce storage costs if we cut out the C#? We’d have to regenerate the C# code every time it is requested.
I’ll be out for the holidays with no internet, so I won’t be able to review until the end of next week.
Decompiling on the fly is a no go... The packages would have to be downloaded for each view.. A few options:
- Compressing the C# code in the DB (it's not for searching anyways)
- Decompiling the C# later on (decompile all types in one go, when any type is wanted, store the results in DB)
- Store the C# elsewhere (compressed json f.ex., perhaps on the same storage as packages, alternatively on a separate storage)
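For the compression option, a minimal sketch using GZipStream from System.IO.Compression. The class name `SourceCompression` is made up for illustration, and the resulting byte array is assumed to go into a blob column or a file next to the package; how much space it actually saves would need measuring on real packages.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class SourceCompression
{
    // Encodes the C# text as UTF-8 and gzips it.
    public static byte[] Compress(string source)
    {
        var raw = Encoding.UTF8.GetBytes(source);
        using (var output = new MemoryStream())
        {
            // Dispose the GZipStream before reading the buffer so the gzip footer is written.
            using (var gzip = new GZipStream(output, CompressionLevel.Optimal))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return output.ToArray();
        }
    }

    // Reverses Compress: gunzips the bytes and decodes them as UTF-8.
    public static string Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Since the compressed code can't be searched, this only fits if the searchable member names stay in their own uncompressed columns.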
Note for myself, look into https://github.com/KirillOsenkov/SourceBrowser
That could be cool - dump the members for searching purposes, then use this for display.
That's what I was thinking too! Could we IM on gitter about this feature? If so, when would be a good time for you? I wanna make sure we're on the same page 😄