[WIP]: Add Type decompilation services

Open LordMike opened this issue 7 years ago • 12 comments

Adds assembly decompilation to get an object representing the package or assembly in question. Uses Mono.Cecil to read the IL code and ILSpy's decompiler to convert the IL code into C# code, which avoids loading the assemblies into the host process.

Decompilation

  • Types (names of types)
  • Methods, Constructors (return type, name)
  • Properties, Fields, Events (return type, name)
  • C# declarations of the above elements, e.g.: public void MyMethod(string somevalue);

This is by no means complete, but more information can be extracted as needed.

Source code

I've created a little interface to post-populate the decompiled assembly with source code. The idea is to have multiple ways of fetching the sources, in some preferred order, for example:

  • Embedded sources in the PDB (possible in portable PDBs? EmbedAllSources?)
  • Embedded sources in Nupkg archive (src/..)
  • SourceLink sources from the internet (read sourcelink json from pdb, somehow link types to C# files, fetch those files ..)
  • Decompiled sources from IL code <-- I've made this one
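The "preferred order" idea can be sketched as a chain of providers that is walked until one returns source text. This is only an illustration, not the interface from this PR: ISourceProvider, FallbackSourceProvider and DictionarySourceProvider are hypothetical names, and the dictionary-backed provider is a stand-in for a real PDB, nupkg, SourceLink or decompiler lookup.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names throughout; none of these types are taken from BaGet.
public interface ISourceProvider
{
    // Returns true and sets 'source' when this provider has C# text for the type.
    bool TryGetSource(string typeFullName, out string source);
}

// Tries each provider in the order given and returns the first hit.
public sealed class FallbackSourceProvider : ISourceProvider
{
    private readonly IReadOnlyList<ISourceProvider> _providers;

    public FallbackSourceProvider(params ISourceProvider[] providers)
    {
        _providers = providers ?? throw new ArgumentNullException(nameof(providers));
    }

    public bool TryGetSource(string typeFullName, out string source)
    {
        foreach (var provider in _providers)
        {
            if (provider.TryGetSource(typeFullName, out source))
            {
                return true;
            }
        }

        source = null;
        return false;
    }
}

// Stand-in provider backed by a dictionary, used here only for illustration.
public sealed class DictionarySourceProvider : ISourceProvider
{
    private readonly IReadOnlyDictionary<string, string> _sources;

    public DictionarySourceProvider(IReadOnlyDictionary<string, string> sources)
    {
        _sources = sources ?? throw new ArgumentNullException(nameof(sources));
    }

    public bool TryGetSource(string typeFullName, out string source)
    {
        return _sources.TryGetValue(typeFullName, out source);
    }
}
```

With this shape, the decompiler-backed provider would simply be passed last, so it only runs when none of the embedded or SourceLink lookups produce anything.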

Usage, right now

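I've not created any code to link it to BaGet yet. But right now, you could run the decompiler like this:

using NuGet.Packaging; // provides PackageArchiveReader

// Open the package archive; its assemblies are not loaded into the host process.
using (var reader = new PackageArchiveReader(@".\TMDbLib.1.3.2-alpha.symbols.nupkg"))
{
    // Analyze the package and get back an object describing its contents.
    var res = new NugetDecompilerService(new AssemblyDecompilerService()).AnalyzePackage(reader);

    // use res
}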

Part of #140

LordMike avatar Dec 05 '18 23:12 LordMike

So. In order to obtain signatures for members and types (like public class SomeClass), I upgraded the CSharp Decompiler project to the new prerelease, version 4.x.

Doing so cut out Cecil for decompilation, and boom, major speedup. For practically the same code, NEST took 5+ minutes on the old versions, while it now takes a few seconds. So. That's cool :)

LordMike avatar Dec 15 '18 19:12 LordMike

So, @loic-sharma, I've got the decompilation in place. There's a lot to do with regards to pulling out documentation and whatnot, but the core is in place.

What would a type-searching-source-code-something service look like?

Following ISearchService, I imagine:

  • Task IndexAsync(PackageId id, AnalysisNugetAssembly code) -- indexes details for a specific Package
  • Task<AnalysisNugetAssembly> GetAsync(PackageId id) -- fetches a single package's code by id
  • Task<AnalysisType> GetAsync(PackageId id, NugetFramework targetFramework, string type) -- fetches a single package's code, for a single type, by id+framework+type
  • Task<IReadOnlyList<CodeSearchResult>> FindAsync(string query, List<NugetFramework> allowedFrameworks = null) -- finds "something", be it entire assemblies or types/members.

And CodeSearchResult would be:

  • PackageId
  • TargetFramework
  • ResultType -- Type, Member

Perhaps do this as inherited types?

  • TypeSearchResult : CodeSearchResult
    • Type -- e.g. MyBaseClass
    • Display -- Some C# string, like public abstract class MyBaseClass<T> : IDisposable where T : IOtherInterface
  • MemberSearchResult : CodeSearchResult
    • Type -- e.g. MyBaseClass
    • MemberType -- Constructor, Method, Property, Field, Event
    • Display -- Some C# string, like public void DoWork();
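As a rough sketch of the inherited-types variant, the result hierarchy and the FindAsync call could look like the following. Several things here are assumptions made to keep the example self-contained: PackageId and the target framework are plain strings, Display is hoisted into the base class, ResultType is dropped because the subclass already says whether a hit is a type or a member, and ICodeSearchService and InMemoryCodeSearchService are hypothetical names with a naive substring match standing in for a real index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public enum MemberKind { Constructor, Method, Property, Field, Event }

// Common part of every hit. Strings stand in for PackageId / NugetFramework.
public abstract class CodeSearchResult
{
    public string PackageId { get; set; }
    public string TargetFramework { get; set; }
    public string Display { get; set; } // C# declaration shown to the user
}

public sealed class TypeSearchResult : CodeSearchResult
{
    public string Type { get; set; } // e.g. MyBaseClass
}

public sealed class MemberSearchResult : CodeSearchResult
{
    public string Type { get; set; } // declaring type
    public MemberKind MemberType { get; set; }
}

// Hypothetical service name; only the search half of the proposal is shown.
public interface ICodeSearchService
{
    Task<IReadOnlyList<CodeSearchResult>> FindAsync(
        string query, IReadOnlyList<string> allowedFrameworks = null);
}

// Naive in-memory implementation, only to show how the types fit together.
public sealed class InMemoryCodeSearchService : ICodeSearchService
{
    private readonly List<CodeSearchResult> _items = new List<CodeSearchResult>();

    public void Add(CodeSearchResult item) => _items.Add(item);

    public Task<IReadOnlyList<CodeSearchResult>> FindAsync(
        string query, IReadOnlyList<string> allowedFrameworks = null)
    {
        // Case-insensitive substring match on Display, optionally filtered by framework.
        var matches = _items
            .Where(r => allowedFrameworks == null || allowedFrameworks.Contains(r.TargetFramework))
            .Where(r => r.Display != null &&
                        r.Display.IndexOf(query, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();

        return Task.FromResult<IReadOnlyList<CodeSearchResult>>(matches);
    }
}
```

Callers can then switch on the runtime type (TypeSearchResult vs. MemberSearchResult) instead of checking a ResultType field.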

LordMike avatar Dec 16 '18 15:12 LordMike

@loic-sharma I was wondering. I'm going through the binaries in the nupkg using ReferencedItems(). What's the correct way? ... do we risk getting native binaries this way?

LordMike avatar Dec 17 '18 17:12 LordMike

I've added a DB integration, and merged more code. There is a slight flaw in the dependencies: it seems a lot of stuff (implementations+interfaces) is in the Core project... So to not mess up too much, I've merged my Decompiler project into the Core project.

Saved data to the Sqlite DB:

[image: screenshot of the saved data]

Todos for DB:

  • Figure out relations. The EF Core SQLite provider does not support AddForeignKey in migrations (SQLite itself cannot add a foreign key to an existing table), so for now I've simply removed the AddForeignKey statements in the migration. These'll probably have to be redone.
  • Create migration for MSSQL
  • It seems there's a change in the model. PackageKey (fake property) on PackageDependency has changed from an int? to int.

LordMike avatar Dec 17 '18 22:12 LordMike

This is looking great! Some comments:

  1. The database migration is a little scary - once we add that in, we can't go back without breaking users that have migrated. Could we merge the decompilation piece separately from the storage pieces?
  2. Let's keep the decompilation separate from BaGet.Core. Can we keep this in a separate project named BaGet.Decompilation or something? I'm not sure how Entity Framework's models will fit in, but we can figure this out later!

I'd like to prototype running Roslyn analyzers on decompiled sources to find bugs in NuGet packages (incorrect async usage, etc...). This would be a great addition to NuGet Package Explorer's package analysis feature, so we should aim to make this as reusable as possible. Thanks for the fantastic work! :)

loic-sharma avatar Dec 18 '18 02:12 loic-sharma

As I mentioned, I tried the BaGet.Decompilation route, but quickly ran into the problem with the entities. So to not duplicate all of those (one set for DB stuff and one for Decompilation), I chose this route.

However. It may be that there should be a DB entity and a not-db-entity. So that the DB entity can include relationships needed for the DB, while the Decompilation entity can include only decompilation-related stuff. Then it's a non-issue to move out decompilation (again :)); at the cost of mapping between these two entity types.
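To make the trade-off concrete, here is a minimal sketch of the two-entity split with an explicit mapper. All names (DecompiledType, TypeEntity, TypeEntityMapper) and the column set are hypothetical and not the PR's actual entities; the point is only that keys and relationships live on the DB side while the decompilation model stays free of them.

```csharp
using System;

// Decompilation-side model: carries only what the decompiler produces.
public sealed class DecompiledType
{
    public string FullName { get; set; }
    public string Display { get; set; }
}

// Database-side entity: same data plus keys needed for relationships.
public sealed class TypeEntity
{
    public int Key { get; set; }        // primary key, assigned by the database
    public int PackageKey { get; set; } // link to the owning package row
    public string FullName { get; set; }
    public string Display { get; set; }
}

// The mapping cost mentioned above: one small translation in each direction.
public static class TypeEntityMapper
{
    public static TypeEntity ToEntity(DecompiledType model, int packageKey)
    {
        if (model == null) throw new ArgumentNullException(nameof(model));

        return new TypeEntity
        {
            PackageKey = packageKey,
            FullName = model.FullName,
            Display = model.Display
        };
    }

    public static DecompiledType ToModel(TypeEntity entity)
    {
        if (entity == null) throw new ArgumentNullException(nameof(entity));

        return new DecompiledType
        {
            FullName = entity.FullName,
            Display = entity.Display
        };
    }
}
```

With a split like this, the decompilation project would not need to reference Entity Framework at all; only the project holding TypeEntity and the mapper would.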

How'd that sound?

LordMike avatar Dec 18 '18 22:12 LordMike

Also. There is a slight space issue. Having uploaded one NEST assembly (3,400 public types), the SQLite DB grew to 24 MB... Soo.. that's a lot of space within a short timespan.

LordMike avatar Dec 18 '18 22:12 LordMike

The mapping between decompilation objects and database entities sounds good to me. I’m wondering if maybe we should only store type and method information in the database? Would that reduce storage costs if we cut out the C#? We’d have to regenerate the C# code every time it is requested.

I’ll be out for the holidays with no internet, so I won’t be able to review until the end of next week.

loic-sharma avatar Dec 19 '18 16:12 loic-sharma

Decompiling on the fly is a no-go: the package would have to be downloaded for each view. A few options:

  • Compressing the C# code in the DB (it's not for searching anyways)
  • Decompiling the C# later on (decompile all types in one go, when any type is wanted, store the results in DB)
  • Store the C# elsewhere (compressed json for example, perhaps on the same storage as packages, alternatively on a separate storage)
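For the compression option, a minimal sketch using GZipStream from the BCL could look like this. SourceCompression is a hypothetical helper name, and storing the result as a byte[] (blob) column is an assumption; how much space it would actually save on the NEST data has not been measured here.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: gzip the decompiled C# text before it is stored.
public static class SourceCompression
{
    public static byte[] Compress(string source)
    {
        var raw = Encoding.UTF8.GetBytes(source);

        using (var output = new MemoryStream())
        {
            // Dispose the GZipStream before reading the buffer so the trailer is flushed.
            using (var gzip = new GZipStream(output, CompressionLevel.Optimal))
            {
                gzip.Write(raw, 0, raw.Length);
            }

            return output.ToArray();
        }
    }

    public static string Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

The same helper would work for the "store elsewhere" option, since the compressed bytes can be written to a file or blob store just as easily as to a DB column.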

LordMike avatar Dec 19 '18 18:12 LordMike

Note to self: look into https://github.com/KirillOsenkov/SourceBrowser

loic-sharma avatar Dec 28 '18 00:12 loic-sharma

That could be cool - dump the members for searching purposes, then use this for display.

LordMike avatar Dec 28 '18 19:12 LordMike

That's what I was thinking too! Could we IM on gitter about this feature? If so, when would be a good time for you? I wanna make sure we're on the same page 😄

loic-sharma avatar Dec 29 '18 00:12 loic-sharma