
Add Configurations, Hashing & Release Info Providers [WIP]

Open revam opened this issue 10 months ago • 7 comments

PoC adding configurations, hashing & release providers. Still things left to test, but it's overall in a working state, minus the still missing MySQL/MariaDB & MS SQL Server support.

Changes internally:

  • Added a new IPluginManager interface for getting information about enabled plugins. Can be extended in the future to support disabled plugins, and/or be made to enable/disable/install/update/uninstall plugins. For now I just needed a way to list information about the installed plugins, so the rest can come later if needed.

  • Added a new IConfigurationService, IConfiguration, IConfigurationDefinition, and ConfigurationProvider<TConfig>, for plugins (and the core) to create, load, save and validate configurations, in addition to creating a JSON schema (with extensions) to generate a UI for the config in the web ui. The existing ISettingsProvider has been moved to use the new service, and the built-in providers have their own settings served through this service.

  • Added the ability to load a setting from an environment variable as long as it's marked with the new EnvironmentVariableAttribute(string name), and track which of them are loaded globally and on a per-configuration basis. We can also lock the setting so it can't be changed when the env. var. is loaded (the default behavior), or allow it to be changed but load the initial value on every startup from the environment.

  • Added the ability to indicate that a setting needs a restart to take effect by marking it with the new RestartRequiredAttribute, and to track whether a server restart is needed because one or more of those settings changed, on a per-configuration, per-setting basis. Complete with events and properties to track it globally and on a per-configuration basis.

  • Added a new IVideoHashingService, IHashProvider, IHashDigest, HashDigest to the plugin abstraction. The new IVideoHashingService operates on raw System.IO.FileInfos and is responsible for providing hashes to the HashFileJob before an IVideo & IVideoFile is necessarily assigned to a file location. So far there are runtime checks in place to make sure at least 1 "ED2K" hasher is enabled at all times, since we still rely on it as our absolute ID (in combination with the video file size) internally, but the hasher doesn't necessarily need to be provided by the new "Core" hasher. It contains events for when an IVideo & IVideoFile has been hashed (and added to the system), and when providers have been updated. The service can be used to switch between sequential mode and parallel mode (which controls how plugin providers are called), view all available and/or enabled hash types, enable or disable hash types per provider, and re-order the run priority of providers in sequential mode.

    Note: The priority doesn't affect the parallel mode because every provider is… run in parallel.

  • Added a new "Core" hasher (CoreHashProvider) implementing the "ED2K", "MD5", "CRC32", "SHA1", "SHA256", & "SHA512" hash types, with the "ED2K" enabled by default.

  • Added a new IVideoReleaseService, IReleaseInfoProvider, IReleaseInfo, IReleaseVideoCrossReference, IReleaseMediaInfo, VideoReleaseEventArgs to the plugin abstraction. The new IVideoReleaseService is responsible for everything related to release info, be it managing providers, doing the auto-search across multiple providers, showing provider info, saving release info to the database, and clearing saved release info from the database. It also contains events for when a release has been saved or cleared, when an auto-search has been started/completed, and when providers have been updated. The service can be used to switch between sequential mode (running each provider in a loop in priority order until we find a match or exhaust the list) and parallel mode (running all providers in parallel and selecting the highest valid priority release), view all providers, enable or disable providers, and re-order the priority of the providers.

  • Added a new "AniDB" release info provider (AnidbReleaseProvider), hooking into the existing AniDB UDP lookup logic. As a side effect of the change in the lookup process, the MyList support in the existing UDP lookup logic has been stripped out, and we now rely entirely on the IVideoReleaseService and MyList sync job to add new files and/or pull watched state from the MyList.

  • Added an IHashProvider<TConfig> interface to create an explicit relation between a hash provider and a configuration. This information is also available to plugins on the HashProviderInfo class retrieved by the IVideoHashingService and to RESTful clients in the APIv3 (e.g. for the web ui to act on the information).

  • Added an IReleaseInfoProvider<TConfig> in the same way as the IHashProvider<TConfig>, but for release info providers.

  • Added IReadOnlyList<IHashDigest> Hashes, string? SHA256, string? SHA512 to IHashes, to list all hashes stored for an IVideo that may not necessarily be strongly typed and to expose all hash types supported by the CoreHashProvider ("Core" in the UI) as strongly typed hashes. The existing strongly typed hash types have been converted to helpers that retrieve the first stored hash of the given type from the list.

  • AniDB_File, AniDB_FileUpdate, AniDB_ReleaseGroup, CrossRef_Languages_AniDB_File, & CrossRef_Subtitles_AniDB_File models/tables/repos are gone, and their functionality replaced by the new StoredReleaseInfo & StoredReleaseInfo_MatchAttempt. The AniDB file has also been removed from the abstraction.

  • Video file hashes, except the "ED2K" hash, have been moved to only being stored in the new VideoLocal_HashDigest table. The "ED2K" hash is still stored on the video itself in addition to the new table.

  • Currently I've assigned every existing link as a "manual link", because the user is now able to edit every link we store if they so desire, and this was the simplest way to show all the links in the current Web UI.

  • Added a new plugin to simply export/import release info (Shoko.Plugin.ReleaseExporter). This is both my test case for the plugin system in addition to a small handy provider if you ever need to re-index your collection from scratch and don't want to do the AniDB UDP dance, or if you want to transcode your collection to a newer format at some point and want to preserve the release info in the process.
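
To make the environment variable bullet above more concrete, here is a minimal sketch of how an attribute-driven loader could work. It is not the code from this PR: the attribute below is a simplified stand-in that only reuses the EnvironmentVariableAttribute(string name) shape mentioned above, and ExampleSettings, EXAMPLE_SERVER_PORT and EnvironmentLoader are hypothetical names. Locking and restart tracking are left out.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Reflection;

// Simplified stand-in for the attribute named in the PR; the real one may differ.
[AttributeUsage(AttributeTargets.Property)]
public sealed class EnvironmentVariableAttribute : Attribute
{
    public string Name { get; }

    public EnvironmentVariableAttribute(string name) => Name = name;
}

// Hypothetical configuration class used only for this example.
public class ExampleSettings
{
    [EnvironmentVariable("EXAMPLE_SERVER_PORT")]
    public string Port { get; set; } = "1234";
}

public static class EnvironmentLoader
{
    // Copies matching environment variables onto string properties and
    // returns the names of the properties that were loaded this way,
    // so a caller could track (and lock) them per configuration.
    public static List<string> Apply(object config, Func<string, string?> getEnv)
    {
        var loaded = new List<string>();
        foreach (var prop in config.GetType().GetProperties())
        {
            var attr = prop.GetCustomAttribute<EnvironmentVariableAttribute>();
            if (attr is null || prop.PropertyType != typeof(string))
                continue;

            var value = getEnv(attr.Name);
            if (value is null)
                continue;

            prop.SetValue(config, value);
            loaded.Add(prop.Name);
        }
        return loaded;
    }
}
```

In practice the lookup delegate would be Environment.GetEnvironmentVariable; passing it in as a parameter just keeps the sketch easy to exercise without touching the real process environment.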

Changes in APIv1:

  • All file linking/unlinking in APIv1 has been soft deprecated. Use APIv3 instead. By soft deprecated I mean the client can still make the requests, but will only get an error message back from the server.

  • Release info has been migrated to use the new format, but only for releases provided by the "AniDB" provider.

  • Release groups have been migrated to use the new format, but only for release groups with "AniDB" as a source.

Changes in APIv2:

  • Release info has been migrated to use the new format, but only for releases provided by the "AniDB" provider.

  • Release groups have been migrated to use the new format, but only for release groups with "AniDB" as a source.

Changes in APIv3:

  • Release info has been migrated to a new API model.

  • File.Hashes has been changed from a dict of well known, nullable hashes to a list of hash digests, where only the ED2K hash is guaranteed to be included in the list.

  • File.AniDB has been replaced with File.Release, which now uses the new release info model. The includeDataFrom=AniDB query parameter for file endpoints

  • Added a new hashing controller (mounted at /api/v3/Hashing for now), to view and edit hashing provider settings, enable or disable hash types per provider, and re-order the run order of providers in sequential mode (note: the order doesn't affect the parallel mode because every provider is… run in parallel).

  • Added a new ReleaseInfoController (mounted at /api/v3/ReleaseInfo), allowing RESTful clients to also interact with the newly added IVideoReleaseService. You can do anything you can do

  • File linking in APIv3 has been converted to use the new service, and as a result the artificial limit of not allowing the user to remove AniDB releases is gone. A release is simply a release now.
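
For clients adapting to the File.Hashes change, the following is a rough sketch of reading a list of digests through typed helpers, similar in spirit to the "first stored hash of the given type" helpers described for IHashes. The HashDigestSketch and FileHashesSketch types and their property names are assumptions made for this example and are not the actual API models.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified digest shape; the real model may carry more fields.
public sealed record HashDigestSketch(string Type, string Value);

public sealed class FileHashesSketch
{
    public IReadOnlyList<HashDigestSketch> Hashes { get; init; } = Array.Empty<HashDigestSketch>();

    // ED2K is the only digest expected to always be present.
    public string ED2K => First("ED2K")
        ?? throw new InvalidOperationException("ED2K digest is missing.");

    // Every other type is optional and may be absent from the list.
    public string? SHA256 => First("SHA256");

    // Returns the value of the first stored digest of the given type, if any.
    private string? First(string type) =>
        Hashes.FirstOrDefault(h => string.Equals(h.Type, type, StringComparison.OrdinalIgnoreCase))?.Value;
}
```

The point of the sketch is that a client should treat everything except ED2K as possibly missing instead of assuming a fixed set of keys.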

To-do:

  • [X] Add missing MySQL/MariaDB database migrations.
  • [x] Add missing MS SQL Server database migrations.
  • [X] Test that the anidb provider somehow works as it should.
  • [x] Test out that MyList is still working as it should.
  • [x] Test out adding a workflow to edit the providers in the web ui.
  • [x] Fix breakage in the web ui as a result of the removal of the anidb property on the file model.
  • [x] Fix breakage in Shokofin as a result of the removal of the anidb property on the file model.

revam avatar Feb 24 '25 00:02 revam

This is...a lot, so I'll need to look at it more later. One thing I see first off is the complication and kind of hacky handling of the scheduling and ProcessFile. I would split the jobs if possible, and make it so that it has a flow like so:

Discover
Hash
ProcessFile
    Check the state of the data and what providers can/need to update, prolly via an interface/abstraction
    Schedule the relevant jobs for each provider. AniDB will have one, which can be handled in scheduler via the exclusion types. Ashen might have one. Maybe an NFO one? It's extensible, after all.
Get Provider File Info
    The job mentioned before. It can do the job that Process File did and orchestrate other things. We can make helpers or a base "Get Provider File Info" in the abstractions for providers to extend.
...

The plugin abstractions might need to provide a hook to add Acquisition Filters, jobs, etc

da3dsoul avatar Feb 24 '25 04:02 da3dsoul

@Cazzar can you comment on some of the design? We aren't nitpicking code quality yet.

Though if we were: stop making constructors for models. It'll mess up Entity Framework, and I'm going to get rid of them anyway. Use object initializers instead of constructors like this one:

public StoredReleaseInfo(IVideo video, IReleaseInfo releaseInfo)
...

Models should be models. If processing needs to be done, it should be in a service/factory. I don't know if your "embedded" models will work. We will see. I'm not sure how Entity Framework will handle loading of relationships through them.
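
As a minimal sketch of that suggestion (assumed names throughout, not code from the PR): the model stays a plain class with settable properties, and any normalization lives in a separate factory that builds it with an object initializer. StoredReleaseInfoSketch, its properties and StoredReleaseInfoFactory are hypothetical.

```csharp
#nullable enable
using System;

// Plain model: no custom constructor, only properties.
public class StoredReleaseInfoSketch
{
    public string ED2K { get; set; } = string.Empty;
    public long FileSize { get; set; }
    public string? GroupName { get; set; }
}

// Processing lives outside the model, in a factory (or a service).
public static class StoredReleaseInfoFactory
{
    public static StoredReleaseInfoSketch Create(string ed2k, long fileSize, string? groupName) => new()
    {
        ED2K = ed2k.ToUpperInvariant(),
        FileSize = fileSize,
        // Treat blank group names as "no group" in this example.
        GroupName = string.IsNullOrWhiteSpace(groupName) ? null : groupName.Trim(),
    };
}
```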

Stuff like this is fine imo, though:

public IReadOnlyList<IReleaseVideoCrossReference> CrossReferences
{
    get => EmbeddedCrossReferences
        .Split(',')
        .Select(EmbeddedCrossReference.FromString)
        .WhereNotNull()
        .ToList();
    set => EmbeddedCrossReferences = value
        .Select(x => x.ToEmbeddedString())
        .Join(',');
}

da3dsoul avatar Feb 24 '25 04:02 da3dsoul

Overall, I like the idea behind the design. I haven't looked through it extensively, as the large number of changes does make things complex.

Cazzar avatar Feb 24 '25 09:02 Cazzar

> This is...a lot, so I'll need to look at it more later. One thing I see first off is the complication and kind of hacky handling of the scheduling and ProcessFile. I would split the jobs if possible, and make it so that it has a flow like so:
>
> Discover
> Hash
> ProcessFile
>     Check the state of the data and what providers can/need to update, prolly via an interface/abstraction
>     Schedule the relevant jobs for each provider. AniDB will have one, which can be handled in scheduler via the exclusion types. Ashen might have one. Maybe an NFO one? It's extensible, after all.
> Get Provider File Info
>     The job mentioned before. It can do the job that Process File did and orchestrate other things. We can make helpers or a base "Get Provider File Info" in the abstractions for providers to extend.
> ...
>
> The plugin abstractions might need to provide a hook to add Acquisition Filters, jobs, etc

The current service can be run inside or outside the queue/job system, and in the current PoC the release providers are processed in a user-configurable order until a release is found. Only a single release can be assigned to the same video at any given time, so scheduling "relevant jobs for each provider" won't ever happen in parallel. In short, the logic to select a release happens strictly inside the service, and the process-file job is now just asking the service to do its thing while running in the queue/job system in the background. There are also other ways to interact with the service, be it from other plugins through the abstraction, or from RESTful clients through the new endpoints.

I do admit that the way I modified the AniDB banned acquisition filter is kind of hacky, but it is also correct: it now only blocks the process-file jobs when we are AniDB banned and the AniDB release provider is enabled, since that provider needs to be able to use the AniDB UDP API. Flipping that around, as long as the AniDB provider isn't enabled we don't need to block the process-file jobs at all, since nothing is using the AniDB UDP API to find releases.

revam avatar Feb 24 '25 20:02 revam

That is a reasonable argument, but I would allow multiple so that you can cross reference them. AniDB is more likely correct than perceptual hashing, even though perceptual hashing is pretty accurate. We could even have a filename plugin with very low accuracy. Maybe add an enum for how trustworthy we expect a provider to be in those cases.

da3dsoul avatar Feb 24 '25 20:02 da3dsoul

We kinda already have a user configurable "priority" to use. Can we add two modes,

  • one mode to run in a sync. loop in the priority order until a release is found (AKA the current way), and
  • one mode to run multiple providers in parallel, await all the responses, and then pick the highest priority release of the available candidates?

I can add the new setting to the service and modify the description of the auto-finding method and endpoints to reflect the new behavior. The reason I would opt to have both modes is to let the user choose how they want to do it. By default we will only have one provider included (…unless…), so the default mode can be whatever.

My particular flow would require the finding to happen in sync. order; it would first check the "nfo" (quotes intentional) file before asking any remote services (or a potential fallback local/offline provider). But I know some would maybe want to do it in a parallel fashion as you described with the p-hash and AniDB provider, so I'll say "let them pick their own poison to swallow."
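
To make the two proposed modes concrete, here is a rough sketch under simplifying assumptions: each provider is reduced to an async delegate that returns a candidate or null, and the list is already sorted by the user-configured priority (highest first). ReleaseCandidate and ReleaseSearchSketch are hypothetical names and not part of the actual service.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical result type for this example.
public sealed record ReleaseCandidate(string Provider, string ReleaseId);

public static class ReleaseSearchSketch
{
    // Sequential mode: ask each provider in priority order and stop at the first hit,
    // so lower priority providers are never called once a release is found.
    public static async Task<ReleaseCandidate?> FindSequentialAsync(
        IReadOnlyList<Func<Task<ReleaseCandidate?>>> providers)
    {
        foreach (var provider in providers)
        {
            var candidate = await provider();
            if (candidate is not null)
                return candidate;
        }
        return null;
    }

    // Parallel mode: start every provider, await all responses, then pick the
    // candidate from the highest priority provider that returned one.
    public static async Task<ReleaseCandidate?> FindParallelAsync(
        IReadOnlyList<Func<Task<ReleaseCandidate?>>> providers)
    {
        // Task.WhenAll returns results in the same order as the input tasks,
        // so the priority order is preserved.
        var results = await Task.WhenAll(providers.Select(p => p()));
        return results.FirstOrDefault(r => r is not null);
    }
}
```

Both modes end up picking the same release when every provider answers; the difference is whether the lower priority providers get called at all, which matters for a flow that wants to check a local file before hitting any remote service.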

revam avatar Feb 24 '25 20:02 revam

Quality Gate failed

Failed conditions
36 Security Hotspots
C Security Rating on New Code (required ≥ A)
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud


sonarqubecloud[bot] avatar Sep 07 '25 19:09 sonarqubecloud[bot]

Quality Gate failed

Failed conditions
36 Security Hotspots
C Security Rating on New Code (required ≥ A)
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud


sonarqubecloud[bot] avatar Oct 12 '25 00:10 sonarqubecloud[bot]

Quality Gate failed

Failed conditions
28 Security Hotspots
E Reliability Rating on New Code (required ≥ A)
C Security Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud


sonarqubecloud[bot] avatar Nov 23 '25 06:11 sonarqubecloud[bot]