stash icon indicating copy to clipboard operation
stash copied to clipboard

Performer disambiguation and aliases

Open WithoutPants opened this issue 3 years ago • 13 comments

Includes refactoring of the performer object model to use relationship fields in the same way as scenes etc.

Adds disambiguation field. This field is shown in parentheses next to the performer title. This field can be scraped.

image

Converts the Aliases field to be a list of strings, in the same way as tag aliases. Scraped aliases are assumed to be comma-delimited and are split accordingly. The migration splits aliases in the same way.

image

While performer name + disambiguation must be unique, aliases are not enforced as such.

Removes the checksum field.

Updates the auto tag logic to take aliases into account. Disambiguation is not currently used for auto-tagging.

Aliases are included when filtering with the performer selector, but only show (alias) suffix if the alias matches.

image

Resolves #2507 Resolves #1724 (I think) Resolves #1162 Resolves #464

WithoutPants avatar Nov 10 '22 05:11 WithoutPants

Two things.

I. Sorting. It's easier to parse aliases when they're case insensitive alphanumerically sorted. After each save of the performer profile automatically sort the list of aliases. This would also apply to the migration process when converting the comma delimited string of aliases into a list.

II. Bulk adding. The + button is fine for a handful of aliases, but it's not uncommon for some performers to have more than a handful of aliases. Entering every alias for these type of performers one line at a time would be tedious.

Possible solutions:

A. Add another button adjacent to the + button that displays a bulk add input form for a comma delimited list.

image

B. Pressing <enter> or <shift>+<enter> (if a single carriage return would cause too many false positives) in the alias input form automatically displays the next alias input form and moves the cursor to this new input form.

echo6ix avatar Nov 10 '22 08:11 echo6ix

There seems to be an issue when scraping a performer's alias (the existing aliases are not displayed)

Example Performer has aliases 2022-11-10_20-26

When scraping no existing aliases are listed 2022-11-10_20-25

bnkai avatar Nov 10 '22 18:11 bnkai

I. Sorting. It's easier to parse aliases when they're case insensitive alphanumerically sorted. After each save of the performer profile automatically sort the list of aliases. This would also apply to the migration process when converting the comma delimited string of aliases into a list.

Thanks for the feedback. Performer aliases are now sorted in the edit panel.

II. Bulk adding. The + button is fine for a handful of aliases, but it's not uncommon for some performers to have more than a handful of aliases. Entering every alias for these type of performers one line at a time would be tedious.

Possible solutions:

A. Add another button adjacent to the + button that displays a bulk add input form for a comma delimited list. B. Pressing <enter> or <shift>+<enter> (if a single carriage return would cause too many false positives) in the alias input form automatically displays the next alias input form and moves the cursor to this new input form.

Thanks for the suggestions. I've gone with an approach similar to solution B - please let me know what you think.

The string list input now always shows an empty text field at the end. When you start typing into the field, it automatically adds it to the list and creates a new empty field to tab into. The add button has been removed.

image image

If you clear the last field, then it is removed from the list. Clearing a field that's not at the end will show an error feedback as per existing behaviour.

This should streamline the approach somewhat. Entering aliases involves entering a value, pressing tab twice, entering the next value and so on. It's not as streamlined as pasting in a comma-delimited list, but it is an improvement.

WithoutPants avatar Nov 15 '22 05:11 WithoutPants

@WithoutPants Definitely a big improvement 👍

echo6ix avatar Nov 15 '22 06:11 echo6ix

This is a huge improvement overall. It makes the performer aliases actually useful.

One minor issue is that when scraping/searching in the tagger view, the result doesn't automatically resolve names based on alias, you have to manually search. See example below - performer wasn't automatically resolved but can be found by manually searching.

image

WeedLordVegeta420 avatar Nov 21 '22 15:11 WeedLordVegeta420

I'm seeing an error while migrating in this branch:

error performing migration: UNIQUE constraint failed: performers.name in line 0: CREATE TABLE `performer_aliases` (
  `performer_id` integer NOT NULL,
  `alias` varchar(255) NOT NULL,
  foreign key(`performer_id`) references `performers`(`id`) on delete CASCADE,
  PRIMARY KEY(`performer_id`, `alias`)
);

CREATE INDEX `performer_aliases_alias` on `performer_aliases` (`alias`);

-- `performers`.`aliases` will be dropped in the post-migration

DROP INDEX `performers_checksum_unique`;
ALTER TABLE `performers` DROP COLUMN `checksum`;
ALTER TABLE `performers` ADD COLUMN `disambiguation` varchar(255);

CREATE UNIQUE INDEX `performers_name_disambiguation_unique` on `performers` (`name`, `disambiguation`) WHERE `disambiguation` IS NOT NULL;
CREATE UNIQUE INDEX `performers_name_unique` on `performers` (`name`) WHERE `disambiguation` IS NULL;

kermieisinthehouse avatar Nov 21 '22 18:11 kermieisinthehouse

Above error is from having duplicate named performers, tied to https://github.com/stashapp/stash/issues/2344 .

There is another issue: filtering performers using the new "aliases" field throws a sql error.

kermieisinthehouse avatar Nov 21 '22 21:11 kermieisinthehouse

There is another issue: filtering performers using the new "aliases" field throws a sql error.

I see the same behavior. Looks like it's trying to do a join on the tags table instead of the performers table.

error executing count query with SQL: SELECT COUNT(*) as count FROM (SELECT DISTINCT performers.id FROM performers LEFT JOIN performer_aliases ON performer_aliases.performer_id = tags.id WHERE ((performer_aliases.alias LIKE ? OR performer_aliases.alias LIKE ?))) as temp, args: [%Arisa% %Takarada%], error: error executing `SELECT COUNT(*) as count FROM (SELECT DISTINCT performers.id FROM performers LEFT JOIN performer_aliases ON performer_aliases.performer_id = tags.id WHERE ((performer_aliases.alias LIKE ? OR performer_aliases.alias LIKE ?))) as temp` [[%Arisa% %Takarada%]]: no such column: tags.id

WeedLordVegeta420 avatar Nov 21 '22 23:11 WeedLordVegeta420

I think that the user should be able to mark aliases individually for "Do not autotag" - otherwise extremely common aliases like "Mary" can't be used.

kermieisinthehouse avatar Nov 21 '22 23:11 kermieisinthehouse

I think that the user should be able to mark aliases individually for "Do not autotag" - otherwise extremely common aliases like "Mary" can't be used.

This is a significant change and will mean bumping this from 0.18. The alternative is that we leave the aliases out of the auto-tag behaviour for now to address later.

WithoutPants avatar Nov 22 '22 00:11 WithoutPants

Yes, I don't want to hold up the release for this - I think maintaining the current autotag behavior is probably a good middle ground, if that's an easy change

kermieisinthehouse avatar Nov 22 '22 00:11 kermieisinthehouse

I do think that leaving the aliases out of the auto-tag for now is probably the better move until there is an option to exclude them individually from auto-tag. Otherwise, people will inevitably upgrade to the latest release and run auto-tag without realizing the impact those one name aliases will have.

WeedLordVegeta420 avatar Nov 22 '22 00:11 WeedLordVegeta420

I can confirm that the most recent commits fix the duplicate performer issue!

kermieisinthehouse avatar Nov 22 '22 00:11 kermieisinthehouse

$150 bounty placed (contribution no. 624029)

WithoutPants avatar Feb 14 '23 03:02 WithoutPants