
Ideas about performance when generating large numbers of lineups

sansbacon opened this issue 3 years ago · 4 comments

Performance appears to suffer as the number of lineups grows, particularly when there are several rules about ownership, combinations of players, etc.

I was testing out a prototype idea and it could make sense to have a different approach for multi-lineup generation. My idea is to break the process into two steps: (1) generate a large number of valid lineups and (2) optimize on the sum of the lineup scores with a constraint of n_lineups_wanted. Any rules that apply to all lineups can be applied at step one, such as positional constraints, salary constraints, stacking, etc. Any rules that apply to the pool of lineups can be applied efficiently at step two, such as ownership constraints on a single player or combination of players.

Here is a prototype for the idea. I was testing it out representing lineups as an array of integers that correspond to indices into a player DataFrame. It would obviously need to be adapted to fit into your library, but I think the concept has promise.

import numpy as np
import pandas as pd
import pulp

# Initialize variables
lineup_size = 9
player_pool = np.arange(200)
n_original_lineups = 10000
n_optimized_lineups = 150

# STEP ONE: generate lots of lineups
lineups = np.array([np.random.choice(player_pool, lineup_size, replace=False) for _ in np.arange(n_original_lineups)])

# STEP TWO: optimize the lineups
# the fastest way to do this is to "one hot encode" lineups
# create array of shape (n_original_lineups, player_pool.size)
# value is 1 if player is in lineup, 0 if not
# row indices correspond to indices in original lineups array
# column indices correspond to indices of player pool
lineups_ohe = (player_pool[...,None]==lineups[:,None,:]).any(-1).astype(int)

# these are the other values you need (creating fake data in this example)
# you would extract them from the lineups in a real example
min_ownership = np.random.binomial(1, .1, size=player_pool.size)
max_ownership = np.ceil(np.random.uniform(0, .4, size=player_pool.size) * n_optimized_lineups)
lineup_scores = np.random.uniform(100, 150, size=lineups_ohe.shape[0])

# now optimize the lineups
prob = pulp.LpProblem('MultiLineupOptimization', pulp.LpMaximize)

# create lineup variables
# key is Lineup_{id}, value is pulp.LpVariable
lineup_vars = pulp.LpVariable.dicts('Lineup', range(lineups.shape[0]), cat='Binary')

# objective function - sum of lineup scores
prob += pulp.lpSum([pts * lineup for lineup, pts in zip(lineup_vars.values(), lineup_scores)])

# constraint: only want n_optimized_lineups lineups
prob += pulp.lpSum(lineup_vars.values()) == n_optimized_lineups

# now add ownership constraints
# loop through each player
# ensure that the sum of the player vars is less than the ownership constraints
for player, max_own, min_own in zip(player_pool, max_ownership, min_ownership):
    usage = pulp.lpSum([l[player] * luv for l, luv in zip(lineups_ohe, lineup_vars.values()) if l[player]])
    prob += usage <= max_own
    prob += usage >= min_own

# solve, then collect the chosen lineups from the solution
prob.solve()
chosen = [v.name for v in prob.variables() if v.varValue == 1]

I don't understand the internals of pydfs-lineup-optimizer quite well enough to submit a pull request with this specific implementation. Just wanted to share my thinking about a possible way around performance issues for multi-lineup generation.

sansbacon avatar May 07 '21 01:05 sansbacon

How much does the result from your implementation differ from the current implementation? As far as I can see, you generate many random lineups without using the library, and this method doesn't guarantee that the lineups will be optimal. I'm also not sure that all features can be migrated to this approach. Regarding performance, I have plans in the backlog to add another solver to the library that should work faster, according to their benchmark.

DimaKudosh avatar May 11 '21 18:05 DimaKudosh

I thought about this some more and I can explain it better. The basic premise is that individual lineup optimization is a different problem than multi-lineup optimization.

There are certain objectives and constraints that apply at the lineup level, such as salary and positional requirements. Then there are other objectives and constraints that apply at the lineup-group level, such as ownership % of players. And there are constraints that could be applied at either level, such as stacking: you could say "every lineup has to have a stack" or "60% of the lineups have to contain a stack."
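To make the group-level idea concrete, here is a hedged sketch of how a "60% of the selected lineups must contain a stack" rule could be written in PuLP over a pool of candidate lineups. The `has_stack` flags and scores below are fabricated placeholders, not anything the library computes:

```python
import pulp

# Sketch only: candidate lineups from step one, with fake data.
n_candidates = 100          # number of candidate lineups (made up)
n_wanted = 20               # lineups to select in step two
has_stack = [i % 2 == 0 for i in range(n_candidates)]   # fake stack flags
scores = [100 + (i % 7) for i in range(n_candidates)]   # fake lineup scores

prob = pulp.LpProblem('GroupLevelStack', pulp.LpMaximize)
x = pulp.LpVariable.dicts('Lineup', range(n_candidates), cat='Binary')

# objective: total score of the selected lineups
prob += pulp.lpSum(scores[i] * x[i] for i in range(n_candidates))
# select exactly n_wanted lineups
prob += pulp.lpSum(x.values()) == n_wanted
# group-level rule: at least 60% of selected lineups contain a stack
prob += pulp.lpSum(x[i] for i in range(n_candidates) if has_stack[i]) >= 0.6 * n_wanted

prob.solve(pulp.PULP_CBC_CMD(msg=0))
chosen = [i for i in range(n_candidates) if x[i].varValue == 1]
```

The point is that the stack rule here constrains the *group* of lineups, not any single lineup, which is exactly the kind of constraint that is cheap at step two.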

Step 1 would be for the user to generate a list of lineups using the existing optimizer.optimize method. The user could generate the 500 most optimal lineups using one set of projections, or the 50 most optimal lineups using each of 10 different sets of projections. The only requirement is that the user create a list of Lineup objects. In the example below, I am using different randomness parameters to generate different sets of lineups, but I could also use different projections.

import numpy as np

# possible example of step one
# generate 600 lineups (30 randomness settings x 20 lineups each)
# I think this is the correct way to randomize now, but please correct me if wrong
lineups = []
for randomness in np.linspace(.01, .30, 30):
    optimizer.set_fantasy_points_strategy(RandomFantasyPointsStrategy(.00, randomness))
    for lineup in optimizer.optimize(20):
        lineups.append(lineup)

Then at step two you would have a MultiLineupOptimizer object (or some method of the existing optimizer) that handles the lineup group optimization by selecting the n best lineups that satisfy the lineup group level constraints, such as ownership.

So the advantage of this approach is that it allows you to mix and match lineups together and then optimize. Say I have 3 projection sources that I like, or different scenarios of how the slate could go. Dividing these concepts into two steps would facilitate multi-lineup optimization. One disadvantage is that users would likely have to do more work to get the optimizer set up correctly. It could also require too much change to the internals of the library to be feasible.
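A minimal sketch of what that step-two selector might look like, assuming each candidate lineup has been reduced to a tuple of player ids. The function name, signature, and exposure format here are hypothetical, not existing library API:

```python
import pulp

def select_best_lineups(cand, scores, n_wanted, max_exposure):
    """Pick the n_wanted best candidate lineups under per-player exposure caps.

    cand: list of player-id tuples; scores: score per candidate;
    max_exposure: {player_id: max fraction of chosen lineups}.
    """
    prob = pulp.LpProblem('StepTwo', pulp.LpMaximize)
    x = pulp.LpVariable.dicts('L', range(len(cand)), cat='Binary')
    prob += pulp.lpSum(scores[i] * x[i] for i in range(len(cand)))
    prob += pulp.lpSum(x.values()) == n_wanted
    players = {p for lu in cand for p in lu}
    for p in players:
        cap = max_exposure.get(p, 1.0) * n_wanted
        prob += pulp.lpSum(x[i] for i, lu in enumerate(cand) if p in lu) <= cap
    prob.solve(pulp.PULP_CBC_CMD(msg=0))
    return [cand[i] for i in range(len(cand)) if x[i].varValue == 1]

# toy usage: 4 candidates, pick 2, player 1 capped at 50% exposure
cand = [(1, 2), (1, 3), (2, 3), (1, 4)]
scores = [10, 9, 8, 7]
picked = select_best_lineups(cand, scores, 2, {1: 0.5})
```

In the toy example the two highest-scoring candidates both contain player 1, so the cap forces the selector to trade the second-best lineup for one without that player.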

So I hope that clarifies my idea. I wanted to provide some food for thought based on my experience using optimizers and the questions people have asked me about the library. I looked at the internals of your library and I think this is a different approach than what you currently use, but please accept my apologies if this is, in essence, what you are already doing.

Thank you again for your hard work on this impressive library. Eric.

sansbacon avatar May 11 '21 19:05 sansbacon

The problem is that, from a performance point of view, this method will work slower than the current implementation: almost all constraints must be applied at the single-lineup level, only max exposure can be moved to the lineup-group level, and your method requires generating more lineups, where each new lineup is slower to generate than the previous one.

If this method is implemented as a feature for selecting the most optimal lineups with exposures from provided lineups, it can work, but it has some corner cases. For example, suppose a high-projection player was included in every lineup in the first stage. If you then try to select the best lineup group from those lineups with an exposure constraint saying the player should appear in only 50% of them, the problem can't be solved, because there is no other 50% of lineups without this player.
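That corner case is easy to reproduce in a toy model (all data below is fabricated): if player 1 appears in every candidate lineup, a 50% exposure cap on that player makes the selection problem infeasible.

```python
import pulp

cand = [(1, 2), (1, 3), (1, 4), (1, 5)]    # player 1 is in every candidate
n_wanted = 2

prob = pulp.LpProblem('CornerCase', pulp.LpMaximize)
x = pulp.LpVariable.dicts('L', range(len(cand)), cat='Binary')
prob += pulp.lpSum(x.values())              # objective is irrelevant here
prob += pulp.lpSum(x.values()) == n_wanted
# player 1 may appear in at most 50% of the chosen lineups (1 of 2),
# but every candidate contains player 1, so no selection can satisfy this
prob += pulp.lpSum(x[i] for i, lu in enumerate(cand) if 1 in lu) <= 0.5 * n_wanted

status = prob.solve(pulp.PULP_CBC_CMD(msg=0))
print(pulp.LpStatus[status])                # anything other than 'Optimal'
```

The only fix at step two would be to go back to step one and generate more candidates without that player, which is exactly the coupling between the two stages that makes this approach tricky.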

DimaKudosh avatar May 12 '21 18:05 DimaKudosh

This sounds like it could help improve inter-lineup diversity.

BenikaH avatar Jul 28 '22 19:07 BenikaH