ggplot2-solutions
10.5.1 #3 The most consistently good baseball batter
The exercise asks:
Which player in the Batting dataset has had the most consistently good performance over the course of their career?
https://github.com/kangnade/ggplot2-solutions/blob/e6ef9e3b271599f6e97afc0f8a3f012f276f9385/ggplot2_solutions_chapter10.Rmd#L399
Having no background knowledge of baseball, I can only hint at a technique:
Lahman::Batting %>% group_by(yearID) %>%
After
- grouping per year,
- use `mutate()` to compute composite measures of performance out of the existing variables in the data set.

Then,
- collapse each yearly performance variable you created into a single number per player, for example the mean of the player's yearly values for that variable,
- do the same for all variables,
- multiply the single-number outcomes of all variables together to get a final single-number measurement of success.
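The steps above could be sketched roughly as follows. This is only an illustration: the data frame is made-up toy data that borrows column names from `Lahman::Batting` (`playerID`, `yearID`, `AB`, `H`, `BB`), and `hit_rate` and `walk_rate` are hypothetical composite measures I picked for the example, not validated measures of success. The summary step groups by `playerID` rather than `yearID`, because the mean is taken per player; that choice is an assumption of this sketch.

```r
library(dplyr)

# Toy data with Lahman::Batting-style column names (values are made up).
batting <- data.frame(
  playerID = c("a", "a", "b", "b"),
  yearID   = c(2001, 2002, 2001, 2002),
  AB       = c(500, 400, 300, 200),
  H        = c(150, 100, 60, 50),
  BB       = c(50, 40, 30, 10)
)

per_player <- batting %>%
  filter(AB > 0) %>%                      # avoid division by zero
  # Step 1: yearly composite measures (illustrative, not validated).
  mutate(
    hit_rate  = H / AB,
    walk_rate = BB / (AB + BB)
  ) %>%
  # Step 2: one number per player and measure: the mean over seasons.
  group_by(playerID) %>%
  summarise(
    mean_hit_rate  = mean(hit_rate),
    mean_walk_rate = mean(walk_rate),
    .groups = "drop"
  ) %>%
  # Step 3: multiply the per-measure outcomes into a single success value.
  mutate(success = mean_hit_rate * mean_walk_rate)

print(per_player)
```

With the real data you would replace the toy data frame with `Lahman::Batting` and swap in measures that someone with baseball knowledge considers meaningful.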
You might say that we should weight this value by the number of years the player played: the longer the player maintained a good performance, the more "consistently good performance" he had, which is what the question wants us to measure.
To account for years, consider multiplying the single-number measurement of success you obtained by the number of years the player played. This is the final score of the batter, which indicates how consistently good he was, on a yearly basis, and for how long.
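A minimal sketch of the weighting by years might look like this. Again the data are made up, and `success` is just a stand-in single-number measure (mean of `H / AB` per season) that I chose for illustration; `n_years` and `final_score` are hypothetical names.

```r
library(dplyr)

# Toy data with Lahman::Batting-style column names (values are made up).
batting <- data.frame(
  playerID = c("a", "a", "a", "b", "b"),
  yearID   = c(2001, 2002, 2003, 2001, 2002),
  AB       = c(400, 400, 400, 500, 500),
  H        = c(100, 120, 110, 175, 165)
)

scores <- batting %>%
  filter(AB > 0) %>%
  group_by(playerID) %>%
  summarise(
    success = mean(H / AB),        # stand-in single-number measure
    n_years = n_distinct(yearID),  # number of seasons played
    .groups = "drop"
  ) %>%
  # Final score: yearly success weighted by career length.
  mutate(final_score = success * n_years) %>%
  arrange(desc(final_score))

print(scores)
```

In this toy example player "b" has the better yearly rate, but player "a" ends up first because he kept it up for one more season, which is the effect the weighting is meant to capture.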
Assuming that the composite variables you created with `mutate()` are valid measures of success, with higher values meaning better performance, the player with the highest score is the most consistently high-performing player.
It will be interesting to see if this player is commonly said to be a legend by most fans. Unfortunately, I can't generate reliable measures of success, so I won't do this analysis.
To anyone reading these lines: I'd be glad if you used your baseball background knowledge, conducted this analysis, and sent me the final list of legends at [email protected].
Warmly