cricketr
Issue in batsmanDismissals()
First of all, I would like to say how much I appreciate your dedication in building this package. Hats off!
Coming to the issues: I have been trying to analyze Vijay Shankar's ODI dismissals (http://stats.espncricinfo.com/ci/engine/player/477021.html?class=2;filter=advanced;orderby=start;template=results;type=batting;view=innings), and I came across the following 2 issues:
(I have traced the function)
- batsman <- clean(file)
After this line executes, only one record remains in his data.
The same happens with Rishabh Pant.
- If I manually remove the above line (batsman <- clean(file)) and continue executing line by line, there appears to be an issue that could occur with any player: the percentage for each dismissal type is calculated with the total number of innings played as the denominator, rather than the total number of times dismissed.
What I mean is that
His stats should read
Not out : 0 %
Run out : 40 %
Caught : 60 %
Refer
as opposed to current metrics displaying
Not out : 44%
Run out : 22%
Caught : 33%
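The difference between the two sets of numbers comes down to the denominator. Here is a minimal sketch of the arithmetic in R, assuming 9 innings made up of 4 not outs, 2 run outs and 3 catches. These counts are illustrative only; they were chosen because they reproduce the percentages quoted above, and they are not taken from the package or from Statsguru.

```r
# Illustrative counts only (assumed): 9 innings = 4 not out, 2 run out, 3 caught.
dismissals <- c(rep("not out", 4), rep("run out", 2), rep("caught", 3))

# Current behaviour: the denominator is the total number of innings.
pct_of_innings <- round(100 * table(dismissals) / length(dismissals))

# Proposed behaviour: the denominator is the number of innings that ended
# in a dismissal, so "not out" innings are excluded first.
out_only <- dismissals[dismissals != "not out"]
pct_of_dismissals <- round(100 * table(out_only) / length(out_only))

print(pct_of_innings)     # caught 33, not out 44, run out 22
print(pct_of_dismissals)  # caught 60, run out 40
```

With the second denominator, "not out" no longer takes a share of the total, which is what the 0 % / 40 % / 60 % figures above describe.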
Will check. Not out shows up as a '*', which I remove. Yes, transformations have to be done. Will look into it. Currently caught up in a couple of things. We cannot remove clean(file). We have to make it work with that.
Ganesh
Sure, Mr Ganesh. I will keep an eye on this page.
Pranav
Looking at the data, I see that the rows which have been removed have Mins as '-'. This is treated as NA, which R removes in clean(file). Did you check why rows 6, 7, 8 and 9 for Vijay have Mins as '-'?
ESPNCricinfo (Match Scorecard) doesn't have the minutes played statistics for India's home series vs Australia
I can confirm that the issue exists only for players where there is "-" in "Mins" column for the innings they have batted.
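To make the mechanism concrete, here is a small sketch of how a '-' in Mins can cause a whole innings to be dropped. It assumes the cleaning step reads '-' as NA and then keeps only rows with no NA in any column; this is an assumption about how clean() behaves, not a copy of its code, and the data below is made up.

```r
# Hypothetical innings data: column names mirror a Statsguru export,
# but the values are invented for illustration.
csv_text <- "Runs,Mins,BF,Dismissal
45,62,41,caught
30,-,28,run out
15,-,20,caught
46,-,41,run out"

# Treat '-' as missing while reading.
df <- read.csv(text = csv_text, na.strings = "-", stringsAsFactors = FALSE)

# Assumption: the cleaning step keeps only rows without any NA.
cleaned <- df[complete.cases(df), ]

nrow(df)       # 4 innings read
nrow(cleaned)  # 1 innings left, because Mins is NA in the other three
```

Every other column in the dropped rows is valid, so the loss is caused by Mins alone.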
Can't the clean function be executed without considering the Mins column?
For example, by replacing the first line of the clean function with:
df <- read.csv(file, stringsAsFactors = FALSE)
df <- df[-3]  # drop the third column (Mins in this file)
This works fine for me.
But I have to check whether this holds for the other batsman functions too.
You can write your own function that looks only at the dismissals column, without the clean function. I may not add this to the package, as this is an issue with the data. I cannot keep the package generic if I make changes for unique cases.
What I actually thought was: given that the "Mins" data is inconsistent in ESPNCricinfo Statsguru, why do we need to consider "Mins" at all? Why don't we drop it for all functions? Then it becomes easier to fill NAs for all the other "-"s.
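A rough sketch of that idea: drop Mins by name before filtering out incomplete rows, so a missing Mins value on its own cannot discard an innings. The helper clean_without_mins is hypothetical and not part of cricketr, and the rows are made up; any function that actually uses Mins would still need that column.

```r
# Hypothetical helper, not part of cricketr: remove the Mins column first,
# then keep only rows that are complete in the remaining columns.
clean_without_mins <- function(df) {
  df <- df[, setdiff(names(df), "Mins"), drop = FALSE]
  df[complete.cases(df), , drop = FALSE]
}

# Made-up rows in which '-' has already been read as NA.
innings <- data.frame(
  Runs = c(45, 30, 15, NA),
  Mins = c(62, NA, NA, NA),
  BF = c(41, 28, 20, NA),
  Dismissal = c("caught", "run out", "caught", NA),
  stringsAsFactors = FALSE
)

kept <- clean_without_mins(innings)
nrow(kept)  # 3: only the row that is missing everything else is dropped
```

Dropping the column by name rather than by position avoids depending on where Mins sits in a particular file.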