Analysing user submissions data
Related: #134
Here are 3 months' worth of data downloaded from Firebase and then transformed by a script on my computer. Each row represents a user running code once. It doesn't include the submitted code or output. I'd like some code that analyses this data and extracts whatever useful information it can. Here's a start:
import pandas as pd
df = pd.read_csv("/home/alex/Downloads/futurecoder-io-default-rtdb-code_entries-cleaned.csv")
for col in "source page_slug developerMode page_route num_hints requesting_solution".split():
    print(col)
    print(df[col].value_counts())
    print()
Output:
source
editor 201952
shell 62834
snoop 6303
pythontutor 3060
birdseye 1982
Name: source, dtype: int64
page_slug
BuildingUpStringsExercises 38738
IntroducingTheShell 37634
BuildingUpStrings 14496
UsingVariables 13992
IfAndElse 12407
IntroducingVariables 8958
GettingElementsAtPosition 8188
AddingStrings 7888
IntroducingNestedLoops 7176
BasicForLoopExercises 7154
OtherComparisonOperators 6390
Indentation 6277
IntroducingIfStatements 6152
LoopingOverNestedLists 6130
IntroducingLists 5145
FunctionsAndMethodsForLists 5048
TheFullTicTacToeGame 4848
StoringCalculationsInVariables 4667
IntroducingElif 4455
BuildingNewLists 4330
IntroducingTicTacToe 4312
IntroducingStrings 4167
UsingBreak 4118
WritingPrograms 3799
IntroducingForLoops 3234
TestingFunctions 3198
MoreListFunctionsAndMethods 3166
IntroducingOr 3095
TheEqualityOperator 3094
NewlinesAndFormatBoard 2419
GettingElementsAtPositionExercises 2287
CallingFunctionsTerminology 2132
IntroducingNestedLists 2110
NavigatingShellHistory 2093
IntroducingFstrings 2092
UnderstandingProgramsWithSnoop 2055
Types 1881
ModifyingWhileIterating 1565
DefiningFunctions 1511
CombiningCompoundStatements 1371
HowToFindInformationWithGoogleAndMore 1361
SingleAndDoubleQuotesInStrings 1154
StringMethodsUnderstandingMutation 1060
ReturningValuesFromFunctions 975
IntroducingAnd 962
NestedListAssignment 854
IntroducingNotPage 818
CombiningAndAndOr 812
EqualsVsIs 745
IntroducingBirdseye 582
MoreOnReturn 566
CallingFunctionsWithinFunctions 513
MakingTheBoard 467
BasicTerminology 414
InteractiveProgramsWithInput 395
UnderstandingProgramsWithPythonTutor 346
MultiLineExpressions 331
loading_placeholder 4
Name: page_slug, dtype: int64
# this shows whether developer mode was on when they ran the code,
# not whether they used developer mode to skip a step
developerMode
False 264081
True 12050
Name: developerMode, dtype: int64
page_route
main 224303
ide 51364
question 464
Name: page_route, dtype: int64
num_hints
0 240328
4 7352
2 6393
1 4879
6 4081
3 3250
5 2510
10 1753
9 1627
8 829
20 660
11 566
7 494
12 432
31 262
15 215
14 206
13 204
16 33
17 26
19 19
18 11
27 1
Name: num_hints, dtype: int64
requesting_solution
0 259132
4 12241 # revealing the hidden solution
2 4266 # looking at the shuffled solution
# 1 and 3 are when the popup says "Are you sure?"
3 392
1 100
Name: requesting_solution, dtype: int64
The first thing any such code needs to do is identify which users seem to be actual students seriously trying to complete the course. Submitting code to many pages is probably a good indicator.
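A minimal sketch of that filter, assuming the CSV has a per-user identifier column. The name `user_id` is hypothetical (the user column isn't shown in the output above), and the `min_pages` threshold is an arbitrary starting point to tune, not a recommendation.

```python
import pandas as pd


def serious_users(df, user_col="user_id", min_pages=5):
    """Return the ids of users who ran code on at least `min_pages` distinct pages.

    `user_col` is an assumed column name; pass the real one if it differs.
    """
    # The placeholder slug is not a real course page, so leave it out of the count.
    real_pages = df[df["page_slug"] != "loading_placeholder"]
    pages_per_user = real_pages.groupby(user_col)["page_slug"].nunique()
    return pages_per_user[pages_per_user >= min_pages].index
```

It could then be used as `students = df[df["user_id"].isin(serious_users(df))]`, so that later analysis only looks at those users.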
There are many questions that could be asked about the data. In particular, how many users complete all or most of the course? Where do users struggle and/or stop entirely?
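One possible way to look for pages where users struggle is a per-page summary of runs per user, hint usage and solution reveals. This is only a sketch: `user_id` is again a hypothetical column name, a nonzero `num_hints` is assumed to mean the user had looked at hints when running the code, and `requesting_solution == 4` is taken as "revealed the hidden solution", as annotated in the output above.

```python
import pandas as pd


def page_struggle_summary(df, user_col="user_id"):
    """Summarise, per page, how much effort and help users needed."""
    grouped = df.groupby("page_slug")
    summary = pd.DataFrame({
        "users": grouped[user_col].nunique(),
        "runs": grouped.size(),
        # Share of runs made with at least one hint showing (assumed meaning).
        "hint_run_share": (df["num_hints"] > 0).groupby(df["page_slug"]).mean(),
    })
    # Distinct users per page who revealed the hidden solution (value 4).
    revealed = (
        df[df["requesting_solution"] == 4]
        .groupby("page_slug")[user_col]
        .nunique()
    )
    summary["users_revealing_solution"] = revealed.reindex(summary.index, fill_value=0)
    summary["runs_per_user"] = summary["runs"] / summary["users"]
    summary["reveal_rate"] = summary["users_revealing_solution"] / summary["users"]
    return summary.sort_values("runs_per_user", ascending=False)
```

Pages near the top of the result have many runs per user, which may point to difficulty, but long pages with many steps will also rank high, so the numbers need to be read alongside the page content.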
Unfortunately the data doesn't include whether or not an entry passed the current step - I accidentally stopped storing that in https://github.com/alexmojaki/futurecoder/commit/b6f19e84497b8e8ba2e3f5c57cf6f00c331784cd. But the distribution of page_slug and step_name for each user should work well.
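As a rough sketch of using that distribution: count runs per user per step, and find the furthest page each user submitted code to. The assumptions here are a hypothetical `user_id` column and a `course_order` list of page slugs in course order, which the caller has to supply because the CSV doesn't encode the order.

```python
import pandas as pd


def runs_per_step(df, user_col="user_id"):
    """Number of runs for each (user, page, step) combination."""
    return df.groupby([user_col, "page_slug", "step_name"]).size().rename("runs")


def furthest_page_counts(df, course_order, user_col="user_id"):
    """For each page, count the users whose furthest submission was on that page.

    `course_order` is a caller-supplied list of page slugs in course order.
    """
    position = {slug: i for i, slug in enumerate(course_order)}
    # Drop rows for slugs outside the supplied order (e.g. loading_placeholder).
    known = df[df["page_slug"].isin(position)]
    furthest = known["page_slug"].map(position).groupby(known[user_col]).max()
    counts = furthest.value_counts().reindex(range(len(course_order)), fill_value=0)
    counts.index = course_order
    return counts
```

A large count on a page that isn't the last one suggests many users stopped there. Without the pass/fail flag, many runs on a single step in `runs_per_step` is only a proxy for difficulty, not proof that the user failed it.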