
Analysing user submissions data

Open alexmojaki opened this issue 3 years ago • 0 comments

Related: #134

Here is 3 months' worth of data downloaded from Firebase and then transformed by a script on my computer. Each row represents a user running code once. It doesn't include the submitted code or output. I'd like some code analysing this data to extract whatever useful information I can get. Here's a start:

import pandas as pd

df = pd.read_csv("/home/alex/Downloads/futurecoder-io-default-rtdb-code_entries-cleaned.csv")

for col in "source page_slug developerMode page_route num_hints requesting_solution".split():
    print(col)
    print(df[col].value_counts())
    print()

Output:

source
editor         201952
shell           62834
snoop            6303
pythontutor      3060
birdseye         1982
Name: source, dtype: int64

page_slug
BuildingUpStringsExercises               38738
IntroducingTheShell                      37634
BuildingUpStrings                        14496
UsingVariables                           13992
IfAndElse                                12407
IntroducingVariables                      8958
GettingElementsAtPosition                 8188
AddingStrings                             7888
IntroducingNestedLoops                    7176
BasicForLoopExercises                     7154
OtherComparisonOperators                  6390
Indentation                               6277
IntroducingIfStatements                   6152
LoopingOverNestedLists                    6130
IntroducingLists                          5145
FunctionsAndMethodsForLists               5048
TheFullTicTacToeGame                      4848
StoringCalculationsInVariables            4667
IntroducingElif                           4455
BuildingNewLists                          4330
IntroducingTicTacToe                      4312
IntroducingStrings                        4167
UsingBreak                                4118
WritingPrograms                           3799
IntroducingForLoops                       3234
TestingFunctions                          3198
MoreListFunctionsAndMethods               3166
IntroducingOr                             3095
TheEqualityOperator                       3094
NewlinesAndFormatBoard                    2419
GettingElementsAtPositionExercises        2287
CallingFunctionsTerminology               2132
IntroducingNestedLists                    2110
NavigatingShellHistory                    2093
IntroducingFstrings                       2092
UnderstandingProgramsWithSnoop            2055
Types                                     1881
ModifyingWhileIterating                   1565
DefiningFunctions                         1511
CombiningCompoundStatements               1371
HowToFindInformationWithGoogleAndMore     1361
SingleAndDoubleQuotesInStrings            1154
StringMethodsUnderstandingMutation        1060
ReturningValuesFromFunctions               975
IntroducingAnd                             962
NestedListAssignment                       854
IntroducingNotPage                         818
CombiningAndAndOr                          812
EqualsVsIs                                 745
IntroducingBirdseye                        582
MoreOnReturn                               566
CallingFunctionsWithinFunctions            513
MakingTheBoard                             467
BasicTerminology                           414
InteractiveProgramsWithInput               395
UnderstandingProgramsWithPythonTutor       346
MultiLineExpressions                       331
loading_placeholder                          4
Name: page_slug, dtype: int64

# this shows whether developer mode was on when they ran the code,
# not whether they used developer mode to skip a step
developerMode
False    264081
True      12050
Name: developerMode, dtype: int64

page_route
main        224303
ide          51364
question       464
Name: page_route, dtype: int64

num_hints
0     240328
4       7352
2       6393
1       4879
6       4081
3       3250
5       2510
10      1753
9       1627
8        829
20       660
11       566
7        494
12       432
31       262
15       215
14       206
13       204
16        33
17        26
19        19
18        11
27         1
Name: num_hints, dtype: int64

requesting_solution
0    259132
4     12241  # revealing the hidden solution
2      4266  # looking at the shuffled solution
# 1 and 3 are when the popup says "Are you sure?"
3       392
1       100
Name: requesting_solution, dtype: int64

The first thing any such code needs to do is identify which users seem to be actual students seriously trying to complete the course. Submitting code to many pages is probably a good indicator.
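A minimal sketch of that filter, counting the distinct pages each user submitted code to. The `user_id` column name and the `min_pages` threshold are assumptions made for illustration (the per-user column in the real CSV isn't shown above), and the demo rows are made up:

```python
import pandas as pd


def serious_users(df, min_pages=5, user_col="user_id"):
    """Return the IDs of users who ran code on at least `min_pages` distinct pages.

    `user_col` is a hypothetical column name; swap in whatever identifies a user
    in the real data.
    """
    pages_per_user = df.groupby(user_col)["page_slug"].nunique()
    return set(pages_per_user[pages_per_user >= min_pages].index)


# Made-up demo rows, only to show the shape of the input.
demo = pd.DataFrame({
    "user_id": ["a", "a", "a", "b", "b", "c"],
    "page_slug": [
        "IntroducingTheShell", "UsingVariables", "IfAndElse",
        "IntroducingTheShell", "IntroducingTheShell", "Types",
    ],
})
demo_serious = serious_users(demo, min_pages=2)
print(demo_serious)
```

The threshold is a judgment call; looking at the distribution of `pages_per_user` first would probably suggest a sensible cutoff.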

There are many questions that could be asked about the data. In particular, how many users complete all or most of the course? Where do users struggle and/or stop entirely?
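One rough way to approach "where do users struggle" is to compare, per page, how often a run happened with hints in use or with the hidden solution revealed. The sketch below uses only the `page_slug`, `num_hints` and `requesting_solution` columns shown above. Treating `num_hints > 0` as "hints were used" is an assumption about what that column means, and the demo rows are made up:

```python
import pandas as pd


def struggle_by_page(df):
    """Summarise hint usage and solution reveals per page.

    requesting_solution == 4 is treated as "revealed the hidden solution",
    following the annotations in the value counts above.
    """
    flags = df.assign(
        used_hints=df["num_hints"] > 0,  # assumption: > 0 means hints were used
        revealed_solution=df["requesting_solution"] == 4,
    )
    summary = flags.groupby("page_slug").agg(
        runs=("used_hints", "size"),            # number of code runs on the page
        hint_rate=("used_hints", "mean"),       # share of runs with hints in use
        reveal_rate=("revealed_solution", "mean"),  # share of runs with solution revealed
    )
    return summary.sort_values("reveal_rate", ascending=False)


# Made-up demo rows.
demo_runs = pd.DataFrame({
    "page_slug": ["IfAndElse"] * 4 + ["Types"] * 2,
    "num_hints": [0, 2, 0, 4, 0, 0],
    "requesting_solution": [0, 4, 0, 4, 0, 2],
})
page_summary = struggle_by_page(demo_runs)
print(page_summary)
```

These are per-run rates, so a few users who rerun code many times can dominate a page; aggregating per user first may give a fairer picture.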

Unfortunately the data doesn't include whether or not an entry passed the current step, because I accidentally stopped storing that in https://github.com/alexmojaki/futurecoder/commit/b6f19e84497b8e8ba2e3f5c57cf6f00c331784cd. But the distribution of page_slug and step_name for each user should work well as a proxy for how far they got.
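As a sketch of that idea, the last `page_slug` and `step_name` recorded for each user gives a rough picture of where people stop. This assumes a hypothetical `user_id` column and that rows appear in chronological order within each user (otherwise sort by whatever timestamp column exists first). The step names in the demo are made up:

```python
import pandas as pd


def last_position_counts(df, user_col="user_id"):
    """Count how many users' final recorded run was at each (page_slug, step_name).

    Assumes rows are in chronological order per user; `user_col` is a
    hypothetical column name.
    """
    last_rows = df.groupby(user_col).tail(1)  # final row of each user
    counts = last_rows.groupby(["page_slug", "step_name"]).size()
    return counts.sort_values(ascending=False)


# Made-up demo rows; step names are placeholders.
demo_entries = pd.DataFrame({
    "user_id": ["a", "a", "b", "b", "c"],
    "page_slug": [
        "IntroducingTheShell", "UsingVariables",
        "IntroducingTheShell", "UsingVariables",
        "IntroducingTheShell",
    ],
    "step_name": ["step_1", "step_1", "step_1", "step_1", "step_2"],
})
stop_counts = last_position_counts(demo_entries)
print(stop_counts)
```

A user's last recorded run isn't necessarily where they gave up (they may have finished the course, or the export window may have cut them off), so the counts are only suggestive.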

alexmojaki · Feb 25 '22 15:02