data_science_in_julia_for_hackers Chapter 3 Julian code feedback

Chapter 3 Julian code feedback

Open 0xpantera opened this issue 3 years ago • 1 comments

Went through chapter 3 and changed some of the code to make it more "Julian". Most of the changes I made are in the bandits part, but I also made a small change to the code that reads the rainfall data:

begin
    #Read the CSV file and transform it into a DataFrame
    rain_data = CSV.read("data/historico_precipitaciones.csv", DataFrame)
    
    #Rename the columns
    colnames = ["Year", "Month", "mm", "Days"]
    #Symbol is the type of object used to represent the labels of a dataset
    rename!(rain_data, Symbol.(colnames))
    
    #We use a dictionary to translate de Month names
    translate = Dict(
        "Enero" => "January",
        "Febrero" => "February",
        "Marzo" => "March",
        "Abril" => "April",
        "Mayo" => "May",
        "Junio" => "June",
        "Julio" => "July",
        "Agosto" => "August",
        "Septiembre" => "September",
        "Octubre" => "October",
        "Noviembre" => "November",
        "Diciembre" => "December"
    )
    
    replace!(rain_data.Month, translate...)
end

The main change here is replacing the for loop that was making the replacement in the Month column.

I changed the definition of the beta_bandit struct to BetaBandit as it's the Julian way to use UpperCamelCase to name structs (see style guide in the docs. Additionally I used Base.@kwdef to have default values for the BetaBandit struct. This goes in line with the following statement in the Constructors section of the Julia Docs

It is good practice to provide as few inner constructor methods as possible: only those taking all arguments explicitly and enforcing essential error checking and transformation. Additional convenience constructor methods, supplying default values or auxiliary transformations, should be provided as outer constructors that call the inner constructors to do the heavy lifting. This separation is typically quite natural.

# init with params a=1, b=1 (uniform distribution)
Base.@kwdef mutable struct BetaBandit
    # the real success probability of the bandit
    p::Float64
    # a parameter from the beta distribution (successes)
    a::Int64 = 1
    # b parameter from the beta distribution (fails)
    b::Int64 = 1
    # total number of trials of bandit
    N::Int64 = 0
end

Another way to do it would be to define an outside constructor for the struct, as such:

mutable struct BetaBandit
    # the real success probability of the bandit
    p::Float64
    # a parameter from the beta distribution (successes)
    a::Int64
    # b parameter from the beta distribution (fails)
    b::Int64
    # total number of trials of bandit
    N::Int64
end
BetaBandit(p) = BetaBandit(p, 1, 1, 0)

I modified the update_bandit function to update_bandit!. The Julia style guide suggests using ! at the end of method names that mutate their arguments. I also used the ternary operator to make the code more concise and Julian:

function update_bandit!(bandit::BetaBandit, outcome::Bool)
    outcome ? bandit.a += 1 : bandit.b += 1
    bandit.N += 1
end

For the rest of the bandit code I changed the following:

Refactored find_best_bandit into a standalone function (this makes the Thompson sampling algorithm more clear IMO)
N_TRIALS and BANDIT PROBABILITIES into const since they won't change.
Refactored plot_bandits into a standalone function. Again, I think this makes the Thompson algorithm easier to see.
Used short circuit evaluation to update the reward variable in a more Julian way.

Here's the rest of the code:

const N_TRIALS = 100
const BANDIT_PROBABILITIES = [0.3, 0.5, 0.75]

function find_best_bandit(bandits)
    _, mxidx = findmax([sample_bandit(bandit) for bandit in bandits])
    bandits[mxidx]
end

function beta_bandit_experiment(band_probs, trials)
    bandits = [BetaBandit(;p) for p in band_probs]
    reward = 0
    
    for i in 1:trials
        best_bandit = find_best_bandit(bandits)
        exp = pull_arm(best_bandit)
        exp && (reward += 1)
        update_bandit!(best_bandit, exp)
    end
    bandits
end

function plot_bandits(bandits)
    plot()
    for i in 1:length(bandits)
        display(
            plot!(
                Beta(bandits[i].a, bandits[i].b),
                xlim=(0, 1),
                xlabel="Success probability of bandit",
                ylabel="Probability density",
                lw=2
            )
        )
    end
    current()
end

Hope you find this helpful!

Jun 04 '21 18:06 0xpantera

@Francososa THANK you for taking the time to read the book and the feedback. We are in the middle of a big refactor. With @marianLambda @HermanObst and @pefontana we will discuss in the following weeks this. Thanks again!

Jun 14 '21 16:06 unbalancedparentheses

data_science_in_julia_for_hackers data_science_in_julia_for_hackers copied to clipboard

Chapter 3 Julian code feedback

data_science_in_julia_for_hackers
data_science_in_julia_for_hackers copied to clipboard