ERROR: LoadError: UndefVarError: ScoringEngineDemo not defined

Open AbhimanyuAryan opened this issue 3 years ago • 21 comments

Hi Jeremy,

I am rebuilding the ScoringEngineDemo with GenieBuilder. Here's the link to the repo. While restructuring the entire app, I came across an issue on this line:

const preproc_flux = BSON.load(joinpath(assets_path, "preproc-flux.bson"), ScoringEngines)[:preproc]

So I have moved the logic for exporting methods/functions to ScoringEngines.jl, i.e.:

module ScoringEngines

using DataFrames
using Flux
using EvoTrees
using EvoTrees: predict

using ShapML
using Loess

using StatsBase: sample, quantile
using Statistics: mean, std

using StipplePlotly
using PlotlyBase

export logit
export one_way_data, one_way_plot, one_way_plot_weights

export get_shap_importance,
    get_shap_effect, 
    plot_shap_importance, 
    plot_shap_effect, 
    get_shap_explain,
    plot_shap_explain

const j_blue = "#4063D8"
const j_green = "#389826"
const j_purple = "#9558B2"
const j_red = "#CB3C33"

@info "Inside scoringengine/ScoringEngines.jl before all includes"

include("preproc-utils.jl")
include("preproc.jl")
include("model.jl")

include("inference.jl")
include("plots.jl")
include("explain.jl")

@info "Inside scoringengine/ScoringEngines.jl after all includes"

end

and setup.jl looks like this

include("ScoringEngines.jl")

@info "----- is scoring engine included", ScoringEngines.get_shap_explain

const j_blue = "#4063D8"
const j_green = "#389826"
const j_purple = "#9558B2"
const j_red = "#CB3C33"

@info "Directly to check where assets are: ", joinpath(@__DIR__, "../../assets")
const assets_path = joinpath(@__DIR__, "../../assets")

@info "after assets setup.jl"

df_tot = begin
    df_tot = ScoringEngines.load_data(joinpath(assets_path, "training_data.csv"))
    transform!(df_tot, "claim_amount" => ByRow(x -> x > 0 ? 1.0f0 : 0.0f0) => "event")
    dropmissing!(df_tot)
end

const preproc_flux = BSON.load(joinpath(assets_path, "preproc-flux.bson"), ScoringEngines)[:preproc]  # <-----This line causes the error
#....
#....
# further code

Which causes the error given below

Error Stack:

ERROR: LoadError: UndefVarError: ScoringEngineDemo not defined
in expression starting at /Users/abhi/.julia/geniebuilder/apps/ScoringEngineDemo/models/scoringengine/setup.jl:25
Stacktrace:
  [1] (::BSON.var"#31#32")(m::Module, f::String)
    @ BSON ~/.julia/packages/BSON/rOaki/src/extensions.jl:21
  [2] BottomRF
    @ ./reduce.jl:81 [inlined]
  [3] _foldl_impl(op::Base.BottomRF{BSON.var"#31#32"}, init::Module, itr::Vector{Any})
    @ Base ./reduce.jl:58
  [4] foldl_impl
    @ ./reduce.jl:48 [inlined]
  [5] mapfoldl_impl
    @ ./reduce.jl:44 [inlined]
  [6] _mapreduce_dim
    @ ./reducedim.jl:327 [inlined]
  [7] #mapreduce#731
    @ ./reducedim.jl:322 [inlined]
  [8] mapreduce
    @ ./reducedim.jl:322 [inlined]
  [9] #reduce#733
    @ ./reducedim.jl:371 [inlined]
 [10] reduce
    @ ./reducedim.jl:371 [inlined]
 [11] resolve(fs::Vector{Any}, init::Module)
    @ BSON ~/.julia/packages/BSON/rOaki/src/extensions.jl:21
 [12] (::BSON.var"#35#36")(d::Dict{Symbol, Any}, init::Module)
    @ BSON ~/.julia/packages/BSON/rOaki/src/extensions.jl:64
 [13] _raise_recursive(d::Dict{Symbol, Any}, cache::IdDict{Any, Any}, init::Module)
    @ BSON ~/.julia/packages/BSON/rOaki/src/read.jl:80
 [14] raise_recursive(d::Dict{Symbol, Any}, cache::IdDict{Any, Any}, init::Module)
    @ BSON ~/.julia/packages/BSON/rOaki/src/read.jl:93
 [15] (::BSON.var"#49#50")(d::Dict{Symbol, Any}, cache::IdDict{Any, Any}, init::Module)
    @ BSON ~/.julia/packages/BSON/rOaki/src/extensions.jl:167
 [16] raise_recursive(d::Dict{Symbol, Any}, cache::IdDict{Any, Any}, init::Module)
    @ BSON ~/.julia/packages/BSON/rOaki/src/read.jl:92
 [17] (::BSON.var"#19#22"{IdDict{Any, Any}, Module})(x::Dict{Symbol, Any})
    @ BSON ~/.julia/packages/BSON/rOaki/src/read.jl:86
 [18] applychildren!(f::BSON.var"#19#22"{IdDict{Any, Any}, Module}, x::Dict{Symbol, Any})
    @ BSON ~/.julia/packages/BSON/rOaki/src/BSON.jl:19
 [19] _raise_recursive(d::Dict{Symbol, Any}, cache::IdDict{Any, Any}, init::Module)
    @ BSON ~/.julia/packages/BSON/rOaki/src/read.jl:86
 [20] raise_recursive(d::Dict{Symbol, Any}, cache::IdDict{Any, Any}, init::Module)
    @ BSON ~/.julia/packages/BSON/rOaki/src/read.jl:93
 [21] raise_recursive
    @ ~/.julia/packages/BSON/rOaki/src/read.jl:103 [inlined]
 [22] load(x::String, init::Module)
    @ BSON ~/.julia/packages/BSON/rOaki/src/read.jl:108
 [23] top-level scope
    @ ~/.julia/geniebuilder/apps/ScoringEngineDemo/models/scoringengine/setup.jl:25
in expression starting at /Users/abhi/.julia/geniebuilder/apps/ScoringEngineDemo/routes.jl:33

Ready! 

AbhimanyuAryan avatar Jul 15 '22 09:07 AbhimanyuAryan

Please let me know if anything is unclear or needs further explanation.

AbhimanyuAryan avatar Jul 15 '22 09:07 AbhimanyuAryan

I'm not sure why the error UndefVarError: ScoringEngineDemo not defined is coming up. I am not using ScoringEngineDemo anywhere; I am using ScoringEngines instead. Really confusing :(

AbhimanyuAryan avatar Jul 15 '22 09:07 AbhimanyuAryan

It looks like the issue is coming from the loading of the BSON assets. Were the assets found at https://github.com/GenieFramework/ScoringEngineDemo/tree/master/assets copied directly from this repo, ScoringEngineDemo?

If so, then I think the error is expected, as these BSON assets contain structures defined under the ScoringEngineDemo module. For your reimplementation to work, I think you would need to rebuild these assets so that they depend on ScoringEngines (https://github.com/GenieFramework/ScoringEngineDemo/blob/master/models/scoringengine/ScoringEngines.jl).
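
A minimal sketch of why a stale module name surfaces at load time, assuming (from the `resolve` frame in the stack trace above) that the loader walks a stored list of names with `getfield`. The module `ScoringEngines`, the struct `Preproc`, and the helpers `type_path` and `resolve` are hypothetical stand-ins written for illustration, not BSON's actual code:

```julia
# Hypothetical stand-in for the module that is available in the new app.
module ScoringEngines
struct Preproc
    feats::Vector{String}
end
end

# Names under which a type could be recorded: its module name, then its own name.
type_path(T::DataType) = [string(nameof(parentmodule(T))), string(nameof(T))]

# Walk a recorded path from a starting module, one getfield per component.
# (Modelled on the `resolve` frame in the stack trace; an assumption, not BSON's source.)
resolve(path, init::Module) =
    foldl((m, name) -> getfield(m, Symbol(name)), path; init = init)

# A path recorded under the current module name resolves back to the type.
path = type_path(ScoringEngines.Preproc)
resolved = resolve(path, @__MODULE__)

# A path recorded under the old module name cannot be resolved any more.
stale = ["ScoringEngineDemo", "Preproc"]
err = try
    resolve(stale, @__MODULE__)
    nothing
catch e
    e
end

println(resolved)
println(typeof(err))
```

Under that assumption, the name reported in the error is whatever was recorded when the asset was written, which would explain why ScoringEngineDemo shows up even though the new code never mentions it.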

The development of these assets was performed in: https://github.com/JuliaComputing/ScoringEngineDemo.jl/tree/main/develop

jeremiedb avatar Jul 15 '22 20:07 jeremiedb

Yes, they are copied directly from ScoringEngineDemo. Let me give the rebuilding step a try.

AbhimanyuAryan avatar Jul 15 '22 20:07 AbhimanyuAryan

Hey @jeremiedb, I retrained the flux and gbt models and recreated all the relevant BSON files with models/scoringengine/ScoringEngineExport.jl.

For example, the preproc script now looks like this (the other three scripts were changed in the same way):

include("../models/scoringengine/ScoringEngineExport.jl")

using DataFrames
using Statistics
using StatsBase: sample
using BSON
using CairoMakie
using Random

using Flux
using Flux: update!

global targetname = "event"

const assets_path = joinpath(@__DIR__, "..", "assets")
df_tot = ScoringEngineExport.load_data("assets/training_data.csv")

# minimal DF verbs
dfg = groupby(df_tot, "pol_coverage")
df = combine(dfg, [:vh_age, :vh_value] .=>  mean ∘ skipmissing .=> [:vh_age, :vh_value])
select(df, ["pol_coverage", "vh_value"])

# set target
transform!(df_tot, "claim_amount" => ByRow(x -> x > 0 ? 1.0f0 : 0.0f0) => "event")

norm_feats = ["vh_age", "vh_value", "vh_speed", "vh_weight", "drv_age1",
    "pol_no_claims_discount", "pol_coverage", "density", 
    "drv_exp_yrs", "pol_duration", "pol_sit_duration",
    "drv_sex1", "has_drv2", "is_drv2_male"]

# train/eval split
Random.seed!(123)
df_train, df_eval = ScoringEngineExport.data_splits(df_tot, 0.9)

density(collect(skipmissing(df_train.vh_age)))
density(collect(skipmissing(df_train.drv_age1)))

preproc = ScoringEngineExport.build_preproc(df_train, norm_feats = norm_feats)
adapter = ScoringEngineExport.build_adapter_flux(norm_feats, targetname)

df_train_pre = preproc(df_train)

density(collect(skipmissing(df_train_pre.vh_age)))
density(collect(skipmissing(df_train_pre.drv_age1)))

BSON.bson("assets/preproc-flux.bson", Dict(:preproc => preproc))
BSON.bson("assets/adapter-flux.bson", Dict(:adapter => adapter))

But I get a "ScoringEngineExport not defined" error in the models/ScoringEngine.jl file, which is weird, at line 39:

const preproc_flux = BSON.load(joinpath(assets_path, "preproc-flux.bson"), ScoringEngineExport)[:preproc]

It says ScoringEngineExport is not defined. I have no idea what's wrong :(

The ScoringEngineExport module is literally included at the top of this file, i.e. include("scoringengine/ScoringEngineExport.jl")

AbhimanyuAryan avatar Jul 19 '22 19:07 AbhimanyuAryan

Could you validate whether the ScoringEngineExport module is actually available in the models/ScoringEngine.jl file, for example by calling some function defined in the module? I'm unclear whether this is a BSON issue or whether it is tied to how Genie loads the environment.

I had issues reproducing with ScoringEngineApp due to the unregistered packages. I'll only be available in 3 days to look further into this.
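
A sketch of how such an availability check could look. `DemoApp` is a hypothetical stand-in for whatever module the app code is evaluated in, and `xyz` is a placeholder function; whether Genie actually nests the app's modules this way is an assumption, not something confirmed in this thread. The point is only that a module can be visible where it was included and still not be a top-level binding of `Main`:

```julia
# Hypothetical app module; the nested module plays the role of ScoringEngineExport.
module DemoApp
module ScoringEngineExport
xyz() = 1230
end

# Visible here, because the submodule was defined inside DemoApp ...
visible_here = isdefined(@__MODULE__, :ScoringEngineExport)
# ... but that does not make it a binding of Main.
visible_in_main = isdefined(Main, :ScoringEngineExport)
end

println(DemoApp.ScoringEngineExport.xyz())
println(DemoApp.visible_here)
println(DemoApp.visible_in_main)
```

Comparing the two `isdefined` results inside the file that calls `BSON.load` would show which of the two situations applies.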

jeremiedb avatar Jul 20 '22 13:07 jeremiedb

Thanks for the reply @jeremiedb, I'll try to debug based on your feedback.

AbhimanyuAryan avatar Jul 20 '22 13:07 AbhimanyuAryan

It looks like the module is available:

@info "------------------" ScoringEngineExport.xyz()
const preproc_flux = BSON.load(joinpath(assets_path, "preproc-flux.bson"), ScoringEngineExport)[:preproc] # <----error caused here 

Output:

┌ Info: ------------------
└   ScoringEngineExport.xyz() = 1230
ERROR: UndefVarError: ScoringEngineExport not defined
Stacktrace: .....
......
....

The error occurs at: https://github.com/GenieFramework/ScoringEngineApp/blob/13e84d7bfd7fec2990b603b0df52b93d80a8855b/models/ScoringEngine.jl#L41

AbhimanyuAryan avatar Jul 20 '22 15:07 AbhimanyuAryan

@jeremiedb, regarding "I had an issue reproducing with [ScoringEngineApp](https://github.com/GenieFramework/ScoringEngineApp) due to the unregistered packages":

Yes, I was trying to rebuild this demo for GenieBuilder: https://marketplace.visualstudio.com/items?itemName=GenieBuilder.geniebuilder

If you need to run the app to debug the error, I am afraid the only way is to use GenieBuilder.

Here are the steps if you want to run it in GenieBuilder:

  1. Download GenieBuilder for VS Code.

  2. Clone the app repo to .julia/geniebuilder/apps/ScoringEngineApp

  3. cd apps/ScoringEngineApp, then start julia --project and instantiate the project.

  4. Start GenieBuilder from VS Code. After the server starts, it should load the app into the workspace automatically. [screenshot]

  5. Start the app. [screenshot]

  6. Once the app is started, you can see it in the browser.

  7. Run it in the browser. [screenshot]

  8. It should look like this: [screenshot]

AbhimanyuAryan avatar Jul 25 '22 08:07 AbhimanyuAryan

Is it possible that the issue has been resolved? I've been able to execute the code from master up to the first BSON loading: https://github.com/GenieFramework/ScoringEngineApp/blob/da271c8c70cafa6dc33654df311bf06ea9a91322/models/ScoringEngine.jl#L31

jeremiedb avatar Jul 27 '22 02:07 jeremiedb

No, the issue still persists. The BSON models used on master are the ones provided by you. Also, have you noticed this line?

const preproc_flux = BSON.load(joinpath(assets_path, "preproc-flux.bson"), ScoringEngineDemo)[:preproc]

I am still using ScoringEngineDemo as the module name. I would like to name it EngineUtils or something else, and then use

const preproc_flux = BSON.load(joinpath(assets_path, "preproc-flux.bson"), EngineUtils)[:preproc]

instead of ScoringEngineDemo.

But since the app is still using ScoringEngineDemo, you should be able to run it without any issues and see the visualisation. Do you face issues at L31?

AbhimanyuAryan avatar Jul 27 '22 06:07 AbhimanyuAryan

Thanks for the clarification regarding the assets/BSON files being used. I didn't take the time to actually set up Genie 5, so what I did was:

  • Clone ScoringEngineApp
  • Remove the GenieSession / GenieSessionFileSession dependencies (in order to allow instantiation)
  • Instantiate the project
  • Execute the following code block: https://github.com/GenieFramework/ScoringEngineApp/blob/da271c8c70cafa6dc33654df311bf06ea9a91322/models/ScoringEngine.jl#L3-L38

The above worked fine. Would it work for you as well? If you change the name to EngineUtils, then you would need to rebuild all the assets from scratch, using scripts similar to the ones in https://github.com/JuliaComputing/ScoringEngineDemo.jl/tree/main/develop referred to previously. It seems the issue you had comes from those BSON files not being properly recreated under the new name EngineUtils. Only changing the module name in https://github.com/GenieFramework/ScoringEngineApp/blob/wip/models/scoringengine/ScoringEngineExport.jl, for example, isn't enough, as the BSON assets contain structures which expect the module name to match.

jeremiedb avatar Jul 27 '22 06:07 jeremiedb

@jeremiedb yes, I understand that the BSON files need to be recreated, which involves running all the preprocessing and training steps in main/develop.

So what I did was:

Change this module and file name: https://github.com/JuliaComputing/ScoringEngineDemo.jl/blob/main/src/ScoringEngineDemo.jl to EngineUtils and EngineUtils.jl respectively,

and change all occurrences of ScoringEngineDemo to EngineUtils in preproc-flux.jl, preproc-gbtree.jl, hyper-gbtree.jl and hyper-flux.jl, then re-create the necessary BSON assets.

Then I used those assets in my GenieBuilder project and changed the line to

const preproc_flux = BSON.load(joinpath(assets_path, "preproc-flux.bson"), EngineUtils)[:preproc]

but that crashed with ERROR: UndefVarError: EngineUtils not defined

AbhimanyuAryan avatar Jul 27 '22 07:07 AbhimanyuAryan

Exactly my problem, mutatis mutandis module names. I've tried switching from BSON to Julia's native serialisation but the problem persists. I've tried rebuilding all the preprocessing and training steps in various module contexts and even in one big file inside Genie's \models

ctbaum avatar Sep 27 '22 10:09 ctbaum

Could you confirm if ScoringEngineDemo works fine on your end? BTW, if using Julia 1.8, I'd suggest considering this branch: https://github.com/jeremiedb/ScoringEngineDemo.jl/tree/julia-180, as it also uses Genie v5 (and Genie v4 isn't compatible with Julia 1.8).

Also, would you have further details about how your project differs from ScoringEngineDemo? Is it just a clone along with a renaming of the package? Something to be careful about is that there are structs and functions defined under src/ that are used during the asset creation steps found in the develop/ scripts, which might explain some of the issues encountered.

jeremiedb avatar Sep 27 '22 15:09 jeremiedb

Hi, yes, I see under src/ that for the purposes of preprocessing the most important files are preproc.jl and preproc-utils.jl, both of which I've adapted from you for my needs here: https://github.com/ctrebbau/TimeApp

I basically followed your lead for preprocessing and training, which works fine up to that point, and then I followed abhi's lead in embedding the development within a Genie app, at which point neither BSON nor Serialization is able to reconstruct the types of the encoded assets with Genie Builder.

Abhi and I have tagged you on Genie's discord channel if you wish to see our discussion. Thank you so much in advance, I'm learning so much from your way of developing things :)

ctbaum avatar Sep 27 '22 17:09 ctbaum

I think (🤞) I finally got it working by adding the module's path onto LOAD_PATH so as to use using where appropriate. But I've been burned before on this, will keep posting...

ctbaum avatar Sep 28 '22 14:09 ctbaum

Hi @ctrebbau, did you have some success on your end? In the clone I made, there are a couple of things I changed, notably using include instead of the LOAD_PATH, as well as using .Utils rather than using Utils:

include(joinpath(@__DIR__, "dev", "Utils.jl"))
using .Utils
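
As a self-contained sketch of the include-based pattern, the snippet below writes a throwaway module file to a temporary directory and loads it with a relative `using`. The module body and the function `double` are hypothetical placeholders, not the contents of the real Utils.jl:

```julia
# Write a tiny module file to a temporary directory (placeholder contents).
dir = mktempdir()
path = joinpath(dir, "Utils.jl")
write(path, """
module Utils
export double
double(x) = 2x
end
""")

# include evaluates the file in the current module, so the new module
# becomes a submodule of it ...
include(path)
# ... which is why the relative form `using .Utils` is needed here.
using .Utils

println(double(21))
```

With this approach the module is a child of whichever module runs the `include`, whereas a module found through LOAD_PATH is loaded as a top-level package, so the two routes can give the same code different module paths.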

What seems to be an issue though is that loading from BSON results in errors similar to the following:

julia> TimeApp.TimePreds.adapter_flux
┌ Error: 2022-09-30 16:31:08 Failed to revise C:\Users\jerem\OneDrive\github\TimeApp\models\TimePreds.jl
│   exception =
│    setfield!: const field .names of type TypeName cannot be changed

That seems to be an ongoing concern with BSON. As such, I'd recommend using JLD2, which works fairly well (although there are some caveats, such as anonymous functions not being supported).

I might be too optimistic, but I think that JLD2 might be the remaining bottleneck from this PR: https://github.com/ctrebbau/TimeApp/pull/1

Also, given sensitivity of saved model assets to current version of their related packages, it might be advised not to ignore the Manifest.toml.

jeremiedb avatar Sep 30 '22 20:09 jeremiedb

Hi @jeremiedb, the issue with loading assets seems to be resolved on my side of things. I was wondering however if you could point me to some resource(s) to more deeply understand your end-to-end way of developing reproducible ML. I've felt like I've been plodding along trying to be consistently reproducible before I stumbled on your conference. Thanks in advance.

ctbaum avatar Oct 06 '22 14:10 ctbaum

Hi @ctrebbau, I unfortunately can't point to any specific reference, as this kind of demo was built on the result of years of building models with varying levels of productionization expectations. I think the MLJ ecosystem has a good mindset about it. Otherwise, I think my main learning revolved around how to reason about data processing. I think it's fairly natural to enforce model reproducibility, yet the processing part has too often been treated as a separate endeavour (at least by me!). Thinking of all these data munging steps as just another kind of model, which performs a reprojection of the features through filtering, missing imputations, normalizations, etc., helps build a workflow that provides more zen about its reliability.
Bundling everything from data processing to models and dashboard, as in this demo, likely isn't the most convenient. But this is where Julia's nice dependency management makes it smooth to maintain components in separate packages, while keeping strong version control through appropriate use of the Manifest or of release/pkg versions. So I think it comes down to proper planning, adding some extra effort for developing reproducible pipelines in a deployment mindset, and taking advantage of tools such as Julia's Pkg system. Hope this might help, although it's mostly empirical!
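
To make the "preprocessing as just another kind of model" idea concrete, here is a minimal sketch. The names `Normalizer` and `fit_normalizer` are hypothetical and are not the demo's actual `build_preproc` implementation; the assumption is simply that the fitted parameters live in a struct that is estimated on training data and then applied as a callable:

```julia
using Statistics: mean, std

# Hypothetical preprocessing "model": the fitted parameters are stored in a struct.
struct Normalizer
    mu::Float64
    sigma::Float64
end

# Fit on training data only, skipping missing values.
function fit_normalizer(x)
    vals = collect(skipmissing(x))
    return Normalizer(mean(vals), std(vals))
end

# Applying it imputes missing values with the training mean, then standardizes.
(n::Normalizer)(x) = [(coalesce(v, n.mu) - n.mu) / n.sigma for v in x]

train = [1.0, 2.0, missing, 3.0]
normalizer = fit_normalizer(train)

# The same fitted object is reused on new data, like a trained model at inference.
out = normalizer([2.0, missing, 4.0])
println(out)
```

Because the fitted object is an ordinary struct, it can be saved and versioned alongside the model it feeds, which is also why its type definition has to be available under the same module path when it is loaded back.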

jeremiedb avatar Oct 07 '22 21:10 jeremiedb

Sounds like there's a book in the pipelines? 🤞

ctbaum avatar Oct 08 '22 12:10 ctbaum