
Cannot seem to load a very small file

Open xiaodaigh opened this issue 6 years ago • 3 comments

Included a runnable MWE. The file is less than 1 MB, but loading it just hangs in the terminal on Julia 1.2.0 (Windows 10). It works fine on Julia 1.1.1.

```julia
using JuliaDB, Dagger

##############################################################
# Download & extract data
##############################################################

#;wget https://raw.githubusercontent.com/xiaodaigh/JuliaDB.jl/master/ok.csv

##############################################################
# Specify the types of columns
##############################################################

fmtypes = [
    Int64,                      String,                     Union{String, Missing},     Union{Float64, Missing},    Union{Float64, Missing},
    Union{Float64, Missing},    Union{Float64, Missing},    Union{Float64, Missing},    Union{String, Missing},     Union{String, Missing},
    Union{String, Missing},     Union{String, Missing},     Union{String, Missing},     Union{String, Missing},     Union{String, Missing},
    Union{String, Missing},     Union{String, Missing},     Union{Float64, Missing},    Union{Float64, Missing},    Union{Float64, Missing},
    Union{Float64, Missing},    Union{Float64, Missing},    Union{Float64, Missing},    Union{Float64, Missing},    Union{Float64, Missing},
    Union{Float64, Missing},    Union{Float64, Missing},    Union{Float64, Missing},    Union{String, Missing},     Union{Float64, Missing},
    Union{String, Missing}]

@time jll = loadtable(
    "ok.csv",
    output = "fm.jldb/",
    delim = ',',
    header_exists = true,
    #filenamecol = "filename",
    #chunks = length(ifiles),
    #type_detect_rows = 20_000,
    #colnames = colnames,
    colparsers = fmtypes,
    indexcols = ["Column1"]);
```

xiaodaigh, Aug 31 '19 11:08

I can reproduce this, and confirm that with a non-release Julia v1.3 build on Linux it hangs and ignores attempts to Ctrl-C.

jpsamaroo, Dec 12 '19 03:12

Could you try to read it with just TextParse.jl? Just to figure out whether the problem is there, or in JuliaDB.

davidanthoff, Dec 12 '19 04:12

Calling just `TextParse.csvread("ok.csv", ','; header_exists=true, colparsers=fmtypes)` loads the file successfully in ~5 seconds (including inference and compilation time, which is quite good). So the hang is in JuliaDB rather than in TextParse. Thanks for the tip, @davidanthoff!
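For anyone wanting to repeat that isolation step without the original file, here is a minimal sketch. It assumes a tiny hypothetical three-column CSV written to a temporary directory in place of `ok.csv`, and a shortened `coltypes` vector in place of `fmtypes`. Only the first column name, `Column1`, comes from the MWE; the other names and the data are made up for illustration.

```julia
using TextParse

# Hypothetical stand-in for ok.csv: a tiny file written to a temp directory.
path = joinpath(mktempdir(), "sample.csv")
write(path, "Column1,name,score\n1,a,1.5\n2,b,2.5\n")

# Shortened stand-in for `fmtypes`: one parser type per column.
coltypes = [Int64, String, Float64]

# Parse with TextParse alone, bypassing JuliaDB's loadtable entirely.
# csvread returns a tuple of column vectors and a vector of column names.
cols, colnames = TextParse.csvread(path, ','; header_exists=true, colparsers=coltypes)

println(colnames)
println(length(cols[1]), " rows parsed")
```

If this returns promptly while `loadtable` on the same file hangs, the parsing layer is unlikely to be the culprit.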

jpsamaroo, Dec 12 '19 23:12