Baysor icon indicating copy to clipboard operation
Baysor copied to clipboard

Error estimating boundary polygons

Open timwiggin opened this issue 2 years ago • 5 comments

I've gotten the same crash at the same point in processing twice which prevented the export of a GeoJSON file or cell shape plot. The run command and log output below:

Run parameters: ./Baysor/bin/baysor run -g gene -x global_x -y global_y -z global_z --plot --save-polygons GeoJSON -o ./baysor_output/[project name] ./data/[input file] :cell

Log output:

[15:15:00] Info: (2021-10-28) Run Baysor v0.5.0
[15:15:00] Info: Loading data...
[15:15:10] Info: Loaded 5347668 transcripts
[15:15:12] Info: Estimating noise level
[15:15:19] Warning: Only k-nn random field is supported for 3D data
└ Baysor /home/vpetukhov/.julia/dev/Baysor/src/data_processing/triangulation.jl:30
[15:16:57] Info: Done
[15:16:57] Warning: Only k-nn random field is supported for 3D data
└ Baysor /home/vpetukhov/.julia/dev/Baysor/src/data_processing/triangulation.jl:30
[15:18:09] Info: Clustering molecules...
[15:18:09] Warning: Only k-nn random field is supported for 3D data
└ Baysor /home/vpetukhov/.julia/dev/Baysor/src/data_processing/triangulation.jl:30
[15:25:13] Info: Algorithm stopped after 263 iterations. Error: 0.00955. Converged: true.
[15:25:15] Info: Done
[15:25:16] Info: Initializing algorithm. Scale: 3.2142932583089294, scale std: 1.0218244756990273, initial #components: 3565112, #molecules: 5347668.
[15:27:19] Info: Using 3D coordinates
[20:40:46] Info: Processing complete.
[20:42:40] Info: Estimating local colors
[20:48:34] Info: Saving results to ./baysor_output/[project name]
[20:49:12] Info: Plot diagnostics
[20:49:50] Info: Estimating boundary polygons
[20:55:50] Error: InexactError(:trunc, UInt16, 65540)
|                 
|                 throw_inexacterror(f::Symbol, #unused#::Type{UInt16}, val::Int64) at boot.jl:602
|                 checked_trunc_uint at boot.jl:632 [inlined]
|                 toUInt16 at boot.jl:705 [inlined]
|                 UInt16 at boot.jl:755 [inlined]
|                 convert at number.jl:7 [inlined]
|                 setindex! at array.jl:841 [inlined]
|                 find_grid_point_labels_kde(pos_data::Matrix{Float64}, cell_labels::Vector{Int64}, min_x::Vector{Float64}, max_x::Vector{Float64}; grid_step::Float64, bandwidth::Float64, dens_threshold::Float64, min_molecules_per_cell::Int64, verbose::Bool) at boundary_estimation.jl:208
|                 (::Baysor.var"#find_grid_point_labels_kde##kw")(::NamedTuple{(:grid_step, :bandwidth), Tuple{Float64, Float64}}, ::typeof(Baysor.find_grid_point_labels_kde), pos_data::Matrix{Float64}, cell_labels::Vector{Int64}, min_x::Vector{Float64}, max_x::Vector{Float64}) at boundary_estimation.jl:159
|                 boundary_polygons(pos_data::Matrix{Float64}, cell_labels::Vector{Int64}; min_x::Nothing, max_x::Nothing, grid_step::Float64, min_border_length::Int64, shape_method::Symbol, max_dev::Float64, bandwidth::Float64, exclude_labels::Vector{Int64}, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}) at boundary_estimation.jl:263
|                 boundary_polygons at boundary_estimation.jl:254 [inlined]
|                 #boundary_polygons#217 at boundary_estimation.jl:248 [inlined]
|                 (::Baysor.var"#boundary_polygons##kw")(::NamedTuple{(:grid_step, :bandwidth), Tuple{Float64, Float64}}, ::typeof(Baysor.boundary_polygons), spatial_df::DataFrames.DataFrame, args::Vector{Int64}) at boundary_estimation.jl:248
|                 (::Baysor.var"#333#337"{DataFrames.DataFrame, Vector{Int64}, Dict{String, Any}, Float64})(mask::BitVector) at main.jl:215
|                 #47 at ProgressMeter.jl:991 [inlined]
|                 iterate at generator.jl:47 [inlined]
|                 _collect(c::Vector{BitVector}, itr::Base.Generator{Vector{BitVector}, ProgressMeter.var"#47#50"{Distributed.RemoteChannel{Channel{Bool}}, Baysor.var"#333#337"{DataFrames.DataFrame, Vector{Int64}, Dict{String, Any}, Float64}}}, #unused#::Base.EltypeUnknown, isz::Base.HasShape{1}) at array.jl:691
|                 collect_similar(cont::Vector{BitVector}, itr::Base.Generator{Vector{BitVector}, ProgressMeter.var"#47#50"{Distributed.RemoteChannel{Channel{Bool}}, Baysor.var"#333#337"{DataFrames.DataFrame, Vector{Int64}, Dict{String, Any}, Float64}}}) at array.jl:606
|                 map(f::Function, A::Vector{BitVector}) at abstractarray.jl:2294
|                 macro expansion at ProgressMeter.jl:990 [inlined]
|                 macro expansion at task.jl:382 [inlined]
|                 macro expansion at ProgressMeter.jl:989 [inlined]
|                 macro expansion at task.jl:382 [inlined]
|                 progress_map(::Function, ::Vararg{Any, N} where N; mapfun::Function, progress::ProgressMeter.Progress, channel_bufflen::Int64, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}) at ProgressMeter.jl:982
|                 progress_map(::Function, ::Vararg{Any, N} where N) at ProgressMeter.jl:978
|                 plot_transcript_assignment_panel(df_res::DataFrames.DataFrame, assignment::Vector{Int64}, args::Dict{String, Any}; clusters::Vector{Int64}, prior_polygons::Vector{Matrix{Float64}}, gene_colors::Vector{ColorTypes.Lab{Float64}}) at main.jl:215
|                 (::Baysor.var"#plot_transcript_assignment_panel##kw")(::NamedTuple{(:clusters, :prior_polygons, :gene_colors), Tuple{Vector{Int64}, Vector{Matrix{Float64}}, Vector{ColorTypes.Lab{Float64}}}}, ::typeof(Baysor.plot_transcript_assignment_panel), df_res::DataFrames.DataFrame, assignment::Vector{Int64}, args::Dict{String, Any}) at main.jl:200
|                 save_segmentation_results(bm_data::Baysor.BmmData{3}, gene_names::Vector{String}, args::Dict{String, Any}; mol_clusts::NamedTuple{(:exprs, :assignment, :diffs, :assignment_probs, :change_fracs), Tuple{Matrix{Float64}, Vector{Int64}, Vector{Float64}, Matrix{Float64}, Vector{Float64}}}, comp_segs::Nothing, prior_polygons::Vector{Matrix{Float64}}) at main.jl:406
|                 (::Baysor.var"#save_segmentation_results##kw")(::NamedTuple{(:mol_clusts, :comp_segs, :prior_polygons), Tuple{NamedTuple{(:exprs, :assignment, :diffs, :assignment_probs, :change_fracs), Tuple{Matrix{Float64}, Vector{Int64}, Vector{Float64}, Matrix{Float64}, Vector{Float64}}}, Nothing, Vector{Matrix{Float64}}}}, ::typeof(Baysor.save_segmentation_results), bm_data::Baysor.BmmData{3}, gene_names::Vector{String}, args::Dict{String, Any}) at main.jl:389
|                 run_cli_main(args::Vector{String}) at main.jl:470
|                 run_cli(args::Vector{String}) at common.jl:118
|                 run_cli at common.jl:103 [inlined]
|                 julia_main at common.jl:139 [inlined]
|                 julia_main() at none:36
└ Baysor /home/vpetukhov/.julia/dev/Baysor/src/cli/common.jl:132

Environment is Ubuntu 20.04.2, Baysor was running in a screen 4.08.00 session.

timwiggin avatar Oct 29 '21 14:10 timwiggin

I need to check it more, but the first guess is that you're using name :cell for prior segmentation, but it is also used for the output of Baysor, and there can be some problems with type conversions. I should fix it anyway at some point, but could you please try to rename the prior segmentation column and rerun?

VPetukhov avatar Oct 29 '21 14:10 VPetukhov

I'll try that and update when the run is complete.

timwiggin avatar Oct 29 '21 14:10 timwiggin

Just finished running again with the column renamed prior_cell, unfortunately got the same error.

timwiggin avatar Oct 29 '21 21:10 timwiggin

I am getting the same error if I am running with -m 50 but not if I run it with -m 3 . Is there any solution to that?

maximilian-heeg avatar Jun 22 '22 16:06 maximilian-heeg

After doing some digging, I think I found the error. The function find_grid_point_labels_kde (in src/data_processing/boundary_estimation.jl) can allocate the wrong type for label_mat.

In find_grid_point_labels_kde a binary a mask deciding if the cell meets the min_molecules_per_cell criterion is created and saved in filt_mask. Further cell_labels is defined as (0:maximum(cell_labels)). The filt_mask is used to subset both coords_per_label and cell_labels.

If you have more than 2^16 identified cells, the maximal value of cell_labels has to be of type UInt32. However, if a lot of the cells do not meet the min_molecules_per_cell criterion, the length of both arrays (coords_per_label and cell_labels) can be lower than 2^16 after applying the filt_mask. In Line 172-176 label_mat is created with type UInt16 if the length of coords_per_label is smaller than (2^16) - 1. However, the final value that is stored in label_mat derives from the values in cell_labels and can still be of type UInt32. If this is the case, the assignment in line 208 causes the error.

The fix is easy: Line 172 has to be changed from

if length(coords_per_label) < (2 ^ 16) - 1

to

if maximum(cell_labels) < (2 ^ 16) - 1

This assures that label_mat is always of the correct type. After changing that line, the program ran as expected for me.

maximilian-heeg avatar Jun 24 '22 03:06 maximilian-heeg