gimmemotifs
Maelstrom: uninformative error when duplicate regions present
Hi there,
I ran into the following error when I tried to run maelstrom on a file containing identical regions (with different labels). It took quite a while to track down the cause, so a more informative error message would help. Alternatively, the tutorial site could state explicitly that duplicate regions are not allowed.
2024-06-16 13:39:27,642 - INFO - Starting maelstrom
2024-06-16 13:39:28,107 - INFO - motif scanning (counts)
2024-06-16 13:39:28,108 - INFO - reading table
2024-06-16 13:39:37,351 - INFO - using 14000 sequences
2024-06-16 13:39:37,369 - INFO - Creating index for genomic GC frequencies.
2024-06-16 13:43:26,698 - INFO - setting threshold
2024-06-16 13:45:09,928 - INFO - creating count table
2024-06-16 14:00:08,540 - INFO - done
2024-06-16 14:00:08,547 - INFO - creating dataframe
Traceback (most recent call last):
  File "/storage/home/sam77/work/software/miniconda3/envs/gimme/bin/gimme", line 12, in <module>
    cli(sys.argv[1:])
  File "/storage/home/sam77/work/software/miniconda3/envs/gimme/lib/python3.10/site-packages/gimmemotifs/cli.py", line 755, in cli
    args.func(args)
  File "/storage/home/sam77/work/software/miniconda3/envs/gimme/lib/python3.10/site-packages/gimmemotifs/commands/maelstrom.py", line 42, in maelstrom
    run_maelstrom(
  File "/storage/home/sam77/work/software/miniconda3/envs/gimme/lib/python3.10/site-packages/gimmemotifs/maelstrom/__init__.py", line 192, in run_maelstrom
    counts = scan_regionfile_to_table(
  File "/storage/home/sam77/work/software/miniconda3/envs/gimme/lib/python3.10/site-packages/gimmemotifs/scanner/__init__.py", line 180, in scan_regionfile_to_table
    df = pd.DataFrame(scores, index=idx, columns=motif_names, dtype=dtype)
  File "/storage/home/sam77/work/software/miniconda3/envs/gimme/lib/python3.10/site-packages/pandas/core/frame.py", line 754, in __init__
    mgr = arrays_to_mgr(
  File "/storage/home/sam77/work/software/miniconda3/envs/gimme/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 123, in arrays_to_mgr
    arrays = _homogenize(arrays, index, dtype)
  File "/storage/home/sam77/work/software/miniconda3/envs/gimme/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 620, in _homogenize
    com.require_length_match(val, index)
  File "/storage/home/sam77/work/software/miniconda3/envs/gimme/lib/python3.10/site-packages/pandas/core/common.py", line 571, in require_length_match
    raise ValueError(
ValueError: Length of values (235020) does not match length of index (235653)
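A pre-flight check along these lines could catch the problem before the (slow) scanning step and fail with a readable message instead of the pandas length mismatch. This is a minimal sketch using only the standard library; `find_duplicate_regions` and `check_unique_regions` are hypothetical helpers, not part of gimmemotifs. It assumes a tab-separated input with a header row and the region identifier (e.g. `chr1:100-200`) in the first column, which may not match every maelstrom input layout.

```python
import csv
import io
from collections import Counter


def find_duplicate_regions(handle, region_col=0, delimiter="\t"):
    """Return region identifiers that occur more than once, sorted.

    Assumption: the table has a header row and the region ID sits in
    column `region_col` (first column by default).
    """
    reader = csv.reader(handle, delimiter=delimiter)
    next(reader, None)  # skip the header row
    counts = Counter(row[region_col] for row in reader if row)
    return sorted(region for region, n in counts.items() if n > 1)


def check_unique_regions(handle, **kwargs):
    """Raise an informative ValueError if any region appears more than once."""
    dups = find_duplicate_regions(handle, **kwargs)
    if dups:
        preview = ", ".join(dups[:5])  # show at most five offending regions
        raise ValueError(
            f"{len(dups)} duplicate region(s) found in input "
            f"(e.g. {preview}); each region must appear only once."
        )


if __name__ == "__main__":
    # Hypothetical input: the same region listed twice with different labels.
    example = io.StringIO(
        "loc\tcluster\n"
        "chr1:100-200\tA\n"
        "chr1:100-200\tB\n"
        "chr2:300-400\tA\n"
    )
    try:
        check_unique_regions(example)
    except ValueError as err:
        print(err)
```

Running it on the example input prints `1 duplicate region(s) found in input (e.g. chr1:100-200); each region must appear only once.` A check like this is cheap (one pass over the file) compared with the roughly 20 minutes of scanning that ran before the crash in the log.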
Installation information:
- OS: Linux
- Installation: conda
- Version: [e.g. 0.18.0]