Pore-C-Snakemake

Pandas error: Exception("name 'direct_long_range' is not defined")

Open vmurigneu opened this issue 2 years ago • 0 comments

Hello,

We are trying to run the pore-c pipeline on two subsamples of a dataset. The pipeline previously ran successfully on the full dataset, on a different HPC that we can no longer access. We encountered the following error while running the first subsample:

Error in rule summarise_contacts:
    jobid: 8
    output: /scratch/project/gihex20hol/POR/sub10X/results_sub10X/merged_contacts/NlaIII_run01_BrahChr_chr1_2_shred_20kb_unphased.concatemers.parquet, /scratch/project/gihex20hol/POR/sub10X/results_sub10X/merged_contacts/NlaIII_run01_BrahChr_chr1_2_shred_20kb_unphased.concatemer_summary.csv
    log: /scratch/project/gihex20hol/POR/sub10X/results_sub10X/merged_contacts/NlaIII_run01_BrahChr_chr1_2_shred_20kb_unphased.concatemers.parquet.log (check log file(s) for error message)
    conda-env: /scratch/project_mnt/S0024/test/Pore-C-Snakemake/.snakemake/conda/14b8a690
    shell:
        pore_c --dask-scheduler-port 0 --dask-num-workers 10 contacts summarize /scratch/project/POR/sub10X/results_sub10X/merged_contacts/NlaIII_run01_BrahChr_chr1_2_shred_20kb_unphased.contacts.parquet /scratch/project/POR/sub10X/results_sub10X/basecall/NlaIII_run01.rd.summary.csv /scratch/project/POR/sub10X/results_sub10X/merged_contacts/NlaIII_run01_BrahChr_chr1_2_shred_20kb_unphased.concatemers.parquet /scratch/project/POR/sub10X/results_sub10X/merged_contacts/NlaIII_run01_BrahChr_chr1_2_shred_20kb_unphased.concatemer_summary.csv --user-metadata '{"run_id":"run01","enzyme":"NlaIII","biospecimen":"BrahMom","refgenome_id":"BrahChr_chr1_2_shred_20kb","phase_set_id":"unphased"}' 2>/scratch/project/POR/sub10X/results_sub10X/merged_contacts/NlaIII_run01_BrahChr_chr1_2_shred_20kb_unphased.concatemers.parquet.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

The log file contains:

    res = self.env.resolve(self.local_name, is_local=self.is_local)
  File "/scratch/project_mnt/S0024/test/Pore-C-Snakemake/.snakemake/conda/14b8a690/lib/python3.8/site-packages/pandas/core/computation/scope.py", line 203, in resolve
    raise UndefinedVariableError(key, is_local)
Exception: name 'direct_long_range' is not defined
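
For context, this message is what pandas reports when an expression passed to `DataFrame.eval` or `DataFrame.query` names a column that is not present in the frame. The sketch below reproduces that behaviour in isolation. It assumes, without having checked the pore_c source, that the failing step evaluates such an expression; the frame, its column names (`cis`, `trans`) and the helper `missing_column_message` are made up for illustration.

```python
import pandas as pd


def missing_column_message(df, expr):
    """Return the error text if `expr` names something not in `df`, else None.

    Hypothetical helper for illustration; it is not part of pore_c.
    """
    try:
        df.eval(expr)
    except NameError as exc:
        # pandas' UndefinedVariableError is a subclass of NameError.
        return str(exc)
    return None


# Toy frame that lacks a 'direct_long_range' column (illustrative data only).
df = pd.DataFrame({"cis": [3, 1], "trans": [0, 2]})

message = missing_column_message(df, "direct_long_range + cis")
ok = missing_column_message(df, "cis + trans")
print(message)
```

If the real failure has the same shape, the column `direct_long_range` would be missing from the table being summarised for this subsample.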

This is the content of the results folder:

drwxr-sr-x. 2 uqvmurig Q1654RW   4096 Dec 11 13:29 refgenome
drwxr-sr-x. 2 uqvmurig Q1654RW  32768 Dec 11 14:17 basecall
drwxr-sr-x. 2 uqvmurig Q1654RW 524288 Dec 12 17:46 mapping
drwxr-sr-x. 2 uqvmurig Q1654RW   4096 Dec 12 17:47 virtual_digest
drwxr-sr-x. 2 uqvmurig Q1654RW 262144 Dec 16 10:54 align_table
drwxr-sr-x. 2 uqvmurig Q1654RW 131072 Dec 16 10:55 contacts
drwxr-sr-x. 3 uqvmurig Q1654RW   4096 Dec 19 09:28 merged_contacts

A similar error occurred with another subsample of the dataset:

/scratch/project_mnt/S0024/test/Pore-C-Snakemake/.snakemake/conda/1614131d/lib/python3.8/site-packages/pandas/core/arrays/categorical.py:2747: FutureWarning: The `inplace` parameter in pandas.Categorical.set_categories is deprecated and will be removed in a future version. Removing unused categories will always return a new Categorical object.
  res = method(*args, **kwargs)
Traceback (most recent call last):
  File "/scratch/project_mnt/S0024/test/Pore-C-Snakemake/.snakemake/conda/1614131d/lib/python3.8/site-packages/pandas/core/computation/scope.py", line 208, in resolve
    return self.temps[key]
KeyError: 'direct_long_range'

  File "/scratch/project_mnt/S0024/test/Pore-C-Snakemake/.snakemake/conda/1614131d/lib/python3.8/site-packages/pandas/core/computation/ops.py", line 115, in _resolve_name
    res = self.env.resolve(self.local_name, is_local=self.is_local)
  File "/scratch/project_mnt/S0024/test/Pore-C-Snakemake/.snakemake/conda/1614131d/lib/python3.8/site-packages/pandas/core/computation/scope.py", line 213, in resolve
    raise UndefinedVariableError(key, is_local) from err
pandas.core.computation.ops.UndefinedVariableError: name "name 'direct_long_range' is not defined" is not defined
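
One hypothesis, not confirmed against the pore_c source: in a small subsample a contact category may have zero rows, so a table with one column per category has no column for it, and a later expression naming that column fails. The sketch below illustrates only this general pandas pattern and one way to guard against it with `reindex`. The category names in `EXPECTED` and the toy data are assumptions for illustration, not the pipeline's actual schema.

```python
import pandas as pd

# Hypothetical category names, used only for illustration.
EXPECTED = ["direct_short_range", "direct_long_range"]

# Toy contacts in which no row falls in the 'direct_long_range' category.
contacts = pd.DataFrame({
    "read": ["r1", "r1", "r2"],
    "category": ["direct_short_range"] * 3,
})

# One column per category that actually occurs: 'direct_long_range' is absent,
# so an expression referring to it would raise the "is not defined" error.
counts = pd.crosstab(contacts["read"], contacts["category"])

# Guard: force every expected column to exist, filling absent ones with 0.
fixed = counts.reindex(columns=EXPECTED, fill_value=0)
total = fixed.eval("direct_short_range + direct_long_range")
print(total)
```

Comparing the columns of the subsample's contact table with those from the full-dataset run would be one way to test whether this hypothesis applies.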

Thank you for your help,
Valentine

vmurigneu, Dec 26 '22 22:12