SubPhaser

cannot allocate memory

Open dabitz opened this issue 5 months ago • 13 comments

Hi,

Thanks a lot for the very nice tool!

I am trying to phase the subgenomes of this haplotype-phased hexaploid genome (9 Gb), but the run always stops with the error message "Cannot allocate memory", even though I have changed the memory option several times. Any help with that is appreciated.

Cheers André

...
24-01-25 07:23:35 [INFO] Loading kmer matrix from jellyfish
24-01-25 07:23:35 [INFO] Start Pool with 40 process(es)
24-01-25 07:23:57 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_53.fasta_15.fa
24-01-25 07:28:54 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_60.fasta_15.fa
24-01-25 07:29:21 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_5.fasta_15.fa
24-01-25 07:30:13 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_57.fasta_15.fa
24-01-25 07:30:47 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_61.fasta_15.fa
24-01-25 07:30:52 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_54.fasta_15.fa
24-01-25 07:31:00 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_22.fasta_15.fa
24-01-25 07:31:36 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_50.fasta_15.fa
24-01-25 07:31:46 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_52.fasta_15.fa
24-01-25 07:32:25 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_48.fasta_15.fa
24-01-25 07:32:31 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_42.fasta_15.fa
24-01-25 07:32:38 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_47.fasta_15.fa
24-01-25 07:32:44 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_55.fasta_15.fa
24-01-25 07:32:49 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_4.fasta_15.fa
24-01-25 07:33:38 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_35.fasta_15.fa
24-01-25 07:33:47 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_40.fasta_15.fa
24-01-25 07:33:53 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_25.fasta_15.fa
24-01-25 07:34:02 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_27.fasta_15.fa
24-01-25 07:34:12 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_38.fasta_15.fa
24-01-25 07:34:22 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_37.fasta_15.fa
24-01-25 07:35:11 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_41.fasta_15.fa
24-01-25 07:35:17 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_26.fasta_15.fa
24-01-25 07:35:28 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_33.fasta_15.fa
24-01-25 07:35:40 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_65.fasta_15.fa
24-01-25 07:35:52 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_28.fasta_15.fa
24-01-25 07:36:01 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_7.fasta_15.fa
24-01-25 07:36:12 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_17.fasta_15.fa
24-01-25 07:36:21 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_36.fasta_15.fa
24-01-25 07:36:32 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_30.fasta_15.fa
24-01-25 07:36:44 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_14.fasta_15.fa
24-01-25 07:37:41 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_18.fasta_15.fa
24-01-25 07:37:57 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_63.fasta_15.fa
24-01-25 07:38:08 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_1.fasta_15.fa
24-01-25 07:38:19 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_16.fasta_15.fa
24-01-25 07:38:27 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_31.fasta_15.fa
24-01-25 07:38:36 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_12.fasta_15.fa
24-01-25 07:38:49 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_11.fasta_15.fa
24-01-25 07:39:01 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_62.fasta_15.fa
24-01-25 07:39:07 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_23.fasta_15.fa
24-01-25 07:39:18 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_64.fasta_15.fa
24-01-25 07:39:23 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_66.fasta_15.fa
24-01-25 07:39:37 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_39.fasta_15.fa
24-01-25 07:39:55 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_15.fasta_15.fa
24-01-25 07:40:08 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_3.fasta_15.fa
24-01-25 07:40:19 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_21.fasta_15.fa
24-01-25 07:40:29 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_24.fasta_15.fa
24-01-25 07:41:21 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_29.fasta_15.fa
24-01-25 07:41:31 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_34.fasta_15.fa
24-01-25 07:41:40 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_32.fasta_15.fa
24-01-25 07:41:52 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_56.fasta_15.fa
24-01-25 07:42:08 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_8.fasta_15.fa
24-01-25 07:42:20 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_9.fasta_15.fa
24-01-25 07:42:32 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_10.fasta_15.fa
24-01-25 07:42:43 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_13.fasta_15.fa
24-01-25 07:42:55 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_2.fasta_15.fa
24-01-25 07:43:08 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_19.fasta_15.fa
24-01-25 07:43:20 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_20.fasta_15.fa
24-01-25 07:43:30 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_45.fasta_15.fa
24-01-25 07:43:38 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_46.fasta_15.fa
24-01-25 07:43:44 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_6.fasta_15.fa
24-01-25 07:43:51 [INFO] 62557073 kmers in total
24-01-25 07:43:51 [INFO] Filtering differential kmers
Traceback (most recent call last):
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/bin/subphaser", line 33, in <module>
    sys.exit(load_entry_point('subphaser==1.2.6', 'console_scripts', 'subphaser')())
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/main.py", line 797, in main
    pipeline.run()
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/main.py", line 422, in run
    d_mat = dumps.filter(d_mat, lengths, self.sgs, outfig=histfig, #d_targets=d_targets,
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/Jellyfish.py", line 487, in filter
    for kmer, freqs, tot_freq in pool_func(_filter_kmer, args, self.ncpu,
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/RunCmdsMP.py", line 336, in pool_func
    pool = multiprocessing.Pool(processors)
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/multiprocessing/context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/multiprocessing/pool.py", line 212, in __init__
    self._repopulate_pool()
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/multiprocessing/pool.py", line 303, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/multiprocessing/pool.py", line 326, in _repopulate_pool_static
    w.start()
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/multiprocessing/popen_fork.py", line 70, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

dabitz Jan 25 '24 06:01

How much RAM does your computer have?

zhangrengang Jan 25 '24 08:01

That's the thing: I am running on a cluster node with 500 GB of RAM and 64 threads.

dabitz Jan 25 '24 08:01

What is the peak memory? A large genome certainly requires a lot of memory, but I can run the wheat genome (14 Gb, 140M kmers, 21 chromosomes) with 1 TB of RAM. If the run really exceeds 500 GB of RAM, you can increase -lower_count to reduce the number of kmers, or reduce the number of chromosomes in the config file. If necessary, you can also decrease the chunksize in /netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/main.py from:

        # matrix
        logger.info('Loading kmer matrix from jellyfish')   # multiprocessing by kmer
        chunksize = None if self.pool_method == 'map' else 20000

to

        # matrix
        logger.info('Loading kmer matrix from jellyfish')   # multiprocessing by kmer
        chunksize = None if self.pool_method == 'map' else 200
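
To get a feel for what this change does, here is a minimal sketch. It assumes that chunksize is the number of kmers handed to a worker in one batch, as with the chunksize argument of Python's multiprocessing.Pool.imap; the thread does not show how SubPhaser actually uses the value, and the helper names are hypothetical. Using the 62557073 kmers reported in the log, it counts the batches for both settings.

```python
from itertools import islice
from math import ceil


def iter_batches(items, chunksize):
    """Yield successive lists of at most `chunksize` items (hypothetical helper)."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, chunksize))
        if not batch:
            return
        yield batch


def n_batches(n_items, chunksize):
    """Number of batches needed to cover n_items."""
    return ceil(n_items / chunksize)


TOTAL_KMERS = 62557073  # "kmers in total" from the log in this thread

for size in (20000, 200):
    print(f"chunksize={size}: {n_batches(TOTAL_KMERS, size)} batches")

# Small demonstration of the batching itself: 10 items in batches of 4.
print([len(batch) for batch in iter_batches(range(10), 4)])  # [4, 4, 2]
```

Under that assumption, a smaller chunksize means each worker holds fewer kmers at a time, at the cost of about 100 times more batches and the scheduling overhead that comes with them.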

By the way, if your hexaploid is an autohexaploid, there is no reason to spend time on SubPhaser.

zhangrengang Jan 25 '24 09:01

Thanks! I will try on our HPC cluster with 1 TB, or adjust the parameters as you suggested. How long should the whole run take? I am not sure it is an autopolyploid; there is some evidence for a hybrid between an autotetraploid and a diploid.

dabitz Jan 25 '24 09:01

In general, a large genome needs 1-2 days.

zhangrengang Jan 25 '24 09:01

Somehow this is strange... Running on our HPC node with 1 TB, the job exits with:

Resource usage summary:

CPU time   :   2301.09 sec.
Max Memory :     69212 MB
Max Swap   :    754180 MB

Max Processes  :        44
Max Threads    :        48

dabitz Jan 26 '24 10:01

Are you using SLURM, which limits memory according to the number of processes?

zhangrengang Jan 26 '24 13:01

We use LSF. I set the memory limit to 980 GB, and the job still exits. It seems the memory limit was not even reached before it exited.

dabitz Jan 26 '24 14:01

That is strange. You could set -cpu to 1 to see the memory cost.
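
One way to read off the memory cost of such a run is sketched below. It is an illustration rather than part of SubPhaser: the helper name is hypothetical, it relies on Python's Unix-only resource module, and the command it runs is a placeholder to be replaced with your own command line.

```python
import resource
import subprocess
import sys


def peak_rss_of_command(cmd):
    """Run cmd to completion; return (exit code, peak RSS in MiB).

    The peak is the largest resident set size of any single child process
    waited for so far, not the sum over all worker processes.
    """
    proc = subprocess.run(cmd, check=False)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 if sys.platform.startswith("linux") else 1024 * 1024
    return proc.returncode, usage.ru_maxrss / divisor


if __name__ == "__main__":
    # Placeholder command; substitute the real command line here.
    code, peak_mib = peak_rss_of_command([sys.executable, "-c", "print('done')"])
    print(f"exit code {code}, peak RSS {peak_mib:.1f} MiB")
```

Because the value covers only the largest single process, it is easiest to interpret for a single-process run; with many workers the total footprint can be several times higher.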

zhangrengang Jan 27 '24 02:01

It advanced a bit further, but still failed.

24-02-04 08:21:52 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_19.fasta_15.fa
24-02-04 08:22:07 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_20.fasta_15.fa
24-02-04 08:22:19 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_45.fasta_15.fa
24-02-04 08:22:31 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_46.fasta_15.fa
24-02-04 08:22:41 [INFO] Loading /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_chromosomes/scaffold_6.fasta_15.fa
24-02-04 08:22:48 [INFO] 62557073 kmers in total
24-02-04 08:22:48 [INFO] Filtering differential kmers
24-02-04 08:22:48 [INFO] Start Pool with 1 process(es)
24-02-04 08:28:46 [INFO] Processed 10000000 kmers
24-02-04 08:34:51 [INFO] Processed 20000000 kmers
24-02-04 08:40:59 [INFO] Processed 30000000 kmers
24-02-04 08:47:03 [INFO] Processed 40000000 kmers
24-02-04 08:52:35 [INFO] Processed 50000000 kmers
24-02-04 08:58:40 [INFO] Processed 60000000 kmers
24-02-04 09:00:08 [INFO] After filtering, remained 4 (0.00%) differential (freq >= 200) and 56 (0.00%) candidate (freq > 0) kmers
24-02-04 09:00:08 [INFO] Plot /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_phase-results/CBC_k15_q200_f2.kmer_freq.pdf
24-02-04 09:00:44 [INFO] New check point file: /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_tmp/CBC_CBC_k15_q200_f2.kmer.mat.ok
24-02-04 09:00:44 [INFO] ###Step: Cluster
24-02-04 09:00:44 [INFO] Performing bootstrap of 1000 replicates, with each replicate resampling 50% data with replacement
24-02-04 09:01:29 [INFO] Bootstrap: mean Adjusted Rand-Index: 0.9635; mean V-measure score: 0.9538
24-02-04 09:01:29 [INFO] Subgenome assignments: OrderedDict([('scaffold_1', 'SG1'), ('scaffold_4', 'SG2'), ('scaffold_16', 'SG3'), ('scaffold_18', 'SG2'), ('scaffold_25', 'SG3'), ('scaffold_28', 'SG3'), ('scaffold_7', 'SG2'), ('scaffold_11', 'SG2'), ('scaffold_12', 'SG2'), ('scaffold_14', 'SG3'), ('scaffold_63', 'SG2'), ('scaffold_66', 'SG5'), ('scaffold_41', 'SG2'), ('scaffold_47', 'SG2'), ('scaffold_52', 'SG3'), ('scaffold_54', 'SG2'), ('scaffold_55', 'SG2'), ('scaffold_57', 'SG2'), ('scaffold_5', 'SG2'), ('scaffold_37', 'SG2'), ('scaffold_38', 'SG2'), ('scaffold_40', 'SG3'), ('scaffold_42', 'SG2'), ('scaffold_48', 'SG2'), ('scaffold_22', 'SG3'), ('scaffold_23', 'SG2'), ('scaffold_17', 'SG2'), ('scaffold_35', 'SG2'), ('scaffold_36', 'SG2'), ('scaffold_65', 'SG2'), ('scaffold_26', 'SG2'), ('scaffold_27', 'SG2'), ('scaffold_30', 'SG2'), ('scaffold_31', 'SG2'), ('scaffold_33', 'SG2'), ('scaffold_39', 'SG4'), ('scaffold_50', 'SG2'), ('scaffold_53', 'SG3'), ('scaffold_60', 'SG2'), ('scaffold_61', 'SG2'), ('scaffold_62', 'SG2'), ('scaffold_64', 'SG2'), ('scaffold_15', 'SG4'), ('scaffold_3', 'SG5'), ('scaffold_21', 'SG3'), ('scaffold_24', 'SG3'), ('scaffold_29', 'SG3'), ('scaffold_34', 'SG4'), ('scaffold_32', 'SG3'), ('scaffold_56', 'SG2'), ('scaffold_8', 'SG2'), ('scaffold_9', 'SG2'), ('scaffold_10', 'SG2'), ('scaffold_13', 'SG2'), ('scaffold_2', 'SG3'), ('scaffold_19', 'SG3'), ('scaffold_20', 'SG3'), ('scaffold_45', 'SG6'), ('scaffold_46', 'SG6'), ('scaffold_6', 'SG1')])
24-02-04 09:01:29 [INFO] Outputing chromosome - subgenome assignments to /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_phase-results/CBC_k15_q200_f2.chrom-subgenome.tsv
24-02-04 09:01:29 [INFO] Outputing significant differiential kmer - subgenome maps to /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_phase-results/CBC_k15_q200_f2.sig.kmer-subgenome.tsv
24-02-04 09:01:29 [INFO] Start Pool with 1 process(es)
24-02-04 09:01:29 [INFO] 3 significant subgenome-specific kmers
24-02-04 09:01:29 [INFO] 3 SG1-specific kmers
24-02-04 09:01:29 [INFO] run CMD: Rscript /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_phase-results/CBC_k15_q200_f2.kmer.mat.R
24-02-04 09:01:31 [INFO] Outputing PCA plot to /netscratch/dep_mercier/grp_marques/marques/LPA/CBC/SubPhaser/wgdi/non-necessary/CBC_phase-results/CBC_k15_q200_f2.kmer_pca.pdf
Traceback (most recent call last):
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/bin/subphaser", line 33, in <module>
    sys.exit(load_entry_point('subphaser==1.2.6', 'console_scripts', 'subphaser')())
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/main.py", line 797, in main
    pipeline.run()
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/main.py", line 469, in run
    cluster.pca(outfig, n_components=self.nsg, sg_color=self.colors,)
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/subphaser-1.2.6-py3.8.egg/subphaser/Cluster.py", line 50, in pca
    X_pca = pca.fit_transform(self.data)
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 383, in fit_transform
    U, S, Vt = self._fit(X)
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 430, in _fit
    return self._fit_full(X, n_components)
  File "/netscratch/dep_mercier/grp_marques/bin/marques-envs/SGphasing/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 446, in _fit_full
    raise ValueError("n_components=%r must be between 0 and "
ValueError: n_components=6 must be between 0 and min(n_samples, n_features)=4 with svd_solver='full'

However, it produced the two attached PDFs, which I assume indicate a largely autohexaploid origin, right?

CBC_k15_q200_f2.kmer_freq.pdf CBC_k15_q200_f2.kmer.mat.pdf

dabitz Feb 05 '24 08:02

The error occurs because there are too few differential kmers (only four): the PCA is asked for n_components=6 but can extract at most 4. However, it is too early to conclude that it is an autohexaploid. You could set -nsg 3 and -baseline 2, or prune the three allelic chromosome sets so that three homoeologous chromosome sets remain, like the wheat ABD assembly. Even if it were an allohexaploid (for example AABBDD), the current settings identify differential kmers by comparing homologous chromosome pairs (e.g. the two As).
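
The bound behind that ValueError can be checked by hand. The sketch below mirrors the condition in the error message with a hypothetical helper; it is not scikit-learn or SubPhaser code. It assumes the matrix passed to PCA has 60 chromosomes on one axis (the number of entries in the "Subgenome assignments" log line) and the 4 differential kmers on the other.

```python
def check_pca_components(n_components, n_samples, n_features):
    """Raise ValueError if n_components exceeds min(n_samples, n_features).

    Hypothetical helper that mirrors the bound quoted in the traceback.
    Returns the largest number of components allowed.
    """
    limit = min(n_samples, n_features)
    if not 0 <= n_components <= limit:
        raise ValueError(
            f"n_components={n_components} must be between 0 and "
            f"min(n_samples, n_features)={limit}"
        )
    return limit


N_CHROMOSOMES = 60  # entries in the "Subgenome assignments" log line
N_DIFF_KMERS = 4    # "remained 4 (0.00%) differential" in the log

for nsg in (6, 3):
    try:
        check_pca_components(nsg, N_CHROMOSOMES, N_DIFF_KMERS)
        print(f"nsg={nsg}: within the PCA bound")
    except ValueError as err:
        print(f"nsg={nsg}: {err}")
```

With these numbers, 6 components fail the check and 3 pass. The count of differential kmers would itself change under different settings, so this only illustrates why the original run stopped.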

zhangrengang Feb 07 '24 01:02

OK, thanks a lot for the suggestion. I finally managed to run SubPhaser using the unphased genome version and, as initially suspected, it looks pretty much like an autohexaploid except for a few chromosomes... Due to introgression, maybe?

CBC_hap1k15_q200_f2.circos.pdf CBC_hap1k15_q200_f2.LTR_Gypsy.tree.pdf CBC_hap1k15_q200_f2.ltr.insert.density.pdf CBC_hap1k15_q200_f2.kmer_pca.pdf CBC_hap1k15_q200_f2.kmer.mat.pdf

dabitz Feb 19 '24 07:02

Yes, it looks like an autohexaploid. You could generate a kmer histogram and a Smudgeplot (https://github.com/KamilSJaron/smudgeplot) for cross-validation. Both plots can be generated from whole-genome HiFi reads. Whether there is introgression is hard to say from these results alone.

zhangrengang Feb 23 '24 02:02