deepsignal2

extract_feature.py's option parameter

Open YuYangmio opened this issue 2 years ago • 6 comments

Dear Peng, I want to try training your model, but in extract_feature.py I found the option --positions. Is this the minimap2 file described in the article? I see that the default is None. Does including position information have any effect on the training of the model? Best, Yu

YuYangmio avatar May 24 '22 07:05 YuYangmio

--positions specifies high-confidence sites (e.g., sites that have methylation frequencies of 0 or 1 in WGBS) when you want to extract training samples. You can also extract all samples first (without setting --positions), then generate the training samples from them using your own scripts.

Best, Peng

PengNi avatar May 24 '22 08:05 PengNi
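
As an illustration of the reply above, here is a minimal sketch of picking high-confidence sites from WGBS results and writing them out as position lists. Everything about the input is an assumption: the row layout (chromosome, position, strand, coverage, methylation frequency), the `min_coverage` threshold, and the function names are hypothetical. The tab-separated chromosome/position/strand output is also an assumption about what --positions expects, so check it against the deepsignal2 help text before use.

```python
import csv


def select_high_confidence_sites(rows, min_coverage=5):
    """Split WGBS sites into fully methylated and fully unmethylated lists.

    rows: iterable of (chrom, pos, strand, coverage, freq) tuples.
    This layout is hypothetical; adapt it to your own WGBS table.
    """
    positives, negatives = [], []
    for chrom, pos, strand, coverage, freq in rows:
        if coverage < min_coverage:
            continue  # too few reads to trust the frequency
        if freq == 1.0:
            positives.append((chrom, pos, strand))
        elif freq == 0.0:
            negatives.append((chrom, pos, strand))
        # sites with intermediate frequencies are dropped
    return positives, negatives


def write_positions(path, sites):
    """Write one tab-separated chrom/pos/strand line per site (assumed format)."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(sites)
```

The two lists could then be written to separate files, one for the positive extraction run and one for the negative run.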

Dear Peng, When I tried to train your model with the Escherichia coli reference data set from your paper, I got the following error when running Guppy on the R9 data: "Fast5 read file is invalid. Raw data field 'median_before' has wrong type." Have you ever encountered this problem? What is the solution? Best, Yu

YuYangmio avatar May 30 '22 09:05 YuYangmio

Hi @YuYangmio, I am not sure what exactly the issue is. Maybe the R9 data is too old for Guppy to process. R9 pore reads may have been deprecated by ONT. I suggest using some newer data (like R9.4.1/R10.3) for your test.

Best, Peng

PengNi avatar May 30 '22 14:05 PengNi

Dear Peng, When I tried to train your model again, I found that the result of deepsignal2 extract looks strange: the methy_label value is always 1, never 0. Binary classification data usually contains both 0 and 1 labels. When I used the TSV data to train the model, I got ACC=1.0 and LOSS=0. The command I ran was:

deepsignal2 extract -i ../Notts/FAF15665-16056159 -o human.fast5s.CG.features.tsv --corrected_group RawGenomeCorrected_000 --nproc 30 --motifs CG

YuYangmio avatar Jul 19 '22 10:07 YuYangmio

When you want to extract negative labels, --methy_label should be set to 0.

Best, Peng

PengNi avatar Aug 16 '22 14:08 PengNi
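
Following the reply above, a training set needs samples from two extraction runs, one with --methy_label 1 and one with --methy_label 0. The sketch below shows one way to check the label distribution and mix the two sets. It assumes the label is the last tab-separated column of each features line, which should be verified against your own output. The 1:1 balancing and the function names are choices made for this illustration, not something stated in this thread.

```python
import random
from collections import Counter


def label_counts(lines):
    """Count the values in the last tab-separated column (assumed to be the label)."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line:
            counts[line.rsplit("\t", 1)[-1]] += 1
    return counts


def mix_samples(pos_lines, neg_lines, seed=0):
    """Return a shuffled, 1:1 balanced mix of positive and negative sample lines."""
    pos = [line for line in pos_lines if line.strip()]
    neg = [line for line in neg_lines if line.strip()]
    n = min(len(pos), len(neg))  # downsample the larger class
    rng = random.Random(seed)    # fixed seed keeps the mix reproducible
    mixed = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(mixed)
    return mixed
```

Running `label_counts` on a features file before training is a quick sanity check: if only one label appears, as in the ACC=1.0, LOSS=0 case described earlier in this thread, the model has nothing to discriminate.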

Maybe you can check issue #7 for more information.

PengNi avatar Aug 16 '22 14:08 PengNi