MicrobeLab

Results 60 comments of MicrobeLab

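Hello, and thank you for your interest in this research.
1. Because bacterial genomes differ in size, -f has to be computed separately for each genome to keep the classes balanced. The number of reads simulated per class is chosen empirically to be comfortably sufficient (for example, 500,000), although not all of them will necessarily be used. -f is calculated as the total number of reads needed multiplied by the average read length, divided by the genome size.
2. For the validation set, the reads simulated for each class are downsampled to the same number per class (anything above roughly 10% of the training set is enough; it has little effect). As with the training set, -f just needs to be a value large enough to yield that many reads.
3. Each genome is simulated separately, and the simulated reads are then merged.
4. Probabilistically, unless the same random seed is set, completely identical reads almost never occur, so there is no need to remove them specifically. If you do need to remove them, you can deduplicate with prinseq or other next-generation sequencing quality control software.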
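The -f calculation from point 1 can be sketched as a small helper. This is only an illustration: the function name, the genome names, and the genome sizes are hypothetical, and the 150 bp read length is an assumed example value, not one stated in the comment.

```python
def fold_coverage(reads_needed, mean_read_length, genome_size):
    """Return -f as reads_needed * mean_read_length / genome_size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be a positive number of base pairs")
    return reads_needed * mean_read_length / genome_size


# Hypothetical genomes (name -> size in bp); real sizes come from your own data.
genome_sizes = {"genome_A": 5_000_000, "genome_B": 2_500_000}

# Same target read count for every class, so smaller genomes get a larger -f.
coverage = {
    name: fold_coverage(reads_needed=500_000, mean_read_length=150, genome_size=size)
    for name, size in genome_sizes.items()
}

for name, value in coverage.items():
    print(f"{name}: -f {value}")
```

Computing one value per genome is what keeps the classes balanced: a fixed -f would produce more reads from larger genomes.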

Hi, the value of --model_dir sets the folder in which weights are stored during training and from which the trained weights are loaded during prediction. So, the folder should be the same...

In TensorFlow 2.0, the tf.logging module has been removed in favor of the open-source absl-py. To set the logging verbosity in TensorFlow 2.0 and later, you can use the absl...
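As a minimal sketch of setting verbosity through absl-py: the wrapper function below is hypothetical, and the fallback to Python's standard logging module is an addition for environments where absl-py is not installed, not something the comment describes.

```python
import logging as std_logging


def set_verbosity(level="info"):
    """Set log verbosity via absl-py if available, else via stdlib logging.

    Returns the name of the backend that was configured.
    """
    try:
        from absl import logging as absl_logging
    except ImportError:
        # Assumption: without absl-py, configuring the root logger is acceptable.
        std_logging.getLogger().setLevel(level.upper())
        return "logging"
    # absl accepts case-insensitive names such as "debug", "info", "warning".
    absl_logging.set_verbosity(level)
    return "absl"


backend = set_verbosity("info")
print(f"verbosity configured through: {backend}")
```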

Hi, the training_set_read_parser function in seq2tfrec_kmer.py parses each training/eval read in Biopython-parsed format. Taxon IDs are assumed to be available in the read names. (E.g., for a read with the name >NC_018018.1|999|GCF_000265505.1-200000, 999...
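To illustrate how a taxon ID might be pulled out of such a read name, here is a sketch that assumes the ID is the second "|"-delimited field, as the example name suggests. The function is hypothetical and is not the actual code in seq2tfrec_kmer.py.

```python
def taxon_id_from_read_name(read_name, field_index=1, delimiter="|"):
    """Extract an integer taxon ID from a delimited read name.

    Assumption: the ID sits at position `field_index` after splitting on
    `delimiter`; adjust both to match your own naming scheme.
    """
    fields = read_name.lstrip(">").split(delimiter)
    if len(fields) <= field_index:
        raise ValueError(f"read name has no field {field_index}: {read_name!r}")
    return int(fields[field_index])


example_name = ">NC_018018.1|999|GCF_000265505.1-200000"
print(taxon_id_from_read_name(example_name))
```

If your read names use a different layout, only field_index and delimiter need to change.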

Please refer to: https://github.com/MicrobeLab/DeepMicrobes/blob/master/document/train.md

For single-end reads, only one FASTQ file needs to be interleaved, and in the model code the prediction should be averaged across 2 samples rather than 4.
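The averaging step can be pictured with the following plain-Python sketch. It assumes that the predictions belonging to one read (or read pair) occupy consecutive rows, which is an assumption about the data layout rather than a description of the DeepMicrobes model code; the function name is hypothetical.

```python
def average_groups(predictions, group_size):
    """Average class probabilities over consecutive groups of rows.

    predictions: list of per-sample probability lists (rows).
    group_size: 4 for paired-end input, 2 for single-end input.
    """
    if group_size <= 0 or len(predictions) % group_size != 0:
        raise ValueError("number of rows must be a positive multiple of group_size")
    averaged = []
    for start in range(0, len(predictions), group_size):
        group = predictions[start:start + group_size]
        # zip(*group) walks the group column by column, i.e. class by class.
        averaged.append([sum(column) / group_size for column in zip(*group)])
    return averaged


# Two single-end reads, two samples each, two classes.
rows = [[0.25, 0.75], [0.75, 0.25], [1.0, 0.0], [0.5, 0.5]]
print(average_groups(rows, group_size=2))
```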

Hi, it might be a good choice to directly run the Python script for converting data to tfrec:

    seq2tfrec_kmer.py \
      --input_seq={} --output_tfrec={}.${kmer}mer.tfrec \
      --vocab=${vocab} --kmer=${kmer} \
      --seq_type=${seq_type}

Hi, could you please share the command you used to run 'predict_DeepMicrobes.sh' and the exact error message?

Hi, sorry, I have not encountered such errors before and have no idea how to fix them.