Error in tfrec_train_kmer.sh with a customized training set
Hi, I'm trying to run tfrec_train_kmer.sh on a training dataset I constructed, but it fails with the errors below.
(DeepMicrobes) root@bbf7145cde62:/workspace/vamb-data/airways# tfrec_train_kmer.sh -i dmtrain.fa -v /workspace/czj/tokens_merged_12mers.txt -o dmtrain.tfrec -s 2048 -k 12
parallel successfully detected...
seq-shuf successfully detected...
Starting converting dmtrain.fa to TFRecord (mode=training), output will be saved in dmtrain.tfrec
Parameters: kmer=12, vocab_file=/workspace/czj/tokens_merged_12mers.txt, split_size=2048
======================================
1. Shuffling sequences for training...
(echo -n ">"; cat <&0) | sed "s/^>/\x0>/"
======================================
2. Splitting input to 2048 sequences per file...
======================================
3. Converting to TFRecord...
Can't use 'defined(@array)' (Maybe you should just omit the defined()?) at /workspace/czj/DeepMicrobes/DeepMicrobes/bin/parallel line 119.
cat: 'subset*.tfrec': No such file or directory
rm: cannot remove 'subset*.tfrec': No such file or directory
Finished.
The first two records of dmtrain.fa look like this:
>S4C8|22
GTTATAATTTCCCGGCTGGATCTCCTTGAAATCATCAGACAAAATACCTCTTCTTAAAAGTTCTGCCGTGCCTGCAAAGCGAAAATCACGAAGCCCCGGATTGTCTTTTAACTGATAAATCCCATATTTATCAGTATCGCCATACAAAAGCTGTGCTTCCTTGTTTGCGCCGCTTTTTTCAAGTTCCTCCTGCATCGTCCGGAGACTTCTTTCATTTTCCCAATCGCCCTTTTCTATGCCAAAGATACCCTCATGCTCTGTGATCTGCCTCCTGTTCTTCACCGTTGTTTCCGAACCGTCCTCATGGAGCAGGTACACAGGCAGGTCACGGTCAAAAAGCTCCAATGCCCTCCCCTGTGTCAATGGCAGCATTTCATTCCATGTATAGCCGTATTCCTCCATTTCCGATAAGCCGATCATAGGGTCAGGGAGTGCGTCAATCTCTGCCTGTGCGTCAATGACCGCAAGGGCAGCCCCCTGACTGCCTTCTGTTTCCTCATAATAAATATGCTCCGCAAGCTCCCTTGTCTTTTCCATGTCGCCCAGCTTAAAGGCATAATTTACAATCAGGCTGCGCTCATCATCAGAAAAAGCGGTCCTGGCAAATTCCAGACGGTTAATAATGCCCTCCGCCTGCTTTACCGGGGAAAGGTGGCTCGTCATTACCACTTCTTTTTTATCCAGGTCAATCTCAAAATCATTAAATTGCGGCATACCGGAATTTCTGCCGCCGCTGACAATAATCGGGATATAATCAGCCCCGTAGCTTTCCAGACAGTCATGGATATTTTTTCTGTCCATACCCTCCAACTTTGTGAGCGCACCCATGATCTCCTGAACTTCCTCCGGTGTCTTATTGATGATACGTTCCACCTCAAAACCGCCTGAATGTATGATCTTCAGAATCACATCATCTGCCCCGACAGGCTCCTTTGCCCTTTCATTCCACTCCATCTCTGCCTTCAGGTCAATGATCTCATTCTTATCGGTGATATACCGGACATCAAGGTGGTATTCTCCAAATTCCCTTGTTTCCTCATTGGCAATCTCTATAGTCCATGCGCCTTTGCTTTCCAGATATGCGGCAACGCTCAGTCTGTCATCGTCATTCATGGCATAGAGTGCTTCTATGATCTCCGCCGCATTCATTCCCCTGACTTCTACAAGGCTGTACTCAGACAGATCGCTGTTCTGTATCAGCAAAAGGCTTTCTTTTTCCTGCCCCTGCATGATCTCCCGGTCACGCTGTATTTCTCTAAGCTGTTCCTCTATACCTGTGATAAGCTCCGATGCAGTCCTGCGGATCGTATCAAGGGAAGATTTCAGTTCTTTCATATCCTTTCCGCTGCTCCACCCGGCAATATACCCAAAGGAATAACCAGAAGTATCTATGCCGAAGTGCTTACAGACTGTAAACGCTATACTCTCTGCTTCAAATGTTAATATATCCATTTGAGGCCTTATACCCATGTTTTTATAATGTTGATTTTTTATATTATTTCCCTTAAATCCTCACTTTCTTGTATAAAAGATAAGCATTATCAGATACGCTATTCTCAGGTTTTCCCCTCGAATGGGAAGCCGGAAAGGAGCGCATTTGATATGCAAATAAACTATTTAGATGCTGTTTCATCAGTCCTCAATATGATGAAGCAGCCAGACAGCGCATGTAAAAATATAGACATGCACAGAACCTGTTACACCATGTTCTTCAAATACCTGATGGATAAGGGCATTCCTTTTTCAATGGATGCCGCGCTGGACTGGCTTGAGATTAAGAAACAGGAAATTTCCTATGAGACGTGTTCCCAATATAGAAATGCCCTGTTCCGACTTGAGCATTACCTGCTCTTTGGAGATATCGAAAGCCCTTTCTGCCGCTCAGAAGACAGTTTTTTCTGCCGGAGCGGGATGTCGGAATCTTTTTTCCGCCTGACATATGAGCTGGAGGAATACTATGCGGCCAGCCAGAACCCCAGCTATTACCATACGTATTCCGT
TGCCACAAAAGAGTTTTTCAAACTTGCGACTTCCCTTGGAATTACAGAGCCGGAAGCAGTCACCATAGATACTCTTATCGAATACTGGAATACTTACTGCAAATCCTGCGGCTCTCCCGTCAGACGCCAGAACGCCGTATGCGCTATGACGGCTCTTATGAAATACCTTCACCTTCGGGGTGATGTGCCGGAGTGTTATCAGCTGGTTCTTTTTGGCTGGAACGCTGAAATACTGTCTGGCATGAGGCTTTCCAAAACAGGCGCCGCATTCCATCCCAGTGTATCTCTTGAACATAAAGCTGAAGGGTATCTTGACGCCTTGGACGATTGGAAATACATGGAATCATCAAAAGCTGTTTACCGCAATGATTTCACCTGGTACTTTATGTTTTTGGAACTTAACCGCCTGGAGCATTCGGCAGAAACTGTAACTCTATTTACAGACATACTTCCGGATTGTCCGAATCAGGCCAAAGGCAGCAATCCTGTATCGGCCCGCCGTTCACACACGATCAGAATGTTTGAAAAGTATCTCCAGGGCACAATGGAATCTAATATGGCGGCTGATCCAAAGCGTGCGTCCGATCATCTTCCGTCATGGAGCAAAAGCATCCTTGATGGTTTTATAGAGAGCCGCAGGCGGGATGGTATGACGAATAATACACTTACTATGTGCAGGGCTGCCGGATGCAGTTTCTTCAAATATCTTGAAGATAATGGAATAGATTATCCGGCATACATAACACCTGATGCAGTGAAAGCATTCCATAACCATGATGTCCACTCGACCCCGGAAAGCAAAAATGCATATGGGACAAAGCTCCGTCAGCTTCTGCGTTACATGGCTGACCAGGATCTGGTCCCGCCAACCCTTGTTTTTGCAGTATCTGCAAGCTGCGCTCCCCGTCGCAGCATCGTTGATGTCCTGAGCGATGATATGGTTGGGAAAATATATGAATACCGCGACAAAGCCTCCACTCCCATAGAACTCAGAGACACAGCTATGGTTATGCTCGGGCTTCGGATGGGTATCAGGGGAGCGGACATCCTGAAGCTTCAGGTAAATGATTTTGACTGGAAAAACAAAACGGTTTCCTTCATCCAGCAGAAAACAGGAAAAGCAATCACGCTTCCAGTCCCAACAGATGTAGGTAATTCTATATATAAATACATCATGAATGGACGTCCGGAATCGGCTGCCACAGGCAGCGGATATATATTTATCCGCCATCAGGCGCCATATATTCCGCTTAAAGTCACAACGGCGTGCCGTGGGGCTTTAAAAAGAATACTTGCTGAATATGGATTTGAACTATCCGCCGGCCAGGGCTTCCATATGACACGGAAGACATTTGCCACAAGAATGCTTCGGGCAGGCAGCAAACTTGATGATATTTCCATCGCCCTCGGGCATGCACGTCCGGAAACTGCCGAGGTATATCTTGAACGTGACGAAGATAAAATGAGGCTCTGCCCTCTGGAATTTGGAGGTGTTTTGTCATGACATACATTTTTGAGAGCGGCCTGGCACATCATATCGAAGGACTCATACAGCAAAAACGGGCGGATGGATATGCCTATAATTGCGAAGAAAAGC
>S4C16|245
CGAGCAAACGAAGGCCGTACTAGAGATTCAGGCCAAGTGGAAGACTATAGGCTATGCTCGCAGAAGCGACAATGAGAAGATCTACGAGCGTTTCCGCGCAGCATGTGACGATTATTTCAATAAGAAAACAGCTTTCTTCAAAGGCAAACGTGAAGAGCTGACCGATAACTACAAGAAGAAGCTGGCCATGGTAGAAGAAGCGGAGAGCCTTCAGGAGAGTTCCGACTGGAAAGAAACCTCTACTCGCTTGGCCGAACTCCAAAAGAAATGGAAAACCATCGGAGCCGTTCCTCATCGGTATAGTGATGAGATATGGAAGCGTTTTACGACTGCATGCGATGCATTCTTCAAACGTAAAAAAGCCGAACAGGGAGATATGCGCTCCGAAGAATGCGAAAACCTGAAGAGCAAGAAAGCAATCATTGCAGAGCTTGAGACTTTGGATTCGGAAGAAGCAAGCGAGGGTATCATCGACAGGCTCAATGCTCTGGCCGGACGTTGGAATTCCATAGGCTTTGTACCGTTCAGAGAGAAGGATACTATCAACAAAGCTTACCGAAAATTGATCGATGGTCTGTACGACAAGCTGAATATCGAACGAAGCAACCGGCGCCTCGAAGGATACAATGCCTCCTTGGAACAACTGGAGGGTGGCGGCAAAGGACAGCTCTATGATGAACGTGATCGTATGACACGTATCCTCGACCGTATGCGCAACGAATTGCAGACCTATACGAACAATCTGGGTTTCCTCAATATATCCAGTAAAAGTGGGAATAGCCTGATGCGCGAAATAGAGCGCAAGAAGGAAAAGCTGGAAGAAGACATCCGTCTGATGATCGAAAAGATCAAGCTGATCGACAAGAAGGTGGAAGAGCTGAACTCTAAAGAGTAGGCTATCCCCCACTCCATCGGCAAAATAAAACCGAAGGAGAAAATAGCATTCAAGAATTGAGGTGAGCCACGAAAGTTTTATATCAGACTTTCGTGGCTCACTTCTTTTCTACTCGCTACTCATTGACAGAGTAAGAAACGCAAGGCCAAGAGATGAAAGACAGATACAAGGCTGTTTTTTATCTCGATAGCGCAACAACCAAAAGGGCTATGCTGTTTCATTTCTAAAAGGATATACCGATGAAGATAGTAATAGCGGACAGCTATGCAGCTCTACCCGGCGATTTGGACTGGAGCGGTATCGAAGAAATGGGCGAATGCGTGTTCTACGAATATACCCGTCCGGAGGATTTGACTCTGCGTGCTGTCGATGCTGAAATAGTGCTTACCAACAAGACTCCTGTGACTGCGGCCGACATGGAAAAGATGCCCCACCTACGTTACATCGGACTGATGATTACAGGCCTTAATCTTATAGATATGGATGCTGCTCGTCAGCGTGGTATCACCATAACGAACATCCCCCACTATAGCACAGAATCAGTAGCCCAAATGGCAATCTCGCATCTACTGCACATAACCATGCCGATCGGAGAACTTTCCCGGCAGGTGAAAGATGGTTGCTGGCAGAGCAATTACGAACAAATCTCTCGCAATACTTATCAGATAGAACTGAGCGGACTGACGATGGCTATCGTGGGACTTGGGGCAATAGGTACACGTGTAGCGGAAATGGCACGTGGATTCGGCATGAAGATTTTGGCACATACATCCAAATCTCCAATCGAGTTGCCTTCTTATATAGAAAAGTCCGATAGCCTGGAGAAGCTTTTCTCTCGGGCTGATGTGCTGAGTCTGCATTGCCCGCTCACAGCGCAAACCCAAAGGATGGTATCGGCTGATAGGCTGGCACTGATGAAACCGACAGCTATCCTGCTGAACATGTCCCGAGGAAGTCTGATCGATGAAAAAGCATTAGCCTCTGCCCTAAATGAAGGACGGCTCTATGCTGCAGGCTTGGACGTACTTGCGGAAGAACCTCCATGCATGGATCACCCTTTGCTTAAGGCGCGTAATTGTCACATCACGCCACATATGGGCTGGAA
TACGGATGCAGCGCGCTTGCGCCTTTCTCGGACGATCAAGGAGAATCTTCGGGCTTTCATTTCCGGTCACCCTGTCAATGTCGTTTAAGAACAGAATCCATCAAAACGATTATTTTCCGACCAATACCTTTCGAAGAATTTGACGGATTTATCCTCGATAAATCTACGTGTGTTCGA
Could you have a look and see if I've done anything wrong? Thanks!
Hi, it looks like parallel is not installed correctly. Try installing it with this command: (wget -O - pi.dk/3 || curl pi.dk/3/) | bash
Thanks for your reply!
I ran wget -O - pi.dk/3 | bash, since I'm on Linux. The same error still appears after the installation completes. I would like to solve the problem myself, but I don't really know where to start.
Many thanks in advance!
Please first check that parallel itself is installed and working before running our scripts. Thanks.
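One possible way to do that check, offered as a sketch rather than an official DeepMicrobes procedure: the error in the log comes from /workspace/czj/DeepMicrobes/DeepMicrobes/bin/parallel, and "Can't use 'defined(@array)'" is a Perl error that newer Perl versions raise on older scripts. So it may be that an old copy of parallel in that directory is found on PATH before the newly installed one (an assumption, not confirmed in this thread). The helper name check_tool below is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: check_tool is a hypothetical helper, not part of DeepMicrobes.

check_tool() {
    local tool="$1"
    local resolved
    # command -v prints what the shell would run for this name, or fails.
    if ! resolved="$(command -v "$tool")"; then
        echo "$tool: not found on PATH"
        return 1
    fi
    echo "$tool resolves to: $resolved"
    # List every match in PATH lookup order; scripts get the first one.
    type -a "$tool"
}

if check_tool parallel; then
    # With a sound install, both commands should run without a Perl error.
    parallel --version | head -n 1 || echo "parallel --version failed"
    echo ok | parallel echo || echo "parallel smoke test failed"
fi
```

If the first match is still the copy under DeepMicrobes/bin, or either test command prints the same Perl error, then the freshly installed parallel is probably not the one being run. In that case, placing the directory that holds the new parallel earlier in PATH would be the next thing to try; where the installer put it is not shown in this thread.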