TypeTE
Building from source issues - Is TypeTE Docker container ready?
As I'm working with patient data (level 4 data) we're on a secure Linux compute cluster where, for security reasons, it is not possible to make outbound connections to the internet.
Originally I tried building from source, but I ran into major issues installing a particular Perl dependency (see notes at the bottom). I would therefore like to run the Docker container for TypeTE, but I'm not sure whether it's ready.
It may also be that I don't know how to use the container; this is what I tried:
sudo docker pull cgoubert/typete
sudo docker run -it --entrypoint /home/TypeTE/softwares/TypeTE/ cgoubert/typete run_TypeTE_NRef.sh &> TypeTE.log &
It doesn't look like running this container would be straightforward, so if it's functional, some documentation would be very helpful.
Issues with building from source
As I have no outbound connection to the internet, I cannot use cpan or cpanm to download Perl modules. After downloading and installing all the listed dependencies, I ran TypeTE and received errors about the missing Perl module Bio::SeqIO, so I had to do the following.
First, I grab the link for the module from the MetaCPAN website on a laptop with an internet connection.
wget https://cpan.metacpan.org/authors/id/C/CJ/CJFIELDS/BioPerl-1.7.8.tar.gz
Then I transfer this to the secure computing environment, into my perl5 directory, and run tar zxvf BioPerl-1.7.8.tar.gz && cd BioPerl-1.7.8, followed by perl Makefile.PL to build. I then verify the installation with perl -e "use Bio::SeqIO;" (if I don't see errors, that means it's installed).
[moldach@marc TypeTE-Test]$ perl -e "use Bio::SeqIO;"
[moldach@marc TypeTE-Test]$
Next I try to run TypeTE again, but I get an error about String::Approx, so I follow the same method, followed by perl -e "use String::Approx qw(amatch);" - things appear to be installed:
[moldach@marc TypeTE-Test]$ perl -e "use String::Approx qw(amatch);"
[moldach@marc TypeTE-Test]$
I add both the perl -e statements now to the top of my batch script and try to run typeTE and here is where the odd behavior is happening:
Script
#!/bin/bash
#SBATCH --job-name=typeTE_test # Job name
#SBATCH --mail-type=END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH [email protected] # Where to send mail
#SBATCH --ntasks=1 #Run on a single CPU
#SBATCH --cpus-per-task=1 # How many cores?
#SBATCH --mem-per-cpu=1G
#SBATCH --output=typeTE_test_%j.log # Standard output and error log
#SBATCH --error=typeTE_test_%j.err # Error log
#SBATCH --time=01:00:00
pwd; hostname; date
perl -e "use Bio::Seq"
perl -e "use String::Approx"
#bash run_TypeTE_Ref.sh
date
As you can see I've commented out the run_TypeTE_Ref.sh script and the first call to Bio::Seq runs successfully without error; however, the call to String::Approx throws an error.
Error
Can't locate String/Approx.pm in @INC (you may need to install the String::Approx module) (@INC contains: /project/M-mtgraovac182840/perl5-matt/lib/perl5 /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5) at -e line 1.
BEGIN failed--compilation aborted at -e line 1.
Although this is not directly a TypeTE issue, the problem with the String::Approx Perl module dependency is the linchpin preventing me from using this tool.
While we are fairly new to working in this restrictive environment, I have successfully installed 15 other Perl modules into /project/M-mtgraovac182840/perl5-matt/ by running the install process described above (e.g. tar ... && cd ... && perl Makefile.PL).
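For readers following along, here is a rough sketch of how the "Can't locate String/Approx.pm in @INC" lookup behaves: the module name is turned into a relative path, and each listed directory is checked for exactly that path, without descending into subdirectories that are not themselves on the list. The helper names `module_relpath` and `find_in_dirs` are made up for illustration and are not part of Perl or TypeTE.

```shell
#!/bin/bash
# Convert a module name such as String::Approx into the relative path
# that `use` looks for, e.g. String/Approx.pm.
module_relpath() {
    local module="$1"
    printf '%s.pm\n' "${module//:://}"
}

# Print the first directory in a colon-separated list that holds the module
# file directly; return 1 if none does. Subdirectories that are not on the
# list themselves are not searched.
find_in_dirs() {
    local module="$1" dirs="$2" rel dir
    rel="$(module_relpath "$module")"
    local IFS=':'
    for dir in $dirs; do
        if [ -f "$dir/$rel" ]; then
            printf '%s\n' "$dir/$rel"
            return 0
        fi
    done
    return 1
}

# Example: check whatever PERL5LIB currently holds.
find_in_dirs String::Approx "${PERL5LIB:-}" \
    || echo "String/Approx.pm not found in PERL5LIB"
```

If the file only exists under an architecture-specific subdirectory (for example a directory like x86_64-linux inside lib/perl5), a check like this fails unless that subdirectory is itself one of the searched directories.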
Dear Matthiew,
Thank you for your interest! I'm so sorry, this repo is indeed a dev version that is not functional. It's not on your end!
To use TypeTE as in the paper, I recommend using the GitHub version. Make sure BioPerl is properly installed, because it is a common source of problems.
We are currently working to improve the deletion pipeline (reference insertions) and this will be integrated with nextflow/docker. However we need a few more months on the dev!
Best,
Clément
By the way, I just found your v a p o R w a v e package and I love it! I'm gonna try it out for my next presentation! =)
Reading your issue again, I wonder whether this is related to the PERL5LIB variable not pointing toward your local libraries. I am not a Perl expert, but maybe @jainy, who coded the Perl scripts, can help you!
Best,
Clément
Hi Matthiew,
Can you try to install String::Approx using an alternative method and check whether that works?
perl -MCPAN -e shell
install String::Approx
Thanks Jainy
Hi @jainy
Where does perl -MCPAN -e shell, followed by install String::Approx install modules?
Here is what I see inside the lib directory perl5/lib/perl5:
(base) mtg@mtg-ThinkPad-P53:~/perl5/lib/perl5$ ll
total 212
drwxrwxr-x 15 mtg mtg 4096 Jan 7 16:36 ./
drwxrwxr-x 3 mtg mtg 4096 Jul 21 2020 ../
drwxrwxr-x 3 mtg mtg 4096 Jul 21 2020 5.30.0/
drwxrwxr-x 2 mtg mtg 4096 Jan 7 14:44 App/
drwxrwxr-x 3 mtg mtg 4096 Jul 21 2020 Archive/
drwxrwxr-x 9 mtg mtg 4096 Jan 7 14:44 CPAN/
-r--r--r-- 1 mtg mtg 146411 Jun 12 2020 CPAN.pm
drwxrwxr-x 2 mtg mtg 4096 Jul 21 2020 Devel/
drwxrwxr-x 3 mtg mtg 4096 Jan 7 16:36 Exporter/
drwxrwxr-x 3 mtg mtg 4096 Jul 21 2020 lib/
drwxrwxr-x 3 mtg mtg 4096 Jan 7 14:46 List/
drwxrwxr-x 2 mtg mtg 4096 Jul 21 2020 local/
drwxrwxr-x 2 mtg mtg 4096 Jul 21 2020 Mock/
drwxrwxr-x 4 mtg mtg 4096 Jul 21 2020 POD2/
drwxrwxr-x 3 mtg mtg 4096 Jan 7 16:36 Test/
drwxrwxr-x 11 mtg mtg 4096 Feb 17 10:54 x86_64-linux-gnu-thread-multi/
These don't look like String::Approx components - I could be wrong.
(base) mtg@mtg-ThinkPad-P53:~/perl5/lib/perl5$ find . -name "String*"
./Test/Deep/String.pm
./Archive/Zip/StringMember.pm
./x86_64-linux-gnu-thread-multi/auto/String
./x86_64-linux-gnu-thread-multi/String
As I mentioned, there is no outgoing internet connection on this server, so I cannot install String::Approx with that method directly; I need to run it on another computer with an internet connection and then transfer the compiled module over. This results in a module folder for each Perl library:
(base) [moldach@marc TypeTE-Test]$ ll /project/M-mtgraovac182840/perl5-matt/lib/perl5
total 216
drwxrwsr-x 3 moldach M-mtgraovac182840 4096 Jan 7 14:59 5.30.0
drwxrwsr-x 2 moldach M-mtgraovac182840 4096 Jan 7 14:59 App
drwxrwsr-x 3 moldach M-mtgraovac182840 4096 Jan 7 14:59 Archive
drwxr-sr-x 24 moldach M-mtgraovac182840 8192 Feb 5 13:51 Bio
-r--r--r-- 1 moldach M-mtgraovac182840 7252 Feb 2 22:04 BioPerl.pm
drwxrwsr-x 9 moldach M-mtgraovac182840 4096 Jan 7 14:59 CPAN
-r--r--r-- 1 moldach M-mtgraovac182840 0 Jan 7 14:59 CPAN.pm
drwxr-sr-x 2 moldach M-mtgraovac182840 4096 Jan 7 15:35 Capture
drwxr-sr-x 2 moldach M-mtgraovac182840 4096 Jan 7 15:40 Config
drwxr-sr-x 2 moldach M-mtgraovac182840 4096 Jan 7 16:13 Data
drwxrwsr-x 2 moldach M-mtgraovac182840 4096 Jan 7 15:52 Devel
drwxr-sr-x 3 moldach M-mtgraovac182840 4096 Jan 7 16:41 Exporter
drwxr-sr-x 3 moldach M-mtgraovac182840 4096 Jan 7 16:14 File
drwxrwsr-x 3 moldach M-mtgraovac182840 4096 Jan 7 16:34 List
drwxr-sr-x 3 moldach M-mtgraovac182840 4096 Feb 5 15:17 Method
drwxrwsr-x 2 moldach M-mtgraovac182840 4096 Jan 7 14:59 Mock
drwxr-sr-x 4 moldach M-mtgraovac182840 4096 Jan 7 16:08 Module
drwxr-sr-x 3 moldach M-mtgraovac182840 4096 Feb 5 15:18 Moo
-r--r--r-- 1 moldach M-mtgraovac182840 34419 Nov 24 17:58 Moo.pm
drwxr-sr-x 2 moldach M-mtgraovac182840 4096 Jan 7 16:20 Number
drwxrwsr-x 4 moldach M-mtgraovac182840 4096 Jan 7 14:59 POD2
drwxr-sr-x 3 moldach M-mtgraovac182840 4096 Feb 5 14:59 Parallel
drwxrwsr-x 4 moldach M-mtgraovac182840 4096 Jan 7 15:45 Test
drwxr-sr-x 2 moldach M-mtgraovac182840 4096 Jan 7 16:17 Text
drwxr-sr-x 3 moldach M-mtgraovac182840 4096 Jan 7 15:25 inc
drwxrwsr-x 3 moldach M-mtgraovac182840 4096 Jan 7 14:59 lib
drwxrwsr-x 2 moldach M-mtgraovac182840 4096 Jan 7 14:59 local
-r--r--r-- 1 moldach M-mtgraovac182840 1218 Sep 2 04:16 oo.pm
drwxr-sr-x 5 moldach M-mtgraovac182840 4096 Feb 5 14:17 x86_64-linux
drwxrwsr-x 10 moldach M-mtgraovac182840 4096 Jan 7 14:59 x86_64-linux-gnu-thread-multi
Where each module folder looks like:
(base) [moldach@marc TypeTE-Test]$ tree /project/M-mtgraovac182840/perl5-matt/lib/perl5/Parallel/
/project/M-mtgraovac182840/perl5-matt/lib/perl5/Parallel/
|-- ForkManager
| `-- Child.pm
`-- ForkManager.pm
Now, when I look on the laptop where I installed with your method I can find
$ find ~ -name "*String::Approx*"
/home/mtg/.cpan/build/String-Approx-3.28-0/blib/man3/String::Approx.3pm
/home/mtg/perl5/man/man3/String::Approx.3pm
Okay, let's take a look back on the secure Linux cluster, in the man3 sub-directory:
(base) [moldach@marc man3]$ pwd
/project/M-mtgraovac182840/perl5-matt/man/man3
(base) [moldach@marc man3]$ ll
total 7232
-rw-r--r-- 1 moldach M-mtgraovac182840 12988 Jan 7 15:00 App::Cpan.3pm
-rw-r--r-- 1 moldach M-mtgraovac182840 74982 Jan 7 15:00 Archive::Zip.3pm
-r--r--r-- 1 moldach M-mtgraovac182840 0 Jan 7 14:59 Archive::Zip::FAQ.3pm
-rw-r--r-- 1 moldach M-mtgraovac182840 6734 Jan 7 15:00 Archive::Zip::MemberRead.3pm
-r--r--r-- 1 moldach M-mtgraovac182840 0 Jan 7 14:59 Archive::Zip::Tree.3pm
-r--r--r-- 1 moldach M-mtgraovac182840 20959 Feb 5 13:51 Bio::Align::AlignI.3
-r--r--r-- 1 moldach M-mtgraovac182840 18616 Feb 5 13:51 Bio::Align::DNAStatistics.3
...
-r--r--r-- 1 moldach M-mtgraovac182840 20053 Feb 5 14:49 String::Approx.3
...
Above, I've cut-off most of the output, however, two things are apparent/confusing:
- There are two types of files inside the man3 directory: .3 and .3pm files, but it's not clear to me what the difference is?
- My installation of String::Approx created a .3 file; however, using perl -MCPAN -e shell creates a .3pm file instead.
Therefore, I tried to copy over this .3pm file and try again...
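A side note on the `.3` versus `.3pm` question, offered as a sketch rather than a diagnosis: as far as I know, the suffix comes from the `man3ext` setting of the perl that ran `Makefile.PL`, and `use` never reads man pages, so copying them should not change whether the module loads. What matters is where the `.pm` file (and, for a compiled module, its shared object) ended up. The function `locate_module_files` and the `INSTALL_ROOT` default below are hypothetical names chosen for this example.

```shell
#!/bin/bash
# List the files under an install root that `use` actually needs for a module:
# the .pm source and, for compiled (XS) modules, the shared object under auto/.
locate_module_files() {
    local root="$1" module="$2"
    local rel="${module//:://}"     # String::Approx -> String/Approx
    local leaf="${rel##*/}"         # Approx
    [ -d "$root" ] || return 0      # nothing to search
    find "$root" -type f \
        \( -path "*/${rel}.pm" -o -path "*/auto/${rel}/${leaf}.so" \) \
        2>/dev/null | sort
}

# Hypothetical install root; set INSTALL_ROOT to match your own layout.
INSTALL_ROOT="${INSTALL_ROOT:-$HOME/perl5}"
locate_module_files "$INSTALL_ROOT" String::Approx

# Optional: show which man-page suffix the perl on PATH was built with.
if command -v perl >/dev/null 2>&1; then
    perl -V:man3ext
fi
```

If the `.pm` shows up only inside an architecture-named subdirectory, it may be worth comparing that directory name with the `archname` reported by the perl that runs in the batch job.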
And @clemgoub, on your point about the possibility of it being related to PERL5LIB, I didn't think so, since the bash script did not error on perl -e "use Bio::Seq". However, to assuage those concerns I've now also included paths to PERL5LIB at the start of the script:
#!/bin/bash
#SBATCH --job-name=typeTE_test # Job name
#SBATCH --ntasks=1 #Run on a single CPU
#SBATCH --cpus-per-task=1 # How many cores?
#SBATCH --mem-per-cpu=1G
#SBATCH --output=typeTE_test_%j.log # Standard output and error log
#SBATCH --error=typeTE_test_%j.err # Error log
#SBATCH --time=00:05:00
PATH="/project/M-mtgraovac182840/perl5-matt/bin${PATH:+:${PATH}}"; export PATH;
PERL5LIB="/project/M-mtgraovac182840/perl5-matt/lib/perl5${PERL5LIB:+:${PERL5LIB}}"; export PERL5LIB;
PERL_LOCAL_LIB_ROOT="/project/M-mtgraovac182840/perl5-matt${PERL_LOCAL_LIB_ROOT:+:${PERL_LOCAL_LIB_ROOT}}"; export PERL_LOCAL_LIB_ROOT;
PERL_MB_OPT="--install_base \"/project/M-mtgraovac182840/perl5-matt\""; export PERL_MB_OPT;
PERL_MM_OPT="INSTALL_BASE=/project/M-mtgraovac182840/perl5-matt"; export PERL_MM_OPT;
alias perl='/project/M-mtgraovac182840/tools/perl-5.32.0-good/perl'
perl -e "use Bio::Seq"
perl -e "use String::Approx qw(amatch);"
In the error log, we see that no error is raised for Bio::Seq, but one is raised for String::Approx:
Can't locate String/Approx.pm in @INC (you may need to install the String::Approx module) (@INC contains: /project/M-mtgraovac182840/perl5-matt/lib/perl5 /project/M-mtgraovac182840/perl5-matt/lib/perl5 /project/M-mtgraovac182840/perl5-matt/lib/perl5 /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5) at -e line 1.
BEGIN failed--compilation aborted at -e line 1.
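One thing that might be worth ruling out here: bash does not expand aliases in a non-interactive script unless `shopt -s expand_aliases` is set, so the `alias perl=...` line in the job script may have no effect, and the job may be running a different perl than the login shell does. The `/usr/lib64/perl5` entries in `@INC` would be consistent with that, though this is only a guess from the log. The sketch below demonstrates the alias behaviour with a throwaway alias and then shows one alternative, calling the interpreter through a variable; `run_alias_probe`, `typete_probe`, and `PERL_BIN` are names invented for this example.

```shell
#!/bin/bash
# Run a tiny non-interactive bash script that defines an alias and then
# tries to use it, with expand_aliases either left off or switched on.
run_alias_probe() {
    local enable="$1" prelude=""
    if [ "$enable" = "on" ]; then
        prelude="shopt -s expand_aliases"
    fi
    bash -c "
$prelude
alias typete_probe='echo aliased'
typete_probe 2>/dev/null || echo 'alias ignored'
"
}

run_alias_probe off   # alias is not applied in a plain script
run_alias_probe on    # alias is applied once expand_aliases is set

# A variable works regardless of alias settings. The default path is the one
# from the alias line in the job script; override PERL_BIN as needed.
PERL_BIN="${PERL_BIN:-/project/M-mtgraovac182840/tools/perl-5.32.0-good/perl}"
if [ -x "$PERL_BIN" ]; then
    "$PERL_BIN" -e 'print "running ", $^X, " version ", $], "\n"'
else
    echo "custom perl not found at $PERL_BIN"
fi
```

Printing `$^X` and `@INC` from inside the batch job, and again on the login node, would show whether the two environments really use the same interpreter.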
Hi Matthiew,
I am not sure. I found a webpage that describes the installation of perl modules locally. Can you try that.
https://blogs.iu.edu/ncgas/2019/05/30/installing-perl-modules-locally/
This describes how to get String::Approx, install it locally, and add that path to the bashrc file as needed.
Hope it helps!
Best, Jainy
I am not sure. I found a webpage that describes the installation of perl modules locally. Can you try that.
https://blogs.iu.edu/ncgas/2019/05/30/installing-perl-modules-locally/
That is exactly the process I described above...

Hi Matthiew,
I am sorry if I am repeating myself. I just want to make sure that you downloaded the String::Approx module from the CPAN website using the following command. In your message I see that you downloaded and installed BioPerl but not String::Approx.
wget https://cpan.metacpan.org/authors/id/J/JH/JHI/String-Approx-3.28.tar.gz
Best, Jainy
@jainy yes, here were the commands I used:
$ wget https://cpan.metacpan.org/authors/id/J/JH/JHI/String-Approx-3.28.tar.gz
# transfer this to secure cluster in the `perl5` directory
$ tar -xzvf String-Approx-3.28.tar.gz
$ cd String-Approx-3.28
$ perl Makefile.pl PREFIX=$PWD
Checking if your kit is complete...
Looks good
Only one of PREFIX or INSTALL_BASE can be given. Not both.
This differs a bit from the instructions, because the build fails with the PREFIX=$PWD parameter - I need to run the following instead:
$ perl Makefile.PL
Checking if your kit is complete...
Looks good
Generating a Unix-style Makefile
Writing Makefile for String::Approx
Writing MYMETA.yml and MYMETA.json
$ make
$ make install
$ make test # shows Result: PASS
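On the "Only one of PREFIX or INSTALL_BASE can be given" message seen earlier: MakeMaker also reads options from the `PERL_MM_OPT` environment variable, so if that variable already contains an `INSTALL_BASE=...` setting (as in the job scripts above), adding `PREFIX=` on the command line would plausibly trigger this conflict. A minimal sketch of checking for that before choosing arguments follows; `mm_opt_sets_install_base` and `makefile_pl_args` are hypothetical helper names, and `$HOME/perl5` is just a placeholder target.

```shell
#!/bin/bash
# Return 0 if PERL_MM_OPT already carries an INSTALL_BASE setting, which
# MakeMaker refuses to combine with a PREFIX= argument.
mm_opt_sets_install_base() {
    case "${PERL_MM_OPT:-}" in
        *INSTALL_BASE=*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the extra argument to pass to `perl Makefile.PL` for a target dir.
makefile_pl_args() {
    local target="$1"
    if mm_opt_sets_install_base; then
        # INSTALL_BASE already comes from the environment; pass nothing extra.
        printf '\n'
    else
        printf 'INSTALL_BASE=%s\n' "$target"
    fi
}

# Placeholder target directory, for illustration only.
echo "perl Makefile.PL $(makefile_pl_args "$HOME/perl5")"
```

This only prints the command it would suggest; it does not run the build.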
Let's confirm it works from the LOGIN node:
$ perl -e "use String::Approx qw(amatch);"
$
Looks like it works there.
Now I try submitting a script, making sure to include the environment variables that affect Perl 5:
#!/bin/bash
#SBATCH --job-name=typeTE_test # Job name
#SBATCH --ntasks=1 #Run on a single CPU
#SBATCH --cpus-per-task=1 # How many cores?
#SBATCH --mem-per-cpu=1G
#SBATCH --output=typeTE_test_%j.log # Standard output and error log
#SBATCH --error=typeTE_test_%j.err # Error log
#SBATCH --time=00:05:00
pwd; hostname; date
PATH="/project/M-mtgraovac182840/perl5-matt/bin${PATH:+:${PATH}}"; export PATH;
PERL5LIB="/project/M-mtgraovac182840/perl5-matt/lib/perl5${PERL5LIB:+:${PERL5LIB}}"; export PERL5LIB;
PERL_LOCAL_LIB_ROOT="/project/M-mtgraovac182840/perl5-matt${PERL_LOCAL_LIB_ROOT:+:${PERL_LOCAL_LIB_ROOT}}"; export PERL_LOCAL_LIB_ROOT;
### the article recommends un-setting the following which I had in there before - comment them out
#PERL_MB_OPT="--install_base \"/project/M-mtgraovac182840/perl5-matt\""; export PERL_MB_OPT;
#PERL_MM_OPT="INSTALL_BASE=/project/M-mtgraovac182840/perl5-matt"; export PERL_MM_OPT;
PERL_MB_OPT=
PERL_MM_OPT=
## include Bio::Seq first, as this used the exact same local installation as String::Approx
### Bio::Seq passes successfully but String::Approx throws an error
perl -e "use Bio::Seq"
perl -e "use String::Approx qw(amatch);"
And I get the following error:
Can't locate String/Approx.pm in @INC (you may need to install the String::Approx module) (@INC contains: /project/M-mtgraovac182840/perl5-matt/lib/perl5 /project/M-mtgraovac182840/perl5-matt/lib/perl5 /project/M-mtgraovac182840/perl5-matt/lib/perl5 /project/M-mtgraovac182840/perl5-matt/lib/perl5 /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5) at -e line 1.
BEGIN failed--compilation aborted at -e line 1.