Building from source issues - Is TypeTE Docker container ready?

Open moldach opened this issue 4 years ago • 9 comments

As I'm working with patient data (level 4 data) we're on a secure Linux compute cluster where, for security reasons, it is not possible to make outbound connections to the internet.

Originally I tried building from source; however, I ran into major issues installing a particular Perl dependency (see the notes at the bottom). I would therefore like to run the Docker container for TypeTE, but I'm not sure whether it's ready.

My attempt, not knowing how to use the container:

sudo docker pull cgoubert/typete
sudo docker run -it --entrypoint /home/TypeTE/softwares/TypeTE/ cgoubert/typete run_TypeTE_NRef.sh &> TypeTE.log &
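
Since the cluster cannot reach Docker Hub, the `docker pull` step would have to happen on a machine with internet access. The following is a minimal sketch of that transfer, assuming Docker is available on both machines (not confirmed in this issue). The archive name `typete-image.tar` and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: move a Docker image to a host without internet access.
# IMAGE is the image named in this issue; ARCHIVE is a hypothetical file name.
IMAGE="cgoubert/typete"
ARCHIVE="typete-image.tar"

# Run on a machine WITH internet access:
# pull the image and write it to a tar archive.
export_image() {
    docker pull "$IMAGE" && docker save -o "$ARCHIVE" "$IMAGE"
}

# Run on the offline host after copying the archive over:
# load the archive into the local Docker image store.
import_image() {
    docker load -i "$ARCHIVE"
}
```

This only moves the image; it says nothing about whether the image itself is functional.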

It doesn't look like running this container would be straightforward, so if it's functional, some documentation would be very helpful.

Issues with building from source

As I have no outbound connection to the internet, I cannot use cpan or cpanm to download Perl modules. After downloading and installing all the dependencies listed, I ran typeTE and received errors about the missing Perl module Bio::SeqIO, so I had to do the following.

First, I grab the link for the module from the MetaCPAN website on a laptop with an internet connection.

wget https://cpan.metacpan.org/authors/id/C/CJ/CJFIELDS/BioPerl-1.7.8.tar.gz

Then I transfer this to the secure computing environment, into my perl5 directory, and run tar zxvf BioPerl-1.7.8.tar.gz && cd BioPerl-1.7.8, followed by perl Makefile.PL to build. I then verify the installation with perl -e "use Bio::SeqIO;" (if I don't see errors, that means it's installed).

[moldach@marc TypeTE-Test]$ perl -e "use Bio::SeqIO;"
[moldach@marc TypeTE-Test]$
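
For reference, the offline procedure described above can be sketched as a small shell function. This is a sketch under assumptions, not the exact commands used in this issue: it assumes the distribution uses ExtUtils::MakeMaker (a `Makefile.PL`), that the tarball unpacks into a directory named after the tarball, and that all prerequisites are already installed. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: install a CPAN distribution from a tarball copied to an offline host.

# Derive the unpacked directory name from a tarball name,
# e.g. BioPerl-1.7.8.tar.gz -> BioPerl-1.7.8 (the usual CPAN layout, assumed here).
dist_dir() {
    basename "$1" .tar.gz
}

# install_offline TARBALL INSTALL_BASE
# Unpacks the tarball and runs the standard MakeMaker sequence.
# The body runs in a subshell so the caller's working directory is unchanged.
install_offline() (
    tarball=$1
    base=$2
    tar -xzf "$tarball" || exit 1
    cd "$(dist_dir "$tarball")" || exit 1
    perl Makefile.PL INSTALL_BASE="$base" &&
        make &&
        make test &&
        make install
)

# Example (paths are placeholders):
#   install_offline BioPerl-1.7.8.tar.gz "$HOME/perl5"
```

Without network access, any prerequisite that `Makefile.PL` reports as missing has to be transferred and installed the same way first.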

Next, I try to run typeTE again, but I get an error about String::Approx, so I follow the same method, followed by perl -e "use String::Approx qw(amatch);". Things appear to be installed:

[moldach@marc TypeTE-Test]$ perl -e "use String::Approx qw(amatch);"
[moldach@marc TypeTE-Test]$

I now add both perl -e statements to the top of my batch script and try to run typeTE, and this is where the odd behavior happens:

Script

#!/bin/bash
#SBATCH --job-name=typeTE_test # Job name
#SBATCH --mail-type=END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH [email protected] # Where to send mail
#SBATCH --ntasks=1 #Run on a single CPU
#SBATCH --cpus-per-task=1 # How many cores?
#SBATCH --mem-per-cpu=1G
#SBATCH --output=typeTE_test_%j.log # Standard output and error log
#SBATCH --error=typeTE_test_%j.err # Error log
#SBATCH --time=01:00:00
pwd; hostname; date

perl -e "use Bio::Seq"
perl -e "use String::Approx"

#bash run_TypeTE_Ref.sh

date

As you can see I've commented out the run_TypeTE_Ref.sh script and the first call to Bio::Seq runs successfully without error; however, the call to String::Approx throws an error.

Error

Can't locate String/Approx.pm in @INC (you may need to install the String::Approx module) (@INC contains: /project/M-mtgraovac182840/perl5-matt/lib/perl5 /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5) at -e line 1.
BEGIN failed--compilation aborted at -e line 1.
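
The error means that no directory in `@INC` contains a file `String/Approx.pm`. A minimal sketch for checking this directly, without loading the module, is shown below. The function names and the `PERL_BIN` variable are hypothetical helpers, not part of TypeTE.

```shell
#!/bin/sh
# Sketch: find out where (or whether) a given perl can see a module file.

# Convert a module name to the relative path Perl looks for,
# e.g. String::Approx -> String/Approx.pm
module_relpath() {
    printf '%s.pm\n' "$(printf '%s' "$1" | sed 's|::|/|g')"
}

# Print the first @INC directory entry that contains the module file;
# return 1 if none does. PERL_BIN selects the interpreter (default: perl).
locate_module() {
    rel=$(module_relpath "$1")
    "${PERL_BIN:-perl}" -e '
        my $rel = shift;
        for my $dir (@INC) {
            next if ref $dir;    # skip code hooks in @INC
            if (-f "$dir/$rel") { print "$dir/$rel\n"; exit 0 }
        }
        exit 1;
    ' "$rel"
}

# Example: run once on the login node and once inside the batch job.
#   locate_module Bio::Seq
#   locate_module String::Approx
```

Comparing the output on the login node with the output inside the batch job shows whether the two environments resolve the module differently.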

Although this is not directly a typeTE issue, the problem with the String::Approx Perl module dependency is the linchpin preventing me from using this tool.

While we are fairly new to working in this restrictive environment, I have successfully installed 15 other Perl modules into /project/M-mtgraovac182840/perl5-matt/ by running the install process described above (e.g. tar ... && cd ... && perl Makefile.PL).

moldach avatar Feb 06 '21 19:02 moldach

Dear Matthiew,

Thank you for your interest! I'm so sorry, this repo is indeed a dev version that is not functional. It's not on your end!

To use TypeTE as in the paper, I recommend using the GitHub version. Make sure that you have BioPerl properly installed, because it is a common source of problems.

We are currently working to improve the deletion pipeline (reference insertions), and this will be integrated with Nextflow/Docker. However, we need a few more months of development!

Best,

Clément

clemgoub avatar Feb 08 '21 15:02 clemgoub

By the way, I just found your v a p o R w a v e package and I love it! I'm gonna try it out for my next presentation! =)

clemgoub avatar Feb 08 '21 18:02 clemgoub

Reading your issue again, I wonder whether this is related to the PERL5LIB variable not pointing to your local libraries. I am not a Perl expert, but maybe @jainy, who coded the Perl scripts, can help you!

Best,

Clément

clemgoub avatar Feb 10 '21 13:02 clemgoub

Hi Matthiew,

Can you try installing String::Approx with an alternative method, the CPAN shell, and check whether that works? Start the shell, then run the install command at its prompt:

perl -MCPAN -e shell
install String::Approx

Thanks, Jainy

jainy avatar Feb 10 '21 18:02 jainy

Hi @jainy

Where does perl -MCPAN -e shell, followed by install String::Approx install modules?

Here is what I see inside the lib directory perl5/lib/perl5:

(base) mtg@mtg-ThinkPad-P53:~/perl5/lib/perl5$ ll
total 212
drwxrwxr-x 15 mtg mtg   4096 Jan  7 16:36 ./
drwxrwxr-x  3 mtg mtg   4096 Jul 21  2020 ../
drwxrwxr-x  3 mtg mtg   4096 Jul 21  2020 5.30.0/
drwxrwxr-x  2 mtg mtg   4096 Jan  7 14:44 App/
drwxrwxr-x  3 mtg mtg   4096 Jul 21  2020 Archive/
drwxrwxr-x  9 mtg mtg   4096 Jan  7 14:44 CPAN/
-r--r--r--  1 mtg mtg 146411 Jun 12  2020 CPAN.pm
drwxrwxr-x  2 mtg mtg   4096 Jul 21  2020 Devel/
drwxrwxr-x  3 mtg mtg   4096 Jan  7 16:36 Exporter/
drwxrwxr-x  3 mtg mtg   4096 Jul 21  2020 lib/
drwxrwxr-x  3 mtg mtg   4096 Jan  7 14:46 List/
drwxrwxr-x  2 mtg mtg   4096 Jul 21  2020 local/
drwxrwxr-x  2 mtg mtg   4096 Jul 21  2020 Mock/
drwxrwxr-x  4 mtg mtg   4096 Jul 21  2020 POD2/
drwxrwxr-x  3 mtg mtg   4096 Jan  7 16:36 Test/
drwxrwxr-x 11 mtg mtg   4096 Feb 17 10:54 x86_64-linux-gnu-thread-multi/

These don't look like String::Approx components - I could be wrong.

(base) mtg@mtg-ThinkPad-P53:~/perl5/lib/perl5$ find . -name "String*"
./Test/Deep/String.pm
./Archive/Zip/StringMember.pm
./x86_64-linux-gnu-thread-multi/auto/String
./x86_64-linux-gnu-thread-multi/String

As I mentioned, there is no outgoing internet connection on this server, so I cannot install String::Approx with that method directly; I need to run it on another computer with an internet connection and then transfer the compiled module over. This results in a module folder for each Perl library:

(base) [moldach@marc TypeTE-Test]$ ll /project/M-mtgraovac182840/perl5-matt/lib/perl5
total 216
drwxrwsr-x  3 moldach M-mtgraovac182840  4096 Jan  7 14:59 5.30.0
drwxrwsr-x  2 moldach M-mtgraovac182840  4096 Jan  7 14:59 App
drwxrwsr-x  3 moldach M-mtgraovac182840  4096 Jan  7 14:59 Archive
drwxr-sr-x 24 moldach M-mtgraovac182840  8192 Feb  5 13:51 Bio
-r--r--r--  1 moldach M-mtgraovac182840  7252 Feb  2 22:04 BioPerl.pm
drwxrwsr-x  9 moldach M-mtgraovac182840  4096 Jan  7 14:59 CPAN
-r--r--r--  1 moldach M-mtgraovac182840     0 Jan  7 14:59 CPAN.pm
drwxr-sr-x  2 moldach M-mtgraovac182840  4096 Jan  7 15:35 Capture
drwxr-sr-x  2 moldach M-mtgraovac182840  4096 Jan  7 15:40 Config
drwxr-sr-x  2 moldach M-mtgraovac182840  4096 Jan  7 16:13 Data
drwxrwsr-x  2 moldach M-mtgraovac182840  4096 Jan  7 15:52 Devel
drwxr-sr-x  3 moldach M-mtgraovac182840  4096 Jan  7 16:41 Exporter
drwxr-sr-x  3 moldach M-mtgraovac182840  4096 Jan  7 16:14 File
drwxrwsr-x  3 moldach M-mtgraovac182840  4096 Jan  7 16:34 List
drwxr-sr-x  3 moldach M-mtgraovac182840  4096 Feb  5 15:17 Method
drwxrwsr-x  2 moldach M-mtgraovac182840  4096 Jan  7 14:59 Mock
drwxr-sr-x  4 moldach M-mtgraovac182840  4096 Jan  7 16:08 Module
drwxr-sr-x  3 moldach M-mtgraovac182840  4096 Feb  5 15:18 Moo
-r--r--r--  1 moldach M-mtgraovac182840 34419 Nov 24 17:58 Moo.pm
drwxr-sr-x  2 moldach M-mtgraovac182840  4096 Jan  7 16:20 Number
drwxrwsr-x  4 moldach M-mtgraovac182840  4096 Jan  7 14:59 POD2
drwxr-sr-x  3 moldach M-mtgraovac182840  4096 Feb  5 14:59 Parallel
drwxrwsr-x  4 moldach M-mtgraovac182840  4096 Jan  7 15:45 Test
drwxr-sr-x  2 moldach M-mtgraovac182840  4096 Jan  7 16:17 Text
drwxr-sr-x  3 moldach M-mtgraovac182840  4096 Jan  7 15:25 inc
drwxrwsr-x  3 moldach M-mtgraovac182840  4096 Jan  7 14:59 lib
drwxrwsr-x  2 moldach M-mtgraovac182840  4096 Jan  7 14:59 local
-r--r--r--  1 moldach M-mtgraovac182840  1218 Sep  2 04:16 oo.pm
drwxr-sr-x  5 moldach M-mtgraovac182840  4096 Feb  5 14:17 x86_64-linux
drwxrwsr-x 10 moldach M-mtgraovac182840  4096 Jan  7 14:59 x86_64-linux-gnu-thread-multi

Where each module folder looks like:

(base) [moldach@marc TypeTE-Test]$ tree /project/M-mtgraovac182840/perl5-matt/lib/perl5/Parallel/
/project/M-mtgraovac182840/perl5-matt/lib/perl5/Parallel/
|-- ForkManager
|   `-- Child.pm
`-- ForkManager.pm

Now, when I look on the laptop where I installed with your method, I can find:

$ find ~ -name  "*String::Approx*"
/home/mtg/.cpan/build/String-Approx-3.28-0/blib/man3/String::Approx.3pm
/home/mtg/perl5/man/man3/String::Approx.3pm

Okay, let's look again on the secure Linux cluster, in the man3 sub-directory:

(base) [moldach@marc man3]$ pwd
/project/M-mtgraovac182840/perl5-matt/man/man3
(base) [moldach@marc man3]$ ll
total 7232
-rw-r--r-- 1 moldach M-mtgraovac182840  12988 Jan  7 15:00 App::Cpan.3pm
-rw-r--r-- 1 moldach M-mtgraovac182840  74982 Jan  7 15:00 Archive::Zip.3pm
-r--r--r-- 1 moldach M-mtgraovac182840      0 Jan  7 14:59 Archive::Zip::FAQ.3pm
-rw-r--r-- 1 moldach M-mtgraovac182840   6734 Jan  7 15:00 Archive::Zip::MemberRead.3pm
-r--r--r-- 1 moldach M-mtgraovac182840      0 Jan  7 14:59 Archive::Zip::Tree.3pm
-r--r--r-- 1 moldach M-mtgraovac182840  20959 Feb  5 13:51 Bio::Align::AlignI.3
-r--r--r-- 1 moldach M-mtgraovac182840  18616 Feb  5 13:51 Bio::Align::DNAStatistics.3

...

-r--r--r-- 1 moldach M-mtgraovac182840  20053 Feb  5 14:49 String::Approx.3

...

Above, I've cut off most of the output; however, two things are apparent and confusing:

  1. There are two types of files inside the man3 directory, .3 and .3pm, and it's not clear to me what the difference is.
  2. My installation of String::Approx created a .3 file; however, the perl -MCPAN -e shell method creates a .3pm file instead.
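
On the `.3` versus `.3pm` question: both are man pages generated from the module's POD during the build, and the suffix comes from the `man3ext` configuration value of the perl that ran the build. Neither file is what `use String::Approx` loads, so copying a man page should not change module lookup. Different suffixes may indicate that the two builds were done by differently configured perls, although that is only an inference here. A small sketch follows; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: relate a man3 file name to its module and show a perl's man-page suffix.

# Strip the man-page suffix (.3 or .3pm) to recover the module name,
# e.g. String::Approx.3pm -> String::Approx
manpage_module() {
    printf '%s\n' "$1" | sed -e 's/\.3pm$//' -e 's/\.3$//'
}

# Print the man3 suffix the given perl was configured with.
# Argument: interpreter to query (default: whatever "perl" resolves to).
man3_suffix() {
    "${1:-perl}" -V:man3ext
}

# Example:
#   man3_suffix                   # perl found on PATH
#   man3_suffix /path/to/perl     # a specific build (placeholder path)
```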

Therefore, I tried to copy over this .3pm file and try again...

And @clemgoub, on your point about the possibility of it being related to PERL5LIB, I didn't think so, since the bash script did not error on perl -e "use Bio::Seq". However, to assuage those concerns I've now also included paths to PERL5LIB at the start of the script:

#!/bin/bash
#SBATCH --job-name=typeTE_test # Job name
#SBATCH --ntasks=1 #Run on a single CPU
#SBATCH --cpus-per-task=1 # How many cores?
#SBATCH --mem-per-cpu=1G
#SBATCH --output=typeTE_test_%j.log # Standard output and error log
#SBATCH --error=typeTE_test_%j.err # Error log
#SBATCH --time=00:05:00

PATH="/project/M-mtgraovac182840/perl5-matt/bin${PATH:+:${PATH}}"; export PATH;
PERL5LIB="/project/M-mtgraovac182840/perl5-matt/lib/perl5${PERL5LIB:+:${PERL5LIB}}"; export PERL5LIB;
PERL_LOCAL_LIB_ROOT="/project/M-mtgraovac182840/perl5-matt${PERL_LOCAL_LIB_ROOT:+:${PERL_LOCAL_LIB_ROOT}}"; export PERL_LOCAL_LIB_ROOT;
PERL_MB_OPT="--install_base \"/project/M-mtgraovac182840/perl5-matt\""; export PERL_MB_OPT;
PERL_MM_OPT="INSTALL_BASE=/project/M-mtgraovac182840/perl5-matt"; export PERL_MM_OPT;
alias perl='/project/M-mtgraovac182840/tools/perl-5.32.0-good/perl'

perl -e "use Bio::Seq"
perl -e "use String::Approx qw(amatch);"

In the error log, we see that no error is raised for Bio::Seq, but one is raised for String::Approx:

Can't locate String/Approx.pm in @INC (you may need to install the String::Approx module) (@INC contains: /project/M-mtgraovac182840/perl5-matt/lib/perl5 /project/M-mtgraovac182840/perl5-matt/lib/perl5 /project/M-mtgraovac182840/perl5-matt/lib/perl5 /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5) at -e line 1.
BEGIN failed--compilation aborted at -e line 1.
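
One thing that may be worth checking, although nothing in this thread confirms it: bash does not expand aliases in non-interactive shells unless `shopt -s expand_aliases` is set, so the `alias perl=...` line in a batch script may have no effect. The `@INC` in the error lists system directories such as /usr/lib64/perl5, which would be consistent with the system perl running instead of the custom build. The sketch below calls the interpreter by path and reports what each name resolves to. `PERL_BIN` and `describe_perl` are hypothetical names; the default path is the one from the alias above.

```shell
#!/bin/sh
# Sketch: call a specific perl by path instead of relying on an alias.
# PERL_BIN is a hypothetical variable; its default is the path from the alias in this issue.
PERL_BIN="${PERL_BIN:-/project/M-mtgraovac182840/tools/perl-5.32.0-good/perl}"

# Print which interpreter a name or path resolves to,
# plus its version and architecture name. Returns 1 if it cannot be found.
describe_perl() {
    bin=$1
    if ! command -v "$bin" >/dev/null 2>&1; then
        echo "not found: $bin"
        return 1
    fi
    echo "interpreter: $(command -v "$bin")"
    "$bin" -V:version
    "$bin" -V:archname
}

# Example, inside the batch script:
#   describe_perl perl            # what the job picks up from PATH
#   describe_perl "$PERL_BIN"     # the custom build
#   "$PERL_BIN" -e 'use String::Approx qw(amatch);'
```

If the two reports differ in version or archname, the login node and the batch job are not running the same perl.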

moldach avatar Feb 17 '21 18:02 moldach

Hi Matthiew,

I am not sure. I found a webpage that describes the installation of perl modules locally. Can you try that.

https://blogs.iu.edu/ncgas/2019/05/30/installing-perl-modules-locally/

It describes how to get String::Approx, install it locally, and add that path to the .bashrc file as needed.

Hope it helps!

Best, Jainy

jainy avatar Feb 17 '21 19:02 jainy

I am not sure. I found a webpage that describes the installation of perl modules locally. Can you try that.

https://blogs.iu.edu/ncgas/2019/05/30/installing-perl-modules-locally/

That is exactly the process I described above...

moldach avatar Feb 18 '21 17:02 moldach

Hi Matthiew,

I am sorry if I am repeating myself. I just want to make sure that you downloaded the String::Approx module from the CPAN website using the following command. In your message I see that you downloaded and installed BioPerl, but not String::Approx.

wget https://cpan.metacpan.org/authors/id/J/JH/JHI/String-Approx-3.28.tar.gz

Best, Jainy

jainy avatar Feb 18 '21 17:02 jainy

@jainy yes, here were the commands I used:

$ wget https://cpan.metacpan.org/authors/id/J/JH/JHI/String-Approx-3.28.tar.gz
# transfer this to secure cluster in the `perl5` directory
$ tar -xzvf String-Approx-3.28.tar.gz 
$ cd String-Approx-3.28
$ perl Makefile.PL PREFIX=$PWD
Checking if your kit is complete...
Looks good
Only one of PREFIX or INSTALL_BASE can be given.  Not both.

This is where I deviate from the instructions: the command fails with the PREFIX=$PWD parameter, presumably because my environment already sets INSTALL_BASE through PERL_MM_OPT, so I need to run the following instead:

$ perl Makefile.PL
Checking if your kit is complete...
Looks good
Generating a Unix-style Makefile
Writing Makefile for String::Approx
Writing MYMETA.yml and MYMETA.json
$ make 
$ make install
$ make test # shows Result: PASS

Let's confirm it works from the LOGIN node:

$ perl -e "use String::Approx qw(amatch);"
$

Looks like it works there.

Now I try submitting a script, making sure to include the environment variables that affect Perl 5:

#!/bin/bash
#SBATCH --job-name=typeTE_test # Job name
#SBATCH --ntasks=1 #Run on a single CPU
#SBATCH --cpus-per-task=1 # How many cores?
#SBATCH --mem-per-cpu=1G
#SBATCH --output=typeTE_test_%j.log # Standard output and error log
#SBATCH --error=typeTE_test_%j.err # Error log
#SBATCH --time=00:05:00
pwd; hostname; date

PATH="/project/M-mtgraovac182840/perl5-matt/bin${PATH:+:${PATH}}"; export PATH;
PERL5LIB="/project/M-mtgraovac182840/perl5-matt/lib/perl5${PERL5LIB:+:${PERL5LIB}}"; export PERL5LIB;
PERL_LOCAL_LIB_ROOT="/project/M-mtgraovac182840/perl5-matt${PERL_LOCAL_LIB_ROOT:+:${PERL_LOCAL_LIB_ROOT}}"; export PERL_LOCAL_LIB_ROOT;

### the article recommends un-setting the following which I had in there before - comment them out
#PERL_MB_OPT="--install_base \"/project/M-mtgraovac182840/perl5-matt\""; export PERL_MB_OPT;
#PERL_MM_OPT="INSTALL_BASE=/project/M-mtgraovac182840/perl5-matt"; export PERL_MM_OPT;
PERL_MB_OPT=
PERL_MM_OPT=

## include Bio::Seq first, as this used the exact same local installation as String::Approx
### Bio::Seq passes successfully but String::Approx throws an error
perl -e "use Bio::Seq"
perl -e "use String::Approx qw(amatch);"

And I get the following error:

Can't locate String/Approx.pm in @INC (you may need to install the String::Approx module) (@INC contains: /project/M-mtgraovac182840/perl5-matt/lib/perl5 /project/M-mtgraovac182840/perl5-matt/lib/perl5 /project/M-mtgraovac182840/perl5-matt/lib/perl5 /project/M-mtgraovac182840/perl5-matt/lib/perl5 /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5) at -e line 1.
BEGIN failed--compilation aborted at -e line 1.
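
To see where `make install` actually placed the module, the install base can be searched for the file itself. String::Approx contains compiled code, and such modules are normally installed under an architecture-specific subdirectory (for example lib/perl5/x86_64-linux), which a perl adds to `@INC` from `PERL5LIB` only when the directory name matches its own archname. Whether that explains the failure here is an untested hypothesis. The function name below is hypothetical.

```shell
#!/bin/sh
# Sketch: list every copy of a module file under an install base.

# find_module_files BASE MODULE
# Prints the full path of each matching .pm file, one per line.
find_module_files() {
    rel=$(printf '%s' "$2" | sed 's|::|/|g').pm
    find "$1" -type f -path "*/$rel" 2>/dev/null
}

# Example:
#   find_module_files /project/M-mtgraovac182840/perl5-matt String::Approx
#   perl -V:archname    # compare with the subdirectory shown in the path above
```

If the file turns up under a subdirectory whose name differs from the archname reported inside the batch job, that perl would not search it.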

moldach avatar Feb 19 '21 23:02 moldach