Errors from multiple programs when file locking is unavailable (e.g. "Species not known", "RepeatMasker::createLib(): Error invoking makeblastdb")

Open • mjgomez12 opened this issue 3 years ago • 15 comments

Describe the issue

Hi, I've been trying to run RepeatMasker v4.1.1, but no matter which species I pass to the -species parameter, it says the species is not known.

Reproduction steps

The command used was:

  source activate repeatmasker-env
  RepeatMasker -pa 24 -e ncbi -species xenopus file.fa

Log output

RepeatMasker version 4.1.1
Search Engine: NCBI/RMBLAST [ 2.10.0+ ]

Using Master RepeatMasker Database: /hpcfs/home/mj.gomez12/.conda/envs/repeatmasker-env/share/RepeatMasker/Libraries/RepeatMaskerLib.h5
  Title    :
  Version  :
  Date     :
  Families :

Species "xenopus" is not known to RepeatMasker. There may not be any TE families defined in the libraries for this species/clade or there may be an error in the spelling. Please check your entry against the NCBI Taxonomy database and/or try using a broader clade or related species instead. The full list of species/clades defined in the library may be obtained using the famdb.py script.

Environment (please include as much of the following information as you can find out):

  • How did you install RepeatMasker? e.g. manual installation from repeatmasker.org, bioconda, the Dfam TE Tools container, or as part of another bioinformatics tool? I installed RepeatMasker via bioconda.

  • Which version of RepeatMasker do you have? The output of RepeatMasker -v can be used to find this. Version 4.1.1

  • Have you installed RepBase RepeatMasker Edition, or the full Dfam database? Not that I know of (I don't know whether it comes with the bioconda package).

  • Operating system and version. The output of uname -a and lsb_release -a can be used to find this. Linux magnus.local 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Additional context

  • Add any other context you have about the problem here. Some possible examples:
    • If an older version of RepeatMasker worked before
  I tried installing version 4.1.2-p1, but was not able to get it running and had a lot of problems with it.

mjgomez12 • Jul 23 '21 14:07

That is strange; the main library file seems to be missing or empty. I do not see the same problem on another bioconda installation of RepeatMasker 4.1.1. Can you share the output of these commands?

file /hpcfs/home/mj.gomez12/.conda/envs/repeatmasker-env/share/RepeatMasker/Libraries/RepeatMaskerLib.h5

/hpcfs/home/mj.gomez12/.conda/envs/repeatmasker-env/share/RepeatMasker/famdb.py -i /hpcfs/home/mj.gomez12/.conda/envs/repeatmasker-env/share/RepeatMasker/Libraries/RepeatMaskerLib.h5 info

jebrosen • Jul 23 '21 16:07
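
For anyone who wants to script the first of these checks, here is a minimal Python sketch (standard library only) that reports whether RepeatMaskerLib.h5 is a symlink, where it points, and whether the resolved file is empty. It does not open the HDF5 file, so it will not reproduce the locking error that famdb.py reports. The function name describe_library is hypothetical.

```python
import os
from pathlib import Path


def describe_library(path):
    """Rough equivalent of the `file` check: symlink? target? empty?"""
    p = Path(path)
    # exists() follows symlinks, so a dangling link reports False here.
    info = {"exists": p.exists(), "is_symlink": p.is_symlink()}
    if info["is_symlink"]:
        # The raw link text, e.g. "Dfam.h5" for a relative link.
        info["link_target"] = os.readlink(p)
    if info["exists"]:
        # stat() also follows symlinks: this is the size of the real file.
        info["size_bytes"] = p.stat().st_size
        info["empty"] = info["size_bytes"] == 0
    return info
```

For example, describe_library("/hpcfs/home/mj.gomez12/.conda/envs/repeatmasker-env/share/RepeatMasker/Libraries/RepeatMaskerLib.h5") would show the link to Dfam.h5 and the size of that file.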

Yes,

file /hpcfs/home/mj.gomez12/.conda/envs/repeatmasker-env/share/RepeatMasker/Libraries/RepeatMaskerLib.h5
/hpcfs/home/mj.gomez12/.conda/envs/repeatmasker-env/share/RepeatMasker/Libraries/RepeatMaskerLib.h5: symbolic link to `Dfam.h5'
/hpcfs/home/mj.gomez12/.conda/envs/repeatmasker-env/share/RepeatMasker/famdb.py -i /hpcfs/home/mj.gomez12/.conda/envs/repeatmasker-env/share/RepeatMasker/Libraries/RepeatMaskerLib.h5 info
ERROR:__main__:Error reading file: [Errno 38] Unable to open file (file locking disabled on this file system (use HDF5_USE_FILE_LOCKING environment variable to override), errno = 38, error message = 'Function not implemented')

mjgomez12 • Jul 23 '21 17:07

(file locking disabled on this file system (use HDF5_USE_FILE_LOCKING environment variable to override)

This is probably the cause. Run export HDF5_USE_FILE_LOCKING=FALSE in the same shell to set that variable before running RepeatMasker.

I am surprised and a bit worried to hear you had issues with RepeatMasker 4.1.2-p1. This problem and several others are fixed in that version, and there are not very many new features I would expect to break.

jebrosen • Jul 23 '21 17:07
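
If RepeatMasker is launched from a Python driver script instead of an interactive shell, the same variable can be passed explicitly. A minimal sketch, assuming RepeatMasker is on PATH; the helper names are hypothetical. HDF5 reads HDF5_USE_FILE_LOCKING from the environment of the process that opens the file, so it has to be in the environment RepeatMasker and its child processes start with. As the thread goes on to show, this only addresses the HDF5 error, not the later makeblastdb failure.

```python
import os
import subprocess


def repeatmasker_env(base=None):
    """Copy of the environment with HDF5 file locking turned off.

    The copy is passed to the child process, which is what `export`
    achieves in a shell; the caller's environment is left untouched.
    """
    env = dict(os.environ if base is None else base)
    env["HDF5_USE_FILE_LOCKING"] = "FALSE"
    return env


def run_repeatmasker(args, dry_run=False):
    """Run RepeatMasker with the modified environment.

    With dry_run=True, only return the command list for inspection.
    """
    cmd = ["RepeatMasker", *args]
    if dry_run:
        return cmd
    return subprocess.run(cmd, env=repeatmasker_env(), check=True)
```

Usage with the command from this issue: run_repeatmasker(["-pa", "24", "-e", "ncbi", "-species", "xenopus", "file.fa"]).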

Thanks! It doesn't give me that error anymore, but now it says this (which was the same problem I was having with RepeatMasker 4.1.2-p1):

RepeatMasker version 4.1.1
Search Engine: NCBI/RMBLAST [ 2.10.0+ ]

Using Master RepeatMasker Database: /hpcfs/home/mj.gomez12/.conda/envs/repeatmasker-env/share/RepeatMasker/Libraries/RepeatMaskerLib.h5
  Title    : Dfam
  Version  : 3.2
  Date     : 2020-07-02
  Families : 6,953

Species/Taxa Search:
  Anura [NCBI Taxonomy ID: 8342]
  Lineage: root;cellular organisms;Eukaryota;Opisthokonta;Metazoa;
           Eumetazoa;Bilateria;Deuterostomia;Chordata;
           Craniata <chordates>;Vertebrata <vertebrates>;
           Gnathostomata <vertebrates>;Teleostomi;Euteleostomi;
           Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amphibia
  113 families in ancestor taxa; 0 lineage-specific families

Building general libraries in: /hpcfs/home/mj.gomez12/.conda/envs/repeatmasker-env/share/RepeatMasker/Libraries/CONS-Dfam_3.2/general
RepeatMasker::createLib(): Error invoking /hpcfs/home/mj.gomez12/.conda/envs/repeatmasker-env/bin//makeblastdb on file /hpcfs/home/mj.gomez12/.conda/envs/repeatmasker-env/share/RepeatMasker/Libraries/CONS-Dfam_3.2/general/is.lib.

For 4.1.2-p1:

RepeatMasker version 4.1.2-p1
Search Engine: NCBI/RMBLAST [ 2.10.0+ ]

Using Master RepeatMasker Database: /hpcfs/home/mj.gomez12/.conda/envs/repeatmasker-env2/share/RepeatMasker/Libraries/RepeatMaskerLib.h5
  Title    : Dfam
  Version  : 3.3
  Date     : 2020-11-09
  Families : 6,953

Species/Taxa Search:
  Anura [NCBI Taxonomy ID: 8342]
  Lineage: root;cellular organisms;Eukaryota;Opisthokonta;Metazoa;
           Eumetazoa;Bilateria;Deuterostomia;Chordata;
           Craniata <chordates>;Vertebrata <vertebrates>;
           Gnathostomata <vertebrates>;Teleostomi;Euteleostomi;
           Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amphibia
  113 families in ancestor taxa; 0 lineage-specific families

Building general libraries in: /hpcfs/home/mj.gomez12/.conda/envs/repeatmasker-env2/share/RepeatMasker/Libraries/CONS-Dfam_3.3/general
RepeatMasker::createLib(): Error invoking /hpcfs/home/mj.gomez12/.conda/envs/repeatmasker-env2/bin//makeblastdb on file /hpcfs/home/mj.gomez12/.conda/envs/repeatmasker-env2/share/RepeatMasker/Libraries/CONS-Dfam_3.3/general/is.lib.

mjgomez12 • Jul 23 '21 18:07

RepeatMasker::createLib(): Error invoking /hpcfs/home/mj.gomez12/.conda/envs/repeatmasker-env2/bin//makeblastdb on file /hpcfs/home/mj.gomez12/.conda/envs/repeatmasker-env2/share/RepeatMasker/Libraries/CONS-Dfam_3.3/general/is.lib.

Sorry to see that. Can you post the contents of this file, if it is there? It should have a more detailed error message:

/hpcfs/home/mj.gomez12/.conda/envs/repeatmasker-env2/share/RepeatMasker/Libraries/CONS-Dfam_3.3/general/rmblastdb.log

jebrosen • Jul 26 '21 17:07
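
A small sketch for pulling the relevant line out of that log, assuming the log is named rmblastdb.log and sits in the library directory being built (as in the path above), and that makeblastdb marks failures with a line starting "Error:", as in the log posted in the reply. The helper names are hypothetical.

```python
from pathlib import Path


def blastdb_log_errors(log_text):
    """Return the lines of a makeblastdb log that start with "Error:"."""
    return [line.strip() for line in log_text.splitlines()
            if line.strip().startswith("Error:")]


def read_blastdb_log(lib_dir):
    """Errors from <lib_dir>/rmblastdb.log, or None if there is no log.

    The file name is taken from this thread; other RepeatMasker versions
    may write the log elsewhere.
    """
    log = Path(lib_dir) / "rmblastdb.log"
    return blastdb_log_errors(log.read_text()) if log.exists() else None
```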

The contents are:

Building a new DB, current time: 07/23/2021 13:47:39
New DB name:   /hpcfs/home/mj.gomez12/.conda/envs/repeatmasker-env2/share/RepeatMasker/Libraries/CONS-Dfam_3.3/general/is.lib
New DB title:  /hpcfs/home/mj.gomez12/.conda/envs/repeatmasker-env2/share/RepeatMasker/Libraries/CONS-Dfam_3.3/general/is.lib
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B

No volumes were created.

Error: mdb_env_open: Function not implemented

mjgomez12 • Jul 26 '21 17:07

I suspect this is also a problem with file locking not being available, this time in the RMBlast / NCBI BLAST+ programs (mdb_env_open comes from LMDB, the database library that makeblastdb uses). The best option in this situation is usually to install RepeatMasker itself (or the whole conda environment) on a local disk, or on a different network filesystem, that supports file locks. What filesystem do you happen to be using on /hpcfs?

jebrosen • Jul 26 '21 18:07
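
A quick way to test whether a given directory supports locking, before installing anything there, is to try taking a lock on a scratch file. This is a minimal sketch using Python's fcntl module (Unix only); it checks both flock() and POSIX fcntl() locks because different programs use different calls. The function names are hypothetical, and it treats only ENOSYS ("Function not implemented", errno 38 on Linux, the error seen from both HDF5 and makeblastdb here) as "unsupported".

```python
import errno
import fcntl
import os
import tempfile


def _lock_works(lock_call, fd):
    """Take and release a non-blocking exclusive lock with `lock_call`.

    Returns False only for ENOSYS; any other OSError is re-raised so it
    is not mistaken for missing lock support.
    """
    try:
        lock_call(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        lock_call(fd, fcntl.LOCK_UN)
        return True
    except OSError as exc:
        if exc.errno == errno.ENOSYS:
            return False
        raise


def probe_locking(directory):
    """Check flock() and fcntl() locks on a scratch file in `directory`."""
    fd, path = tempfile.mkstemp(dir=directory, prefix=".locktest-")
    try:
        return {
            "flock": _lock_works(fcntl.flock, fd),
            "fcntl": _lock_works(fcntl.lockf, fd),
        }
    finally:
        # Always clean up the scratch file, even if a check raised.
        os.close(fd)
        os.remove(path)
```

Running probe_locking on a directory under /hpcfs and on a local scratch directory (if the cluster offers one) would show which locations are usable for the conda environment.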

The filesystem is lustre. Running RepeatMasker locally, or in another cluster, is not an option at the moment, unfortunately.

mjgomez12 • Jul 26 '21 18:07

It looks like file locking is a configurable option for lustre (https://stackoverflow.com/questions/50742525/lustre-file-locking-for-concurrent-access).

@rmhubley - I think you have used RepeatMasker on a cluster with the lustre filesystem before. Did you also have issues with this or were you able to run RepeatMasker installed on a lustre filesystem?

jebrosen • Jul 26 '21 19:07

I have been trying to use flock, but I don't quite understand how to use it or which files to apply it to. Do you have any recommendations?

Thanks again for all your help!

mjgomez12 • Jul 26 '21 21:07

This is the main change needed:

You should mount all clients with the "-o flock" mount option to enable globally coherent locking. Then flock() (and I think fcntl() locking) will work.

However, you might not be able to configure this option yourself, since mount options are normally set by the cluster administrators, and it may have some downsides for performance. It is really unfortunate that you don't have access to any local (physical) filesystem; many different programs used by RepeatMasker depend on file locking to some degree.

jebrosen • Jul 26 '21 22:07
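
Whether that mount option is already set can be checked without admin rights by reading /proc/mounts on a compute node. The sketch below finds the mount that contains a given path and reports which Lustre client locking option appears in its option list; the option names flock, localflock and noflock are an assumption here, so confirm with your administrators. The helper names are hypothetical, the sketch does not resolve symlinks (pass os.path.realpath(path) if needed), and it ignores the \040 escaping /proc/mounts uses for spaces.

```python
import os


def mount_for(path, mounts_text):
    """Return (mount_point, fs_type, options) for the mount holding `path`.

    Picks the longest mount point that is a whole-component prefix of the
    path, so /hpc does not match /hpcfs/home.
    """
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        _device, mount_point, fs_type, opts = fields[:4]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fs_type, opts.split(","))
    return best


def lustre_lock_mode(path, mounts_text):
    """Locking option on the Lustre mount holding `path`, or None."""
    entry = mount_for(path, mounts_text)
    if entry is None or entry[1] != "lustre":
        return None
    for mode in ("flock", "localflock", "noflock"):
        if mode in entry[2]:
            return mode
    return "not listed"
```

Usage: with open("/proc/mounts") as f: print(lustre_lock_mode("/hpcfs/home/mj.gomez12", f.read())).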

Yes. Since this is my university's cluster, I can't do anything about it. I was able to run RepeatMasker a couple of years ago on the same cluster, but I can't remember how, and they have probably changed some things since. How long do you think it will take to run RepeatMasker on a 1.12 Gb animal genome, using 32 GB of RAM and 12 CPUs?

mjgomez12 • Jul 27 '21 19:07

It can vary a lot depending on the assembly quality and libraries in use, but I would guess maybe a day, more or less. If you watch the output, you can get a rough estimate of the progress/speed after the first few minutes/hours.

jebrosen • Jul 29 '21 18:07
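
The rough estimate from watching the output is a linear extrapolation: total time is elapsed time divided by the fraction finished. A minimal sketch, assuming you can read a completed/total count off the progress output (for example, batches done out of all batches; that counter is hypothetical here). It assumes a constant average rate, which will be off if some parts of the genome are much more repetitive than others.

```python
def fraction_done(completed, total):
    """Fraction of work finished, from counts read off the progress output."""
    if total <= 0 or not 0 <= completed <= total:
        raise ValueError("need 0 <= completed <= total and total > 0")
    return completed / total


def estimate_remaining(elapsed_seconds, done):
    """Seconds left, assuming the remaining work runs at the same average
    rate as the work already done (linear extrapolation)."""
    if not 0 < done <= 1:
        raise ValueError("done must be in (0, 1]")
    return elapsed_seconds / done - elapsed_seconds
```

For instance, if one eighth of the work is done after 3 hours, estimate_remaining(3 * 3600, fraction_done(1, 8)) gives 75600 seconds, about 21 more hours, which is in line with the "maybe a day" guess.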

Hi, I have the same "species not known" issue: it works for 'human', but not for 'drosophila' or 'anopheles'.

  • Manual installation
  • Repbase is not installed
  • Dfam release 3.4

Commands used:

  ./RepeatMasker -engine wublast -s drosophila my.fa
  ./RepeatMasker -engine wublast -s Drosophila my.fa

RepeatMasker version 4.1.2-p1
Search Engine: ABBlast/WUBlast [ 3.0 ]

Using Master RepeatMasker Database: /mycomputer/RepeatMasker/Libraries/RepeatMaskerLib.h5
  Title    : Dfam
  Version  : 3.4
  Date     : 2021-07-21
  Families : 281,951



Species "drosophila" is not known to RepeatMasker.  There may
not be any TE families defined in the libraries for this
species/clade or there may be an error in the spelling.
Please check your entry against the NCBI Taxonomy database
and/or try using a broader clade or related species instead.
The full list of species/clades defined in the library may be
obtained using the famdb.py script.

RadPa • Aug 10 '21 06:08
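
The error message points to famdb.py for listing the species and clades in the library. A minimal sketch that wraps that lookup, assuming famdb.py accepts -i <library> (as used earlier in this thread) and a names subcommand that searches taxon names; check famdb.py --help for your version. The famdb.py location in the usage line is an assumption for a manual install, and on a filesystem without locking the HDF5_USE_FILE_LOCKING=FALSE setting from earlier in the thread may be needed for this call too.

```python
import subprocess


def famdb_names_cmd(famdb_py, library, term):
    """Build a famdb.py query for taxon names matching `term`."""
    return [famdb_py, "-i", library, "names", term]


def search_taxa(famdb_py, library, term):
    """Run the query and return famdb.py's text output."""
    cmd = famdb_names_cmd(famdb_py, library, term)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Usage: print(search_taxa("/mycomputer/RepeatMasker/famdb.py", "/mycomputer/RepeatMasker/Libraries/RepeatMaskerLib.h5", "drosophila")). If famdb.py is not executable, put "python3" in front of it in the command list.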

Hi, I also have the same issue and haven't solved it yet. How did you solve this problem? Thanks!

ShirelyI • Mar 01 '23 09:03