
Problem with the removal of the Bio.Alphabet module in Biopython 1.78 since September 2020

Open EA2106-Universite-Francois-Rabelais opened this issue 4 years ago • 6 comments

Dear Olga, for newer installations, I suggest commenting out line 11

#from Bio.Alphabet import IUPAC

and modifying the definitions of the gfa_to_contigs and gfa2_to_contigs functions to match the new SeqRecord style (https://biopython.org/wiki/Alphabet):

def gfa_to_contigs(args):
    for lib in args.libs['gfa']:
        lib_dir = os.path.dirname(os.path.abspath(lib.name) + '/')
        if not os.path.exists(lib_dir):
            os.makedirs(lib_dir)
        prevdir = os.getcwd()
        os.chdir(lib_dir)

        with open('contigs.fasta', 'w') as out:
            lines = [line.rstrip('\n') for line in open(lib.path[0])]
            for line in lines:
                parts = line.split()
                if parts[0] == 'S':

                    # record = SeqRecord(Seq(parts[2], IUPAC.ambiguous_dna), id=parts[1], description='')

                    record = SeqRecord(Seq(parts[2]), id=parts[1],
                            description='',
                            annotations={'molecule_type': 'DNA'})
                    SeqIO.write(record, out, 'fasta')

        if args.contigs is None:
            args.contigs = []

        args.contigs.append(os.path.abspath('contigs.fasta'))
        os.chdir(prevdir)


def gfa2_to_contigs(args):
    for lib in args.libs['gfa2']:
        lib_dir = os.path.dirname(os.path.abspath(lib.name) + '/')
        if not os.path.exists(lib_dir):
            os.makedirs(lib_dir)
        prevdir = os.getcwd()
        os.chdir(lib_dir)

        with open('contigs.fasta', 'w') as out:
            lines = [line.rstrip('\n') for line in open(lib.path[0])]
            for line in lines:
                parts = line.split()
                if parts[0] == 'S':

                    # record = SeqRecord(Seq(parts[3], IUPAC.ambiguous_dna), id=parts[1], description='')

                    record = SeqRecord(Seq(parts[3]), id=parts[1],
                            description='',
                            annotations={'molecule_type': 'DNA'})
                    SeqIO.write(record, out, 'fasta')

        if args.contigs is None:
            args.contigs = []

        args.contigs.append(os.path.abspath('contigs.fasta'))
        os.chdir(prevdir)
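
The two functions differ only in which column holds the sequence: in GFA1 a segment line is `S <name> <sequence>` (hence parts[2]), while in GFA2 it is `S <id> <length> <sequence>` (hence parts[3]). Here is a minimal sketch of that parsing step using only the standard library, so it can be checked without Biopython. The function name `gfa_segments_to_fasta` and its `gfa_version` parameter are hypothetical and not part of SGTK, and unlike SeqIO.write the sketch does not wrap long sequence lines.

```python
import io


def gfa_segments_to_fasta(gfa_text, gfa_version=1):
    """Return FASTA text for the segment (S) lines of a GFA document.

    Hypothetical helper for illustration only; it is not SGTK code.
    """
    # GFA1: S <name> <sequence>          -> sequence is column index 2
    # GFA2: S <id> <length> <sequence>   -> sequence is column index 3
    seq_col = 2 if gfa_version == 1 else 3
    out = io.StringIO()
    for line in gfa_text.splitlines():
        parts = line.split()
        # Skip headers, links and any line too short to carry a sequence.
        if len(parts) > seq_col and parts[0] == 'S':
            out.write('>' + parts[1] + '\n' + parts[seq_col] + '\n')
    return out.getvalue()


gfa1 = "H\tVN:Z:1.0\nS\tctg1\tACGT\nL\tctg1\t+\tctg2\t+\t0M\nS\tctg2\tGGCC\n"
gfa2 = "H\tVN:Z:2.0\nS\tctg1\t4\tACGT\n"

print(gfa_segments_to_fasta(gfa1, gfa_version=1), end='')
print(gfa_segments_to_fasta(gfa2, gfa_version=2), end='')
```
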

I also had an issue with fasta.seq.tostring, which was fixed with:

def add_refcoord_to_res_file():
    if len(args.refcoord) == 0:
        return

    lib = args.refcoord[0]

    chrid = {}
    chrlen = []
    fasta_seq = SeqIO.parse(lib[0], 'fasta')  # SeqIO.parse accepts a filename directly
    curid = 0

    for fasta in fasta_seq:

        # name, lenn = fasta.id, len(fasta.seq.tostring())

        (name, lenn) = (fasta.id, len(str(fasta.seq)))
        chrid[name] = curid
        chrid[name + '-rev'] = curid + 1
        chrlen.append(lenn)
        chrlen.append(lenn)
        output_json['chromosomes'].append({'id': curid, 'name': name,
                'len': lenn})
        output_json['chromosomes'].append({'id': curid + 1,
                'name': name + '-rev', 'len': lenn})
        curid += 2
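
The loop above gives every reference sequence two consecutive ids, one for the forward strand and one for its '-rev' counterpart, both with the same length. Here is a small sketch of that bookkeeping, assuming plain (name, sequence) tuples as a stand-in for Biopython SeqRecord objects. The function name `chromosome_entries` is hypothetical. With real records, `len(fasta.seq)` should give the same value as `len(str(fasta.seq))`, since Seq objects support len().

```python
def chromosome_entries(records):
    """Build the id lookup and chromosome list from (name, sequence) pairs.

    Hypothetical helper for illustration; plain strings stand in for
    Biopython Seq objects here.
    """
    chrid = {}
    chromosomes = []
    curid = 0
    for name, seq in records:
        length = len(seq)
        # Forward strand gets an even id, reverse strand the next odd id.
        chrid[name] = curid
        chrid[name + '-rev'] = curid + 1
        chromosomes.append({'id': curid, 'name': name, 'len': length})
        chromosomes.append({'id': curid + 1, 'name': name + '-rev',
                            'len': length})
        curid += 2
    return chrid, chromosomes


chrid, chromosomes = chromosome_entries([('chr1', 'ACGT'), ('chr2', 'GG')])
print(chrid)
```
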

Hi,

Thanks for the issue and the interest in SGTK. Can you tell me which version of SGTK you are using?

Best, Olga

olga24912, Oct 27 '20 14:10

I installed it through conda: conda install -c olga24912 -c conda-forge -c bioconda sgtk. So the conda environment contains:

channels:
  - olga24912
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=1_gnu
  - biopython=1.78=py36h8c4c3a4_1
  - boost=1.74.0=py36h79e6602_1
  - boost-cpp=1.74.0=h9359b55_0
  - bzip2=1.0.8=h516909a_3
  - ca-certificates=2020.6.20=hecda079_0
  - certifi=2020.6.20=py36h9880bd3_2
  - icu=67.1=he1b5a44_0
  - k8=0.2.5=he513fc3_0
  - ld_impl_linux-64=2.35=h769bd43_9
  - libblas=3.9.0=2_openblas
  - libcblas=3.9.0=2_openblas
  - libffi=3.2.1=he1b5a44_1007
  - libgcc-ng=9.3.0=h5dbcf3e_17
  - libgfortran-ng=9.3.0=he4bcb1c_17
  - libgfortran5=9.3.0=he4bcb1c_17
  - libgomp=9.3.0=h5dbcf3e_17
  - liblapack=3.9.0=2_openblas
  - libopenblas=0.3.12=pthreads_h4812303_1
  - libstdcxx-ng=9.3.0=h2ae2ef3_17
  - lz4-c=1.9.2=he1b5a44_3
  - minimap2=2.17=hed695b0_3
  - ncurses=6.2=he1b5a44_2
  - numpy=1.19.2=py36h68c22af_1
  - openssl=1.1.1h=h516909a_0
  - pip=20.2.4=py_0
  - python=3.6.11=h4d41432_2_cpython
  - python_abi=3.6=1_cp36m
  - readline=8.0=he28a2e2_2
  - seqan-library=2.4.0=0
  - setuptools=49.6.0=py36h9880bd3_2
  - sgtk=1.4.1=py36h6bb024c_0
  - sqlite=3.33.0=h4cf870e_1
  - star=2.7.6a=0
  - tk=8.6.10=hed695b0_1
  - wheel=0.35.1=pyh9f0ad1d_0
  - xz=5.2.5=h516909a_1
  - zlib=1.2.11=h516909a_1010
  - zstd=1.4.5=h6597ccf_2

Conda does not have the latest version right now :( Which OS are you using? For Linux I can update it; for macOS it will take more time.

olga24912, Oct 27 '20 15:10

Oh sorry, I did not check the latest version number. I was too confident in the conda env! I am working under Linux RedHatEnterpriseServer 7.2. I can also try to compile from scratch.

OK, don't worry. I took the latest binaries, and they apparently worked like a charm. Thanks!