
Torchaudio for aarch64?

Open praesc opened this issue 4 years ago • 12 comments

🚀 Feature

Is there a release for aarch64?

Motivation

It'd be great to be able to deploy a pytorch module on an RPi for speech recognition

Pitch

Alternatives

Additional context

praesc avatar May 04 '20 10:05 praesc

Since pytorch doesn't officially support aarch64 (see here), we don't officially support it either. That being said, if you do succeed in making it work, please feel free to detail the steps here :)

vincentqb avatar May 05 '20 01:05 vincentqb

Thanks for the quick reply, Vincent. I tried to build the latest version from source and got some issues while compiling on the Jetson Nano. However, version 0.3.0 worked smoothly out of the box. Nvidia provides some pre-compiled wheels for their platforms, so it's straightforward to install pytorch on them. Hence, it would be nice to have torchaudio as well.

praesc avatar May 05 '20 08:05 praesc

Would also love to see it running on aarch64, since torch is already running there. I've also commented on #657; it seems like the installation of the prerequisites is the issue?

Installing via python3 setup.py install ran through fine:

```
Processing dependencies for torchaudio==0.6.0a0+313f4f5
Searching for torch==1.5.0
Best match: torch 1.5.0
Adding torch 1.5.0 to easy-install.pth file
Installing convert-caffe2-to-onnx script to /usr/local/bin
Installing convert-onnx-to-caffe2 script to /usr/local/bin

Using /home/ark626/.local/lib/python3.6/site-packages
Searching for future==0.17.1
Best match: future 0.17.1
Adding future 0.17.1 to easy-install.pth file
Installing futurize script to /usr/local/bin
Installing pasteurize script to /usr/local/bin

Using /usr/local/lib/python3.6/dist-packages
Searching for numpy==1.18.4
Best match: numpy 1.18.4
Adding numpy 1.18.4 to easy-install.pth file
Installing f2py script to /usr/local/bin
Installing f2py3 script to /usr/local/bin
Installing f2py3.6 script to /usr/local/bin

Using /home/ark626/.local/lib/python3.6/site-packages
Finished processing dependencies for torchaudio==0.6.0a0+313f4f5
```

But running ./packaging/build_from_source.sh $PW runs into issues.

ark626 avatar May 26 '20 00:05 ark626

Okay, I got it installed by working around the issue.

=> Run the script once until it fails, then alter the parts like this (so libmad and lame are not overwritten):

```bash
#!/bin/bash

set -ex

# Arguments: PREFIX, specifying where to install dependencies into

PREFIX="$1"

#rm -rf /tmp/torchaudio-deps
#mkdir /tmp/torchaudio-deps
pushd /tmp/torchaudio-deps

# Curl Settings
CURL_OPTS="-L --retry 10 --connect-timeout 5 --max-time 180"

curl $CURL_OPTS -o sox-14.4.2.tar.bz2 "https://downloads.sourceforge.net/project/sox/sox/14.4.2/sox-14.4.2.tar.bz2"
#curl $CURL_OPTS -o lame-3.99.5.tar.gz "http://ftp.us.debian.org/debian/pool/main/l/lame/lame_3.99.5+repack1-9+b2_arm64.deb"
#curl $CURL_OPTS -o lame-3.99.5.tar.gz "https://downloads.sourceforge.net/project/lame/lame/3.99/lame-3.99.5.tar.gz"
curl $CURL_OPTS -o flac-1.3.2.tar.xz "https://downloads.sourceforge.net/project/flac/flac-src/flac-1.3.2.tar.xz"
#curl $CURL_OPTS -o libmad-0.15.1b.tar.gz  "https://launchpad.net/ubuntu/+archive/primary/+sourcefiles/libmad/0.15.1b-9ubuntu16.$
#"https://downloads.sourceforge.net/project/mad/libmad/0.15.1b/libmad-0.15.1b.tar.gz"

echo CurlDone
# unpack the dependencies
tar xfp sox-14.4.2.tar.bz2
#tar xfp lame-3.99.5.tar.gz
tar xfp flac-1.3.2.tar.xz
#tar xfp libmad-0.15.1b.tar.gz
```

Then replace these two config.guess files with the version from https://svn.osgeo.org/grass/grass/tags/release_20150712_grass_6_4_5/config.guess:

```
/tmp/torchaudio-deps/lame-3.99.5/config.guess
/tmp/torchaudio-deps/libmad-0.15.1b/config.guess
```

Now run the script again, and it should run through fine.

The issue is that the config.guess shipped with those two libraries is from 2003 and extremely old, so it doesn't recognize aarch64 as a machine type.
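If you'd rather script that swap, a minimal sketch (the URL is the one above; the two paths are from the build tree):

```python
# Fetch a newer config.guess and overwrite the stale 2003 copies so the
# configure scripts recognize aarch64 as a machine type.
import urllib.request

CONFIG_GUESS_URL = 'https://svn.osgeo.org/grass/grass/tags/release_20150712_grass_6_4_5/config.guess'
TARGETS = [
    '/tmp/torchaudio-deps/lame-3.99.5/config.guess',
    '/tmp/torchaudio-deps/libmad-0.15.1b/config.guess',
]

with urllib.request.urlopen(CONFIG_GUESS_URL) as resp:
    data = resp.read()

for target in TARGETS:
    with open(target, 'wb') as f:
        f.write(data)
```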

ark626 avatar May 26 '20 03:05 ark626

Hi @ark626 , I recently saw you are trying to run MelGAN-VC on an aarch64 device (Jetson Nano). I am trying as well; I successfully compiled torchaudio but still can't manage to perform inference with said model. I'm wondering if you had any success with this and if you are willing to share your experience. Thanks in advance.

moih avatar Feb 06 '21 13:02 moih

Yes, I got it running. I will answer later in more detail. In the meantime, you could check the guide I've written:

JetsonXavierAGX

ark626 avatar Feb 06 '21 13:02 ark626

Okay, so in general the guide here covers some of the useful things I've used: https://github.com/ark626/JetsonXavierAGX

If I recall correctly, to run MelGAN-VC you need to install torch and a specific version of tensorflow (https://drive.google.com/drive/folders/1Ee9S9Ab892n_rONX4zqQdbjjt5rTnHwV?usp=sharing). Afterwards, I needed to take the following things into account:

Sometimes when Tensorflow is used together with Pytorch, it can try to use all the memory. This is of course bad news, because the RAM is shared between GPU and CPU. To prevent over-allocation, you can limit the RAM CUDA uses. For TF2 this looks like this:

```python
gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_virtual_device_configuration(gpus[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=CUDARAMINMB)])
```

CUDARAMINMB is the CUDA RAM limit in MB. The two lines above can be placed right after the import of Tensorflow.

Also, when using Tensorflow with Pytorch, there is often a weird error message about a block issue. This happens when Tensorflow is imported first. Make sure to import Tensorflow AFTER Pytorch; then it will work.
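For clarity, a minimal sketch of that ordering together with the memory cap from above (the `if gpus:` guard is my addition, in case no GPU is visible):

```python
# Torch must be imported BEFORE tensorflow on these setups (see note above),
# and TF's GPU memory should be capped since Jetson RAM is shared CPU/GPU.
import torch        # import first
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:            # guard added here; skip the cap if no GPU is visible
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=2048)])  # MB
```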

I didn't compile torchaudio; I just used the torch installer, as far as I recall, because the compiler failed every time.

ark626 avatar Feb 06 '21 13:02 ark626

I used:

```
torch @ file:///media/ext/dataSets/MelGANVC/torch-1.6.0rc2-cp36-cp36m-linux_aarch64.whl
```

Editable install with no version control (torchaudio==0.7.0a0+102174e):

```
-e /usr/local/lib/python3.6/dist-packages/torchaudio-0.7.0a0+102174e-py3.6-linux-aarch64.egg
```

The torch wheel is linked in my guide (https://drive.google.com/drive/folders/1Ee9S9Ab892n_rONX4zqQdbjjt5rTnHwV?usp=sharing).

For torchaudio, I sadly don't recall exactly how I installed it. But I will append the path where it is stored, together with an archive of the folder.

The path is something like \SHODAN\pihome\usr\local\lib\python3.6\dist-packages. The file which should be extracted there is: https://www.mediafire.com/file/9llme9a0ijtbu08/torchaudio-0.7.0a0+102174e-py3.6-linux-aarch64.egg.rar/file

=> I can also tell you that MelGAN-VC doesn't use all of torchaudio, so in general even a partial installation will work, because they only use torchaudio for the algorithm that converts mel scales back to audio. The rest is in tensorflow.
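So a quick way to sanity-check a partial install is to exercise just those two transforms; a minimal sketch (shapes and hop=192 chosen arbitrarily, and using the lazy torchaudio 0.7 API from this thread; newer releases want an explicit n_stft):

```python
# Quick sanity check that the only two transforms MelGAN-VC needs work.
import torch
from torchaudio.transforms import MelScale, Spectrogram

hop = 192
spec = Spectrogram(n_fft=6*hop, win_length=6*hop, hop_length=hop, power=2, normalized=True)
mel = MelScale(n_mels=hop, sample_rate=16000, f_min=0.)

wav = torch.randn(1, 16000)      # one second of noise at 16 kHz
print(mel(spec(wav)).shape)      # e.g. torch.Size([1, 192, 84])
```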

My imports look like this:

```python

import matplotlib.pyplot as plt
import collections
from PIL import Image
from skimage.transform import resize
import imageio
import librosa
import librosa.display
from librosa.feature import melspectrogram
import os
import time
#import IPython
#import tensorflow as tf
#os.environ['LIBROSA_CACHE_DIR'] = 'C:/Users/ark-6/tmp'
os.environ['LIBROSA_CACHE_LEVEL'] = '50'
import wave
from glob import glob
import numpy as np
from pathlib import Path

import torch
import torch.nn as nn
import torch.nn.functional as F
from tqdm import tqdm
from functools import partial
import math
import heapq
from torchaudio.transforms import MelScale, Spectrogram

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Reshape, Flatten, Concatenate, Conv2D, Conv2DTranspose, GlobalAveragePooling2D, UpSampling2D, LeakyReLU, ReLU, Add, Multiply, Lambda, Dot, BatchNormalization, Activation, ZeroPadding2D, Cropping2D, Cropping1D
from tensorflow.keras.models import Sequential, Model, load_model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.initializers import TruncatedNormal, he_normal
import tensorflow.keras.backend as K

gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_virtual_device_configuration(gpus[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=10024)])
```

If I can help you any further, don't hesitate to ask; I remember from my thesis that it was quite a hassle to get this working.

ark626 avatar Feb 06 '21 13:02 ark626

Hi @ark626 ,

I've managed to get the generator part of MelGAN-VC working, thanks to your help. I'm running on a Jetson Nano Developer Kit 2GB, and I'm limiting memory with memory_limit=256.

But now I am receiving this error, which seems to be related to Tensorflow. Here is my log:

```
Built networks
(196096,)
(7, 512, 64, 1)
Generating...
2021-02-06 16:32:03.058473: W tensorflow/stream_executor/gpu/redzone_allocator.cc:312] Not found: ./bin/ptxas not found
Relying on driver to perform ptx compilation. This message will be only logged once.
2021-02-06 16:32:12.095268: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2,14GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-02-06 16:32:18.477486: E tensorflow/stream_executor/cuda/cuda_driver.cc:952] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-02-06 16:32:18.497364: E tensorflow/stream_executor/gpu/gpu_timer.cc:55] Internal: Error destroying CUDA event: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-02-06 16:32:18.497405: E tensorflow/stream_executor/gpu/gpu_timer.cc:60] Internal: Error destroying CUDA event: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-02-06 16:32:18.497597: I tensorflow/stream_executor/cuda/cuda_driver.cc:805] failed to allocate 8B (8 bytes) from device: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-02-06 16:32:18.497648: E tensorflow/stream_executor/stream.cc:5479] Internal: Failed to enqueue async memset operation: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-02-06 16:32:18.497757: E tensorflow/stream_executor/cuda/cuda_driver.cc:617] failed to load PTX text as a module: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-02-06 16:32:18.497799: E tensorflow/stream_executor/cuda/cuda_driver.cc:622] error log buffer (1024 bytes):
2021-02-06 16:32:18.506196: W tensorflow/core/kernels/gpu_utils.cc:69] Failed to check cudnn convolutions for out-of-bounds reads and writes with an error message: 'Failed to load PTX text as a module: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated'; skipping this check. This only means that we won't check cudnn for out-of-bounds reads and writes. This message will only be printed once.
2021-02-06 16:32:18.506314: I tensorflow/stream_executor/cuda/cuda_driver.cc:805] failed to allocate 8B (8 bytes) from device: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-02-06 16:32:18.545824: I tensorflow/stream_executor/stream.cc:4963] [stream=0x9418a1e0,impl=0x94189be0] did not memzero GPU location; source: 0x7ff39e55b8
2021-02-06 16:32:18.546030: E tensorflow/stream_executor/cuda/cuda_driver.cc:617] failed to load PTX text as a module: CUDA_ERROR_LAUNCH_TIMEOUT: the launch timed out and was terminated
2021-02-06 16:32:18.546073: E tensorflow/stream_executor/cuda/cuda_driver.cc:622] error log buffer (1024 bytes):
Traceback (most recent call last):
  File "melgan_generator.py", line 740, in <module>
    abwv = towave(speca, name=output_name, path=output_directory) #Convert and save wav
  File "melgan_generator.py", line 704, in towave
    ab = gen(a, training=False)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer.py", line 822, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/network.py", line 717, in call
    convert_kwargs_to_constants=base_layer_utils.call_context().saving)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/network.py", line 891, in _run_internal_graph
    output_tensors = layer(computed_tensors, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer.py", line 822, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "melgan_generator.py", line 404, in call
    dilation_rate=self.dilation_rate)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/backend.py", line 4954, in conv2d_transpose
    data_format=tf_data_format)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/nn_ops.py", line 2246, in conv2d_transpose
    name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/nn_ops.py", line 2317, in conv2d_transpose_v2
    name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_nn_ops.py", line 1253, in conv2d_backprop_input
    _ops.raise_from_not_ok_status(e, name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 6606, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: DNN Backward Data function launch failure : input shape([7,512,64,1]) filter shape([512,1,1,512]) [Op:Conv2DBackpropInput] name: G/conv_s_n2d_transpose/conv2d_transpose/
```

Any idea what this might be? Thanks so much in advance.

moih avatar Feb 06 '21 15:02 moih

I think the issue here was the TF version. Try installing tensorflow==1.15.4+nv20.11. It also seems that the memory is a little too small; it ran out of memory trying to allocate 2,14GiB.

You can try to make the model smaller by decreasing its parameters (you can also shorten the samples to make the allocations smaller). You can do several runs with different models if you want to split the work and reload the previously saved model.

ark626 avatar Feb 06 '21 15:02 ark626

Thanks @ark626 , I will try that TF installation. Also worth noting that I'm running JetPack 4.5, whereas you are running 4.4.x; would that make a difference?

One last thing: I'm running the MelGAN-VC model with these hyperparams:

```python
hop=512               #hop size (window size = 6*hop)
sr=44100              #sampling rate
min_level_db=-100     #reference values to normalize data
ref_level_db=20

shape=64              #length of time axis of split spectrograms to feed to generator
vec_len=128           #length of vector generated by siamese vector
bs = 4                #batch size
delta = 2.            #constant for siamese loss
```

At which sample rate and with what batch size did you manage to perform inference? Do you think running the model at a 16 kHz sample rate would help in fitting it into memory?

Thanks again for your help!

moih avatar Feb 06 '21 16:02 moih

To reduce the RAM usage you can reduce vec_len, e.g. to 64 or 32. The sampling rate sr should match that of the samples you use to train. I also extracted shape as a parameter, which can be reduced to 32 or 16. See the code below.

The sample rate will not help you in terms of RAM. It should match your samples, so verify that all your samples have the same sr. The filters can be decreased down to 256, and the batch size can be decreased to 2-3 to save RAM.
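A minimal sketch for that sample-rate check, using soundfile (already a dependency of the script below); the folder paths are the ones from the script:

```python
# Verify every training wav has the expected sample rate before training.
from glob import glob
import soundfile as sf

def check_sample_rates(folder, expected_sr=16000):
    for path in glob(f'{folder}/*.wav'):
        rate = sf.info(path).samplerate
        if rate != expected_sr:
            print(f'{path}: {rate} Hz (expected {expected_sr})')

check_sample_rates('./dome/MelGANVC_2/Gruppe_A')
check_sample_rates('./dome/MelGANVC_2/Gruppe_B')
```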

```python
from __future__ import print_function, division
from glob import glob
import scipy
import soundfile as sf
import matplotlib.pyplot as plt
#from IPython.display import clear_output

import datetime
import numpy as np
import random
import collections
from PIL import Image
from skimage.transform import resize
import imageio
import librosa
import librosa.display
from librosa.feature import melspectrogram
import os
import time
#import IPython
#os.environ['LIBROSA_CACHE_DIR'] = 'C:/Users/ark-6/tmp'
os.environ['LIBROSA_CACHE_LEVEL'] = '50'
import wave
from pathlib import Path

import torch
import torch.nn as nn
import torch.nn.functional as F
from tqdm import tqdm
from functools import partial
import math
import heapq
from torchaudio.transforms import MelScale, Spectrogram

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Reshape, Flatten, Concatenate, Conv2D, Conv2DTranspose, GlobalAveragePooling2D, UpSampling2D, LeakyReLU, ReLU, Add, Multiply, Lambda, Dot, BatchNormalization, Activation, ZeroPadding2D, Cropping2D, Cropping1D
from tensorflow.keras.models import Sequential, Model, load_model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.initializers import TruncatedNormal, he_normal
import tensorflow.keras.backend as K

gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_virtual_device_configuration(gpus[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=10024)])

def ensure_dir(path):
    Path(path).mkdir(parents=True, exist_ok=True)

def load_array(path, typ):
    ls = sorted(glob(f'{path}/' + str(typ)))
    adata = []
    for i in range(len(ls)):
        x = ls[i]
        adata.append(x)
    return np.array(adata)

def convertWaves(loadPath, savePath, saveName):
    infiles = load_array(loadPath, '*.wav')
    outfile = saveName

    data = []
    x = 0
    for infile in infiles:
        x = x + 1
        print(x)
        w = wave.open(infile, 'rb')
        data.append([w.getparams(), w.readframes(w.getnframes())])
        w.close()

    output = wave.open(str(savePath + '/' + outfile), 'wb')
    output.setparams(data[0][0])
    for i in range(x):
        output.writeframes(data[i][1])
    output.close()

#Hyperparameters

hop=256               #hop size (window size = 6*hop)
sr=16000              #sampling rate
min_level_db=-100     #reference values to normalize data
ref_level_db=20

filters = 1024

shape=48              #length of time axis of split spectrograms to feed to generator
vec_len=128           #length of vector generated by siamese vector
bs = 16               #batch size
delta = 2.            #constant for siamese loss
epochs = 5000         #epochs to train

sampleFile = './dome/MelGANVC_2/Gruppe_A'    #'./TestwavALL/dome'  #'./devil.wav'

loadModel = False              #whether the model should load from a save or start new
loadModelPath = './ai/load'    #if loadModel is True, this is the path to load from
finalResultPath = './ai/test'  #where the final generated result of the sample should be saved
n_saves = 2                    #save and generate a test sample after this many epochs

aPath = './dome/MelGANVC_2/Gruppe_A'    #source domain wav folder path
bPath = './dome/MelGANVC_2/Gruppe_B'    #target domain wav folder path

#There seems to be a problem with Tensorflow STFT, so we'll be using pytorch to handle offline mel-spectrogram generation and waveform reconstruction
#For waveform reconstruction, a gradient-based method is used:

#ORIGINAL CODE FROM https://github.com/yoyololicon/spectrogram-inversion

torch.set_default_tensor_type('torch.cuda.FloatTensor')
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.benchmark = False

specobj = Spectrogram(n_fft=6*hop, win_length=6*hop, hop_length=hop, pad=0, power=2, normalized=True)
#specobj = librosa.feature.melspectrogram(y=None, sr=22050, S=None, n_fft=6*hop, hop_length=hop, power=2.0, **kwargs)
specfunc = specobj.forward
melobj = MelScale(n_mels=hop, sample_rate=sr, f_min=0.)
#melobj = librosa.filters.mel(sr=sr, n_mels=hop, fmin=0.)
melfunc = melobj.forward

def melspecfunc(waveform):
    specgram = specfunc(waveform)
    mel_specgram = melfunc(specgram)
    return mel_specgram

def spectral_convergence(input, target):
    return 20 * ((input - target).norm().log10() - target.norm().log10())

def GRAD(spec, transform_fn, samples=None, init_x0=None, maxiter=1000, tol=1e-6, verbose=1, evaiter=10, lr=0.003):
    spec = torch.Tensor(spec)
    samples = (spec.shape[-1]*hop)-hop

    if init_x0 is None:
        init_x0 = spec.new_empty((1,samples)).normal_(std=1e-6)
    x = nn.Parameter(init_x0)
    T = spec

    criterion = nn.L1Loss()
    optimizer = torch.optim.Adam([x], lr=lr)

    bar_dict = {}
    metric_func = spectral_convergence
    bar_dict['spectral_convergence'] = 0
    metric = 'spectral_convergence'

    init_loss = None
    with tqdm(total=maxiter, disable=not verbose) as pbar:
        for i in range(maxiter):
            optimizer.zero_grad()
            V = transform_fn(x)
            loss = criterion(V, T)
            loss.backward()
            optimizer.step()
            lr = lr*0.9999
            for param_group in optimizer.param_groups:
                param_group['lr'] = lr

            if i % evaiter == evaiter - 1:
                with torch.no_grad():
                    V = transform_fn(x)
                    bar_dict[metric] = metric_func(V, spec).item()
                    l2_loss = criterion(V, spec).item()
                    pbar.set_postfix(**bar_dict, loss=l2_loss)
                    pbar.update(evaiter)

    return x.detach().view(-1).cpu()

def normalize(S):
    return np.clip((((S - min_level_db) / -min_level_db)*2.)-1., -1, 1)

def denormalize(S):
    return (((np.clip(S, -1, 1)+1.)/2.) * -min_level_db) + min_level_db

def prep(wv, hop=192):
    S = np.array(torch.squeeze(melspecfunc(torch.Tensor(wv).view(1,-1))).detach().cpu())
    S = librosa.power_to_db(S)-ref_level_db
    return normalize(S)

def deprep(S):
    S = denormalize(S)+ref_level_db
    S = librosa.db_to_power(S)
    wv = GRAD(np.expand_dims(S,0), melspecfunc, maxiter=2000, evaiter=10, tol=1e-8)
    return np.array(np.squeeze(wv))

#Helper functions

#Generate spectrograms from waveform array
def tospec(data):
    specs=np.empty(data.shape[0], dtype=object)
    for i in range(data.shape[0]):
        x = data[i]
        S=prep(x)
        S = np.array(S, dtype=np.float32)
        specs[i]=np.expand_dims(S, -1)
    print(specs.shape)
    return specs

#Generate multiple spectrograms with a determined length from single wav file
def tospeclong(path, length=416000):
    x, sr = librosa.load(path, sr=16000)
    x,_ = librosa.effects.trim(x)
    loudls = librosa.effects.split(x, top_db=50)
    xls = np.array([])
    for interv in loudls:
        xls = np.concatenate((xls,x[interv[0]:interv[1]]))
    x = xls
    num = x.shape[0]//length
    specs=np.empty(num, dtype=object)
    for i in range(num-1):
        a = x[i*length:(i+1)*length]
        S = prep(a)
        S = np.array(S, dtype=np.float32)
        try:
            sh = S.shape
            specs[i]=S
        except AttributeError:
            print('spectrogram failed')
    print(specs.shape)
    return specs

#Waveform array from path of folder containing wav files
def audio_array(path):
    ls = glob(f'{path}/*.wav')
    adata = []
    for i in range(len(ls)):
        x, sr = tf.audio.decode_wav(tf.io.read_file(ls[i]), 1)
        x = np.array(x).astype(dtype=np.float32)
        adata.append(x)
    return np.array(adata)

#Concatenate spectrograms in array along the time axis
def testass(a):
    but=False
    con = np.array([])
    nim = a.shape[0]
    for i in range(nim):
        im = a[i]
        im = np.squeeze(im)
        if not but:
            con=im
            but=True
        else:
            con = np.concatenate((con,im), axis=1)
    return np.squeeze(con)

#Split spectrograms in chunks with equal size
def splitcut(data):
    ls = []
    mini = 0
    minifinal = 10*shape    #max spectrogram length
    for i in range(data.shape[0]-1):
        if data[i].shape[1]<=data[i+1].shape[1]:
            mini = data[i].shape[1]
        else:
            mini = data[i+1].shape[1]
        if mini>=3*shape and mini<minifinal:
            minifinal = mini
    for i in range(data.shape[0]):
        x = data[i]
        if x.shape[1]>=3*shape:
            for n in range(x.shape[1]//minifinal):
                ls.append(x[:,n*minifinal:n*minifinal+minifinal,:])
            ls.append(x[:,-minifinal:,:])
    return np.array(ls)

#Generating Mel-Spectrogram dataset (Uncomment where needed)
#adata: source spectrograms
#bdata: target spectrograms

#MALE1
#awv = audio_array('../content/cmu_us_clb_arctic/wav')    #get waveform array from folder containing wav files
#aspec = tospec(awv)                                      #get spectrogram array
#adata = splitcut(aspec)                                  #split spectrograms to fixed length

#FEMALE1
#bwv = audio_array('../content/cmu_us_bdl_arctic/wav')
#bspec = tospec(bwv)
#bdata = splitcut(bspec)

#MALE2
awv = audio_array('../content/cmu_us_rms_arctic/wav')
aspec = tospec(awv)
adata = splitcut(aspec)

#FEMALE2
bwv = audio_array('../content/cmu_us_slt_arctic/wav')
bspec = tospec(bwv)
bdata = splitcut(bspec)

#JAZZ MUSIC
awv = audio_array(aPath)
aspec = tospec(awv)
adata = splitcut(aspec)

#CLASSICAL MUSIC
bwv = audio_array(bPath)
bspec = tospec(bwv)
bdata = splitcut(bspec)

#Creating Tensorflow Datasets

def proc(x):
    return tf.image.random_crop(x, size=[hop, 3*shape, 1])

dsa = tf.data.Dataset.from_tensor_slices(adata).repeat(50).map(proc, num_parallel_calls=tf.data.experimental.AUTOTUNE).shuffle(10000).batch(bs, drop_remainder=True)
dsb = tf.data.Dataset.from_tensor_slices(bdata).repeat(50).map(proc, num_parallel_calls=tf.data.experimental.AUTOTUNE).shuffle(10000).batch(bs, drop_remainder=True)

#Adding Spectral Normalization to convolutional layers

from tensorflow.python.keras.utils import conv_utils
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import math_ops
from tensorflow.python.ops import sparse_ops
from tensorflow.python.ops import gen_math_ops
from tensorflow.python.ops import standard_ops
from tensorflow.python.eager import context
from tensorflow.python.framework import tensor_shape

def l2normalize(v, eps=1e-12):
    return v / (tf.norm(v) + eps)

class ConvSN2D(tf.keras.layers.Conv2D):

    def __init__(self, filters, kernel_size, power_iterations=1, **kwargs):
        super(ConvSN2D, self).__init__(filters, kernel_size, **kwargs)
        self.power_iterations = power_iterations

    def build(self, input_shape):
        super(ConvSN2D, self).build(input_shape)

        if self.data_format == 'channels_first':
            channel_axis = 1
        else:
            channel_axis = -1

        self.u = self.add_weight(self.name + '_u',
            shape=tuple([1, self.kernel.shape.as_list()[-1]]),
            initializer=tf.initializers.RandomNormal(0, 1),
            trainable=False
        )

    def compute_spectral_norm(self, W, new_u, W_shape):
        for _ in range(self.power_iterations):
            new_v = l2normalize(tf.matmul(new_u, tf.transpose(W)))
            new_u = l2normalize(tf.matmul(new_v, W))

        sigma = tf.matmul(tf.matmul(new_v, W), tf.transpose(new_u))
        W_bar = W/sigma

        with tf.control_dependencies([self.u.assign(new_u)]):
            W_bar = tf.reshape(W_bar, W_shape)

        return W_bar

    def call(self, inputs):
        W_shape = self.kernel.shape.as_list()
        W_reshaped = tf.reshape(self.kernel, (-1, W_shape[-1]))
        new_kernel = self.compute_spectral_norm(W_reshaped, self.u, W_shape)
        outputs = self._convolution_op(inputs, new_kernel)

        if self.use_bias:
            if self.data_format == 'channels_first':
                outputs = tf.nn.bias_add(outputs, self.bias, data_format='NCHW')
            else:
                outputs = tf.nn.bias_add(outputs, self.bias, data_format='NHWC')
        if self.activation is not None:
            return self.activation(outputs)

        return outputs

class ConvSN2DTranspose(tf.keras.layers.Conv2DTranspose):

    def __init__(self, filters, kernel_size, power_iterations=1, **kwargs):
        super(ConvSN2DTranspose, self).__init__(filters, kernel_size, **kwargs)
        self.power_iterations = power_iterations

    def build(self, input_shape):
        super(ConvSN2DTranspose, self).build(input_shape)

        if self.data_format == 'channels_first':
            channel_axis = 1
        else:
            channel_axis = -1

        self.u = self.add_weight(self.name + '_u',
            shape=tuple([1, self.kernel.shape.as_list()[-1]]),
            initializer=tf.initializers.RandomNormal(0, 1),
            trainable=False
        )

    def compute_spectral_norm(self, W, new_u, W_shape):
        for _ in range(self.power_iterations):
            new_v = l2normalize(tf.matmul(new_u, tf.transpose(W)))
            new_u = l2normalize(tf.matmul(new_v, W))

        sigma = tf.matmul(tf.matmul(new_v, W), tf.transpose(new_u))
        W_bar = W/sigma

        with tf.control_dependencies([self.u.assign(new_u)]):
            W_bar = tf.reshape(W_bar, W_shape)

        return W_bar

    def call(self, inputs):
        W_shape = self.kernel.shape.as_list()
        W_reshaped = tf.reshape(self.kernel, (-1, W_shape[-1]))
        new_kernel = self.compute_spectral_norm(W_reshaped, self.u, W_shape)

        inputs_shape = array_ops.shape(inputs)
        batch_size = inputs_shape[0]
        if self.data_format == 'channels_first':
            h_axis, w_axis = 2, 3
        else:
            h_axis, w_axis = 1, 2

        height, width = inputs_shape[h_axis], inputs_shape[w_axis]
        kernel_h, kernel_w = self.kernel_size
        stride_h, stride_w = self.strides

        if self.output_padding is None:
            out_pad_h = out_pad_w = None
        else:
            out_pad_h, out_pad_w = self.output_padding

        out_height = conv_utils.deconv_output_length(height,
                                                     kernel_h,
                                                     padding=self.padding,
                                                     output_padding=out_pad_h,
                                                     stride=stride_h,
                                                     dilation=self.dilation_rate[0])
        out_width = conv_utils.deconv_output_length(width,
                                                    kernel_w,
                                                    padding=self.padding,
                                                    output_padding=out_pad_w,
                                                    stride=stride_w,
                                                    dilation=self.dilation_rate[1])
        if self.data_format == 'channels_first':
            output_shape = (batch_size, self.filters, out_height, out_width)
        else:
            output_shape = (batch_size, out_height, out_width, self.filters)

        output_shape_tensor = array_ops.stack(output_shape)
        outputs = K.conv2d_transpose(
            inputs,
            new_kernel,
            output_shape_tensor,
            strides=self.strides,
            padding=self.padding,
            data_format=self.data_format,
            dilation_rate=self.dilation_rate)

        if not context.executing_eagerly():
            out_shape = self.compute_output_shape(inputs.shape)
            outputs.set_shape(out_shape)

        if self.use_bias:
            outputs = tf.nn.bias_add(
                outputs,
                self.bias,
                data_format=conv_utils.convert_data_format(self.data_format, ndim=4))

        if self.activation is not None:
            return self.activation(outputs)
        return outputs

class DenseSN(Dense):

    def build(self, input_shape):
        super(DenseSN, self).build(input_shape)

        self.u = self.add_weight(self.name + '_u',
            shape=tuple([1, self.kernel.shape.as_list()[-1]]),
            initializer=tf.initializers.RandomNormal(0, 1),
            trainable=False)

    def compute_spectral_norm(self, W, new_u, W_shape):
        new_v = l2normalize(tf.matmul(new_u, tf.transpose(W)))
        new_u = l2normalize(tf.matmul(new_v, W))
        sigma = tf.matmul(tf.matmul(new_v, W), tf.transpose(new_u))
        W_bar = W/sigma
        with tf.control_dependencies([self.u.assign(new_u)]):
            W_bar = tf.reshape(W_bar, W_shape)
        return W_bar

    def call(self, inputs):
        W_shape = self.kernel.shape.as_list()
        W_reshaped = tf.reshape(self.kernel, (-1, W_shape[-1]))
        new_kernel = self.compute_spectral_norm(W_reshaped, self.u, W_shape)
        rank = len(inputs.shape)
        if rank > 2:
            outputs = standard_ops.tensordot(inputs, new_kernel, [[rank - 1], [0]])
            if not context.executing_eagerly():
                shape = inputs.shape.as_list()
                output_shape = shape[:-1] + [self.units]
                outputs.set_shape(output_shape)
        else:
            inputs = math_ops.cast(inputs, self._compute_dtype)
            if K.is_sparse(inputs):
                outputs = sparse_ops.sparse_tensor_dense_matmul(inputs, new_kernel)
            else:
                outputs = gen_math_ops.mat_mul(inputs, new_kernel)
        if self.use_bias:
            outputs = tf.nn.bias_add(outputs, self.bias)
        if self.activation is not None:
            return self.activation(outputs)
        return outputs

#Networks Architecture

init = tf.keras.initializers.he_uniform()

def conv2d(layer_input, filters, kernel_size=4, strides=2, padding='same', leaky=True, bnorm=True, sn=True):
    if leaky:
        Activ = LeakyReLU(alpha=0.2)
    else:
        Activ = ReLU()
    if sn:
        d = ConvSN2D(filters, kernel_size=kernel_size, strides=strides, padding=padding, kernel_initializer=init, use_bias=False)(layer_input)
    else:
        d = Conv2D(filters, kernel_size=kernel_size, strides=strides, padding=padding, kernel_initializer=init, use_bias=False)(layer_input)
    if bnorm:
        d = BatchNormalization()(d)
    d = Activ(d)
    return d

def deconv2d(layer_input, layer_res, filters, kernel_size=4, conc=True, scalev=False, bnorm=True, up=True, padding='same', strides=2):
    if up:
        u = UpSampling2D((1,2))(layer_input)
        u = ConvSN2D(filters, kernel_size, strides=(1,1), kernel_initializer=init, use_bias=False, padding=padding)(u)
    else:
        u = ConvSN2DTranspose(filters, kernel_size, strides=strides, kernel_initializer=init, use_bias=False, padding=padding)(layer_input)
    if bnorm:
        u = BatchNormalization()(u)
    u = LeakyReLU(alpha=0.2)(u)
    if conc:
        u = Concatenate()([u,layer_res])
    return u

#Extract function: splitting spectrograms
def extract_image(im):
    im1 = Cropping2D(((0,0), (0, 2*(im.shape[2]//3))))(im)
    im2 = Cropping2D(((0,0), (im.shape[2]//3,im.shape[2]//3)))(im)
    im3 = Cropping2D(((0,0), (2*(im.shape[2]//3), 0)))(im)
    return im1,im2,im3

#Assemble function: concatenating spectrograms
def assemble_image(lsim):
    im1,im2,im3 = lsim
    imh = Concatenate(2)([im1,im2,im3])
    return imh

#U-NET style architecture
def build_generator(input_shape):
    h,w,c = input_shape
    inp = Input(shape=input_shape)
    #downscaling
    g0 = tf.keras.layers.ZeroPadding2D((0,1))(inp)
    g1 = conv2d(g0, filters, kernel_size=(h,3), strides=1, padding='valid')
    g2 = conv2d(g1, filters, kernel_size=(1,9), strides=(1,2))
    g3 = conv2d(g2, filters, kernel_size=(1,7), strides=(1,2))
    #g4 = conv2d(g3, 128, kernel_size=(1,2), strides=(1,2))
    #upscaling
    #g5 = deconv2d(g4,g3, 128, kernel_size=(1,2), strides=(1,2))
    g4 = deconv2d(g3,g2, filters, kernel_size=(1,7), strides=(1,2))
    g5 = deconv2d(g4,g1, filters, kernel_size=(1,9), strides=(1,2), bnorm=False)
    g6 = ConvSN2DTranspose(1, kernel_size=(h,1), strides=(1,1), kernel_initializer=init, padding='valid', activation='tanh')(g5)
    return Model(inp,g6, name='G')

#Siamese Network
def build_siamese(input_shape):
    h,w,c = input_shape
    inp = Input(shape=input_shape)
    g1 = conv2d(inp, filters, kernel_size=(h,3), strides=1, padding='valid', sn=False)
    g2 = conv2d(g1, filters, kernel_size=(1,9), strides=(1,2), sn=False)
    g3 = conv2d(g2, filters, kernel_size=(1,7), strides=(1,2), sn=False)
    #g4 = conv2d(g3, 128, kernel_size=(1,2), strides=(1,2), sn=False)
    g4 = Flatten()(g3)
    g5 = Dense(vec_len)(g4)
    return Model(inp, g5, name='S')

#Discriminator (Critic) Network
def build_critic(input_shape):
    h,w,c = input_shape
    inp = Input(shape=input_shape)
    g1 = conv2d(inp, filters*2, kernel_size=(h,3), strides=1, padding='valid', bnorm=False)
    g2 = conv2d(g1, filters*2, kernel_size=(1,9), strides=(1,2), bnorm=False)
    g3 = conv2d(g2, filters*2, kernel_size=(1,7), strides=(1,2), bnorm=False)
    #g4 = conv2d(g3, 128, kernel_size=(1,2), strides=(1,2), bnorm=False)
    g5 = Flatten()(g3)
    g6 = DenseSN(1, kernel_initializer=init)(g5)
    return Model(inp, g6, name='C')

#Load past models from path to resume training or test
def load(path):
    gen = build_generator((hop,shape,1))
    siam = build_siamese((hop,shape,1))
    critic = build_critic((hop,3*shape,1))
    gen.load_weights(path+'/gen.h5')
    critic.load_weights(path+'/critic.h5')
    siam.load_weights(path+'/siam.h5')
    return gen,critic,siam

#Build models
def build():
    gen = build_generator((hop,shape,1))
    siam = build_siamese((hop,shape,1))
    critic = build_critic((hop,3*shape,1))    #the discriminator accepts as input spectrograms of triple the width of those generated by the generator
    return gen,critic,siam

#Generate a random batch to display current training results
def testgena():
    sw = True
    while sw:
        a = np.random.choice(aspec)
        if a.shape[1]//shape!=1:
            sw=False
    dsa = []
    if a.shape[1]//shape>6:
        num=6
    else:
        num=a.shape[1]//shape
    rn = np.random.randint(a.shape[1]-(num*shape))
    for i in range(num):
        im = a[:,rn+(i*shape):rn+(i*shape)+shape]
        im = np.reshape(im, (im.shape[0],im.shape[1],1))
        dsa.append(im)
    return np.array(dsa, dtype=np.float32)

#Show results mid-training

def save_test_image_full(path, sample1=sampleFile):
    #speca = tospeclong(sample1)
    #specb = tospeclong(sample2)
    awv = audio_array(sample1)
    aspec = tospec(awv)
    adata = splitcut(aspec)
    i = 0
    for x in adata:
        print('Generating Samples'+str(i))
        i = i+1
        towave2(x, i, name=str('File1'), path=path)
    name = str('File1')
    loadPath1 = f'{path}/{name}'
    convertWaves(loadPath1, path, str('File1.wav'))
    #plt.show()

def save_test_image_fullOriginal(path):
    a = testgena()
    print(a.shape)
    ab = gen(a, training=False)
    ab = testass(ab)
    a = testass(a)
    abwv = deprep(ab)
    awv = deprep(a)
    sf.write(path+'/new_file.wav', abwv, sr)

    #IPython display calls disabled since the IPython import is commented out above
    #IPython.display.display(IPython.display.Audio(np.squeeze(abwv), rate=sr))
    #IPython.display.display(IPython.display.Audio(np.squeeze(awv), rate=sr))

    fig, axs = plt.subplots(ncols=2)
    axs[0].imshow(np.flip(a, -2), cmap=None)
    axs[0].axis('off')
    axs[0].set_title('Source')
    axs[1].imshow(np.flip(ab, -2), cmap=None)
    axs[1].axis('off')
    axs[1].set_title('Generated')
    #plt.show()
    return abwv

#Save in training loop
def save_end(epoch, gloss, closs, mloss, n_save=3, save_path='./ai/'):    #use custom save_path (i.e. Drive '../content/drive/My Drive/')
    if epoch % n_save == 0:
        print('Saving...')
        path = f'{save_path}/MELGANVC-{str(epoch)}-{str(gloss)[:9]}-{str(closs)[:9]}-{str(mloss)[:9]}'
        ensure_dir(path)
        gen.save_weights(path+'/gen.h5')
        critic.save_weights(path+'/critic.h5')
        siam.save_weights(path+'/siam.h5')
        save_test_image_full(path)

#Losses

def mae(x,y):
    return tf.reduce_mean(tf.abs(x-y))

def mse(x,y):
    return tf.reduce_mean((x-y)**2)

def loss_travel(sa,sab,sa1,sab1):
    l1 = tf.reduce_mean(((sa-sa1) - (sab-sab1))**2)
    l2 = tf.reduce_mean(tf.reduce_sum(-(tf.nn.l2_normalize(sa-sa1, axis=[-1]) * tf.nn.l2_normalize(sab-sab1, axis=[-1])), axis=-1))
    return l1+l2

def loss_siamese(sa,sa1):
    logits = tf.sqrt(tf.reduce_sum((sa-sa1)**2, axis=-1, keepdims=True))
    return tf.reduce_mean(tf.square(tf.maximum((delta - logits), 0)))

def d_loss_f(fake):
    return tf.reduce_mean(tf.maximum(1 + fake, 0))

def d_loss_r(real):
    return tf.reduce_mean(tf.maximum(1 - real, 0))

def g_loss_f(fake):
    return tf.reduce_mean(- fake)

#Get models and optimizers
def get_networks(shape, load_model=False, path=None):
    if not load_model:
        gen,critic,siam = build()
    else:
        gen,critic,siam = load(path)
    print('Built networks')

    opt_gen = Adam(0.0001, 0.5)
    opt_disc = Adam(0.0001, 0.5)

    return gen,critic,siam, [opt_gen,opt_disc]

#Set learning rate
def update_lr(lr):
    opt_gen.learning_rate = lr
    opt_disc.learning_rate = lr

#Training Functions

#Train Generator, Siamese and Critic
def train_all(a,b):
    #splitting spectrogram in 3 parts
    aa,aa2,aa3 = extract_image(a)
    bb,bb2,bb3 = extract_image(b)

    with tf.GradientTape() as tape_gen, tf.GradientTape() as tape_disc:

        #translating A to B
        fab = gen(aa, training=True)
        fab2 = gen(aa2, training=True)
        fab3 = gen(aa3, training=True)
        #identity mapping B to B                    COMMENT THESE 3 LINES IF THE IDENTITY LOSS TERM IS NOT NEEDED
        fid = gen(bb, training=True)
        fid2 = gen(bb2, training=True)
        fid3 = gen(bb3, training=True)
        #concatenate/assemble converted spectrograms
        fabtot = assemble_image([fab,fab2,fab3])

        #feed concatenated spectrograms to critic
        cab = critic(fabtot, training=True)
        cb = critic(b, training=True)
        #feed 2 pairs (A,G(A)) extracted spectrograms to Siamese
        sab = siam(fab, training=True)
        sab2 = siam(fab3, training=True)
        sa = siam(aa, training=True)
        sa2 = siam(aa3, training=True)

        #identity mapping loss
        loss_id = (mae(bb,fid)+mae(bb2,fid2)+mae(bb3,fid3))/3.    #loss_id = 0. IF THE IDENTITY LOSS TERM IS NOT NEEDED
        #travel loss
        loss_m = loss_travel(sa,sab,sa2,sab2)+loss_siamese(sa,sa2)
        #generator and critic losses
        loss_g = g_loss_f(cab)
        loss_dr = d_loss_r(cb)
        loss_df = d_loss_f(cab)
        loss_d = (loss_dr+loss_df)/2.
        #generator+siamese total loss
        lossgtot = loss_g+10.*loss_m+0.5*loss_id    #CHANGE LOSS WEIGHTS HERE (COMMENT OUT +w*loss_id IF THE IDENTITY LOSS TERM IS NOT NEEDED)

    #computing and applying gradients
    grad_gen = tape_gen.gradient(lossgtot, gen.trainable_variables+siam.trainable_variables)
    opt_gen.apply_gradients(zip(grad_gen, gen.trainable_variables+siam.trainable_variables))

    grad_disc = tape_disc.gradient(loss_d, critic.trainable_variables)
    opt_disc.apply_gradients(zip(grad_disc, critic.trainable_variables))

    return loss_dr,loss_df,loss_g,loss_id

#Train Critic only
def train_d(a,b):
    aa,aa2,aa3 = extract_image(a)
    with tf.GradientTape() as tape_disc:

        fab = gen(aa, training=True)
        fab2 = gen(aa2, training=True)
        fab3 = gen(aa3, training=True)
        fabtot = assemble_image([fab,fab2,fab3])

        cab = critic(fabtot, training=True)
        cb = critic(b, training=True)

        loss_dr = d_loss_r(cb)
        loss_df = d_loss_f(cab)

        loss_d = (loss_dr+loss_df)/2.

    grad_disc = tape_disc.gradient(loss_d, critic.trainable_variables)
    opt_disc.apply_gradients(zip(grad_disc, critic.trainable_variables))

    return loss_dr,loss_df

#Assembling generated Spectrogram chunks into final Spectrogram
def specass(a,spec):
    but=False
    con = np.array([])
    nim = a.shape[0]
    for i in range(nim-1):
        im = a[i]
        im = np.squeeze(im)
        if not but:
            con=im
            but=True
        else:
            con = np.concatenate((con,im), axis=1)
    diff = spec.shape[1]-(nim*shape)
    a = np.squeeze(a)
    con = np.concatenate((con,a[-1,:,-diff:]), axis=1)
    return np.squeeze(con)

#Splitting input spectrogram into different chunks to feed to the generator
def chopspec(spec):
    dsa=[]
    for i in range(spec.shape[1]//shape):
        im = spec[:,i*shape:i*shape+shape]
        im = np.reshape(im, (im.shape[0],im.shape[1],1))
        dsa.append(im)
    imlast = spec[:,-shape:]
    imlast = np.reshape(imlast, (imlast.shape[0],imlast.shape[1],1))
    dsa.append(imlast)
    return np.array(dsa, dtype=np.float32)

#Converting from source Spectrogram to target Spectrogram
def towave(spec, name, path='./ai/', show=False):
    specarr = chopspec(spec)
    print("ToWav")
    print(specarr.shape)
    a = specarr
    print('Generating...')
    ab = gen(a, training=False)
    print('Assembling and Converting...')
    a = specass(a,spec)
    ab = specass(ab,spec)
    awv = deprep(a)
    abwv = deprep(ab)
    print('Saving...')
    pathfin = f'{path}/{name}'
    ensure_dir(pathfin)
    sf.write(pathfin+'/AB.wav', abwv, sr)
    sf.write(pathfin+'/A.wav', awv, sr)
    print('Saved WAV!')
    #IPython.display.display(IPython.display.Audio(np.squeeze(abwv), rate=sr))
    #IPython.display.display(IPython.display.Audio(np.squeeze(awv), rate=sr))
    if show:
        fig, axs = plt.subplots(ncols=2)
        axs[0].imshow(np.flip(a, -2), cmap=None)
        axs[0].axis('off')
        axs[0].set_title('Source')
        axs[1].imshow(np.flip(ab, -2), cmap=None)
        axs[1].axis('off')
        axs[1].set_title('Generated')
        #plt.show()
    return abwv

#Fix for long files
def towave2(spec, i, name, path='./ai/', show=False):
    if spec is None:
        return
    specarr = chopspec(spec)
    print("ToWav")
    print(specarr.shape)
    a = specarr
    print('Generating...')
    ab = gen(a, training=False)
    print('Assembling and Converting...')
    #a = specass(a,spec)
    ab = specass(ab,spec)
    #awv = deprep(a)
    abwv = deprep(ab)
    print('Saving to Path '+str(path)+' ...')
    pathfin = f'{path}/{name}'
    print(pathfin)
    ensure_dir(pathfin)
    if i<100:
        if i<10:
            sf.write(pathfin+'/AB00'+str(i)+'.wav', abwv, sr)
        else:
            sf.write(pathfin+'/AB0'+str(i)+'.wav', abwv, sr)
        #sf.write(pathfin+'/A'+str(i)+'.wav', awv, sr)
    else:
        sf.write(pathfin+'/AB'+str(i)+'.wav', abwv, sr)
    print('Saved WAV!')
    #IPython.display.display(IPython.display.Audio(np.squeeze(abwv), rate=sr))
    #IPython.display.display(IPython.display.Audio(np.squeeze(awv), rate=sr))
    if show:
        fig, axs = plt.subplots(ncols=2)
        axs[0].imshow(np.flip(a, -2), cmap=None)
        axs[0].axis('off')
        axs[0].set_title('Source')
        axs[1].imshow(np.flip(ab, -2), cmap=None)
        axs[1].axis('off')
        axs[1].set_title('Generated')
        #plt.show()
    return abwv

#Training Loop

def train(epochs, batch_size=16, lr=0.0001, n_save=6, gupt=5):
    update_lr(lr)
    df_list = []
    dr_list = []
    g_list = []
    id_list = []
    c = 0
    g = 0

    save_test_image_full('./')

    for epoch in range(1,epochs,1):
        bef = time.time()

        for batchi,(a,b) in enumerate(zip(dsa,dsb)):

            if batchi%gupt==0:
                dloss_t,dloss_f,gloss,idloss = train_all(a,b)
            else:
                dloss_t,dloss_f = train_d(a,b)

            df_list.append(dloss_f)
            dr_list.append(dloss_t)
            g_list.append(gloss)
            id_list.append(idloss)
            c += 1
            g += 1

            if batchi%600==0:
                print(f'[Epoch {epoch}/{epochs}] [Batch {batchi}] [D loss f: {np.mean(df_list[-g:], axis=0)} ', end='')
                print(f'r: {np.mean(dr_list[-g:], axis=0)}] ', end='')
                print(f'[G loss: {np.mean(g_list[-g:], axis=0)}] ', end='')
                print(f'[ID loss: {np.mean(id_list[-g:])}] ', end='')
                print(f'[LR: {lr}]')
                g = 0
            nbatch=batchi

        print(f'Time/Batch {(time.time()-bef)/nbatch}')
        save_end(epoch,np.mean(g_list[-n_save*c:], axis=0),np.mean(df_list[-n_save*c:], axis=0),np.mean(id_list[-n_save*c:], axis=0),n_save=n_save)
        print(f'Mean D loss: {np.mean(df_list[-c:], axis=0)} Mean G loss: {np.mean(g_list[-c:], axis=0)} Mean ID loss: {np.mean(id_list[-c:], axis=0)}')
        c = 0

#Build models and initialize optimizers
#If load_model=True, specify the path where the models are saved
gen,critic,siam, [opt_gen,opt_disc] = get_networks(shape, load_model=loadModel, path=loadModelPath)

#Training
#n_save = how many epochs between each saving and displaying of results
#gupt = how many discriminator updates per generator+siamese update
train(epochs, batch_size=bs, lr=0.0002, n_save=n_saves, gupt=3)

#After Training, use these functions to convert data with the generator and save the results

#Wav to wav conversion
#librosa.util.example_audio_file()
#wv, sr = librosa.load('./devil.wav', sr=16000)    #Load waveform
#print("Librosa Loaded Sample with Shape: "+str(wv.shape))
#Waveform to Spectrogram

#plt.figure(figsize=(50,1))    #Show Spectrogram
#plt.imshow(np.flip(speca, axis=0), cmap=None)
#plt.axis('off')
#plt.show()

save_test_image_full(finalResultPath, sampleFile)


```

For the sample rate, you can install SoX, add it to your PATH, and create a .bat file with content like the one below. You can alter the sample rate; the -c 1 is important so the output is mono.

```bat
for /R %%A in (*.wav) do if /i "%%~XA"==".wav" ( sox "%%A" -c 1 -r 20050 "%%A"DownSampled.wav )
```
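If you prefer doing the same from Python, a rough cross-platform equivalent using librosa and soundfile is sketched below (untested; the 20050 Hz rate and output suffix mirror the .bat above):

```python
# Resample every wav under the current directory to 20050 Hz mono,
# writing the result next to the original, like the .bat above.
from glob import glob
import librosa
import soundfile as sf

for path in glob('./**/*.wav', recursive=True):
    wv, _ = librosa.load(path, sr=20050, mono=True)   # resample + downmix to mono
    sf.write(path.replace('.wav', 'DownSampled.wav'), wv, 20050)
```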

ark626 avatar Feb 06 '21 17:02 ark626