Results 3 issues of iacobo

Added logic to automatically update the links for all episodes, using the video ID of the top YouTube search result for each episode title. Also tidied up the pipeline by using a DataFrame.

As seen [here](https://github.com/scikit-learn/scikit-learn/issues/8588), `sklearn.datasets.fetch_mldata` has been deprecated as of version 0.20 and will be removed from sklearn in version 0.22 since **the source website mldata.org went down as of March...

If one sequence contains, e.g., only CGT, the old code assigns integers to the letters inconsistently whenever other sequences contain a different set of letters (e.g. ACG or ACGT)....
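
The inconsistency described above can be illustrated with a minimal sketch. The old code is not shown here, so this assumes it derived the letter-to-integer mapping separately from each sequence. The names `ALPHABET`, `encode_per_sequence` and `encode_fixed` are hypothetical, as is the choice of `ACGT` as the full alphabet.

```python
# Hypothetical fixed alphabet (an assumption, not taken from the original code).
ALPHABET = "ACGT"
BASE_TO_INT = {base: i for i, base in enumerate(ALPHABET)}


def encode_per_sequence(seq):
    """Assumed old behaviour: build the mapping from the letters in this sequence only."""
    letters = sorted(set(seq))
    mapping = {base: i for i, base in enumerate(letters)}
    return [mapping[base] for base in seq]


def encode_fixed(seq):
    """Sketch of a fix: every sequence shares one mapping over the full alphabet."""
    return [BASE_TO_INT[base] for base in seq]


# With per-sequence mappings, "C" becomes 0 in "CGT" but 1 in "ACGT".
print(encode_per_sequence("CGT"), encode_per_sequence("ACGT"))

# With the shared mapping, "C" is 1 in both sequences.
print(encode_fixed("CGT"), encode_fixed("ACGT"))
```

Fixing the alphabet up front keeps each letter's integer stable regardless of which letters a given sequence happens to contain.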