
This repository contains all the source files required to run DeLUCS, a deep learning clustering algorithm for DNA sequences.

DeLUCS

This repository contains all the source files required to reproduce the results in the original DeLUCS paper (https://doi.org/10.1101/2021.05.13.444008), as well as a detailed guide for running the code.

Computational Pipeline:

1. Build the dataset:

	python build_dp.py --data_path=<PATH_sequence_folder>
  • Input: Folders with the sequences in FASTA format.
  • Output: Pickle file in the form (label, sequence, accession).
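As a rough illustration of this step, the sketch below reads a directory of FASTA files and collects (label, sequence, accession) tuples. It is not the code in `build_dp.py`. It assumes that each subfolder name serves as the label and that the accession is the first token of each FASTA header; the function names `read_fasta`, `build_dataset`, and `save_dataset` are hypothetical.

```python
import pickle
from pathlib import Path


def read_fasta(path):
    """Yield (accession, sequence) pairs from one FASTA file."""
    accession, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if accession is not None:
                    yield accession, "".join(chunks)
                # Assumption: the accession is the first token of the header.
                tokens = line[1:].split()
                accession, chunks = (tokens[0] if tokens else ""), []
            elif accession is not None:
                chunks.append(line.upper())
    if accession is not None:
        yield accession, "".join(chunks)


def build_dataset(data_path):
    """Collect (label, sequence, accession) tuples.

    Assumption: every subfolder of data_path holds the FASTA files of one
    class, and the subfolder name is used as the label.
    """
    records = []
    for folder in sorted(Path(data_path).iterdir()):
        if not folder.is_dir():
            continue
        for fasta in sorted(folder.iterdir()):
            if not fasta.is_file():
                continue
            for accession, sequence in read_fasta(fasta):
                records.append((folder.name, sequence, accession))
    return records


def save_dataset(records, output_path):
    """Serialize the list of tuples with pickle."""
    with open(output_path, "wb") as handle:
        pickle.dump(records, handle)
```

Calling `save_dataset(build_dataset("<PATH_sequence_folder>"), "dataset.p")` would write one pickle file; the output file name here is only a placeholder.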

2. Compute the mimic sequences:

	python get_pairs.py --data_path=<PATH_pickle_dataset> --k=6 --modify='mutation' --output=<PATH_output_file> --n_mimics=<n mimics per sequence>
  • Input: Pickle file in the form (label, sequence, accession).
  • Output: Pickle file in the form (pairs, x_test, y_test).
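To make the idea of a mimic concrete, here is a minimal sketch that pairs each sequence with randomly mutated copies and represents both as k-mer count vectors. It is an illustration under stated assumptions, not the logic of `get_pairs.py`: the uniform point-substitution model, the `rate` value, the count-vector representation, and the names `kmer_counts`, `mutate`, and `make_pairs` are all hypothetical.

```python
import itertools
import random
from functools import lru_cache


@lru_cache(maxsize=None)
def kmer_index(k):
    """Map every k-mer over ACGT to a position in the count vector."""
    return {"".join(p): i for i, p in enumerate(itertools.product("ACGT", repeat=k))}


def kmer_counts(sequence, k=6):
    """Count k-mers; windows containing symbols other than ACGT are skipped."""
    index = kmer_index(k)
    counts = [0] * len(index)
    for start in range(len(sequence) - k + 1):
        position = index.get(sequence[start:start + k])
        if position is not None:
            counts[position] += 1
    return counts


def mutate(sequence, rate, rng):
    """Substitute each base with a different base with probability `rate`."""
    bases = "ACGT"
    mutated = []
    for base in sequence:
        if base in bases and rng.random() < rate:
            mutated.append(rng.choice([b for b in bases if b != base]))
        else:
            mutated.append(base)
    return "".join(mutated)


def make_pairs(sequences, n_mimics, k=6, rate=1e-4, seed=0):
    """Return (original, mimic) count-vector pairs, n_mimics per sequence."""
    rng = random.Random(seed)
    pairs = []
    for sequence in sequences:
        original = kmer_counts(sequence, k)
        for _ in range(n_mimics):
            mimic = kmer_counts(mutate(sequence, rate, rng), k)
            pairs.append((original, mimic))
    return pairs
```

With `k=6` each vector has 4^6 = 4096 entries, one per possible 6-mer.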

3. Train the model:

* For training DeLUCS and testing its performance:
	```
	python EvaluateDeLUCS.py --data_dir=<PATH_of_computed_mimics> --out_dir=<OUTPUTDIR>
	```

	* Input: Pickle file with the mimics in the form (pairs, x_test, y_test).
	* Output: Confusion matrix.
			<!--* File with the misclassified sequences in the form (accession, true_label, predicted_label)-->

* For testing the performance of a single neural network trained in an unsupervised way (labels must be available):
	```
	python EvaluateSingleRun.py --data_dir=<PATH_of_computed_mimics> --out_dir=<OUTPUTDIR>
	```
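Because cluster identifiers produced by unsupervised training are arbitrary, a confusion matrix is usually read after matching clusters to true labels. The sketch below shows one way to do that in pure Python; it is not taken from the evaluation scripts. The brute-force search over permutations is an assumption chosen for brevity and is only practical for a small number of clusters, and the names `confusion_matrix` and `best_accuracy` are hypothetical.

```python
from itertools import permutations


def confusion_matrix(true_labels, cluster_ids):
    """Rows are true labels, columns are cluster ids (both in sorted order)."""
    labels = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    row = {label: i for i, label in enumerate(labels)}
    col = {cluster: j for j, cluster in enumerate(clusters)}
    matrix = [[0] * len(clusters) for _ in labels]
    for label, cluster in zip(true_labels, cluster_ids):
        matrix[row[label]][col[cluster]] += 1
    return labels, clusters, matrix


def best_accuracy(matrix):
    """Accuracy under the best one-to-one matching of clusters to labels.

    Assumption: the matrix is non-empty. All matchings are tried, so the
    cost grows factorially with the number of clusters.
    """
    if len(matrix) > len(matrix[0]):
        # Transpose so that rows are never more numerous than columns.
        matrix = [list(column) for column in zip(*matrix)]
    n_rows, n_cols = len(matrix), len(matrix[0])
    total = sum(sum(r) for r in matrix)
    best = max(
        sum(matrix[i][c] for i, c in enumerate(columns))
        for columns in permutations(range(n_cols), n_rows)
    )
    return best / total
```

For example, `confusion_matrix(["a", "a", "b"], [1, 1, 0])` returns a matrix in which label `a` falls entirely in cluster `1`, and `best_accuracy` on that matrix evaluates to 1.0 because the clusters match the labels up to renaming.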

Training on your own data

We recommend using the updated version of the code at https://github.com/Kari-Genomics-Lab for training on your own data.

Citation

If you find DeLUCS useful in your research, please consider citing:

@article{10.1371/journal.pone.0261531,
    doi = {10.1371/journal.pone.0261531},
    author = {Millán Arias, Pablo AND Alipour, Fatemeh AND Hill, Kathleen A. AND Kari, Lila},
    journal = {PLOS ONE},
    publisher = {Public Library of Science},
    title = {DeLUCS: Deep learning for unsupervised clustering of DNA sequences},
    year = {2022},
    month = {01},
    volume = {17},
    url = {https://doi.org/10.1371/journal.pone.0261531},
    pages = {1-25},
    number = {1},
}