ProFET
trouble getting started
Thank you for the development of ProFET!
I wanted to try it out but I ran into some trouble. It would be great if you could point me towards where I am going wrong.
I am using Python 3.4 and I have installed all the dependencies mentioned in the README.md. I have the following folder structure, where feat_extract is my working directory:
feat_extract/
|_pipeline.py
|_other ProFET files...
|_test_seq/...
|_train/
|  |_A/
|  |  |_train_sequences_A.fasta
|  |_B/
|     |_train_sequences_B.fasta
|_test/
   |_A/
   |  |_test_sequences_A.fasta
   |_B/
      |_test_sequences_B.fasta
The fasta files were created with the following set of commands:
cd ./test_seq/Extracellular/
tail -n 1000 location-secreted_keyword-AKW-0964_reviewed_taxon-Tetrapoda_fragment-no_id-0.9.fasta > ../../train/A/train_sequences_A.fasta
tail -n 1000 NOT-secreted_NOT-extracellular_reviewed_taxon-Tetrapoda_fragment-no_id-0.5.fasta > ../../train/B/train_sequences_B.fasta
head -n 1000 location-secreted_keyword-AKW-0964_reviewed_taxon-Tetrapoda_fragment-no_id-0.9.fasta > ../../test/A/test_sequences_A.fasta
head -n 1000 NOT-secreted_NOT-extracellular_reviewed_taxon-Tetrapoda_fragment-no_id-0.5.fasta > ../../test/B/test_sequences_B.fasta
cd ../../
When running the command:
python pipeline.py --trainingSetDir ./train --testingSetDir ./test --trainFeatures True --testFeatures True --classType dir
I get the following error message:
<cProfile.Profile object at 0x107745db0>
Starting to extract features from training set
dirr change to: ./train
Multiclass fasta_files list found: []
Features generated
Removing any all zero features
df.shape: (0, 0)
df_cleaned shape: (0, 0)
Done
Extracted training data features
Training predictive model
Traceback (most recent call last):
File "pipeline.py", line 171, in <module>
res = profiler.runcall(pipeline)
File "/Users/charles/anaconda/envs/py34/lib/python3.4/cProfile.py", line 109, in runcall
return func(*args, **kw)
File "pipeline.py", line 90, in pipeline
model, lb_encoder = trainClassifier(filename=trainingDir+'/trainingSetFeatures.csv',normFlag= False,classifierType= classifierType,kbest= 0,alpha= False,optimalFlag= False) #Win
File "/Users/charles/Downloads/feat_extract/Model_trainer.py", line 114, in trainClassifier
features, labels, lb_encoder,featureNames = load_data(filename, 'file')
File "/Users/charles/Downloads/feat_extract/Model_trainer.py", line 36, in load_data
df = pd.read_csv(dataFrame, index_col=[0,1]) # is index column 0 in multiindex as well?
File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py", line 474, in parser_f
return _read(filepath_or_buffer, kwds)
File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py", line 250, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py", line 566, in __init__
self._make_engine(self.engine)
File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py", line 705, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py", line 1072, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas/parser.pyx", line 350, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:3173)
File "pandas/parser.pyx", line 594, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:5912)
OSError: File b'./train/trainingSetFeatures.csv' does not exist
It complains that './train/trainingSetFeatures.csv' does not exist. I see that a file with this name is being created in the train folder; however, it is a table with only column names (no rows).
Thank you for your help.
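One possible reading of the log above, offered only as a guess: the line "dirr change to: ./train" suggests the working directory changes during the run. If that is so, a relative path such as ./train/trainingSetFeatures.csv would afterwards be resolved against the new directory, which could explain why a file that is visible in the train folder is reported as missing. The sketch below does not use any ProFET code; the helper names and the temporary directory are hypothetical and only demonstrate the general effect.

```python
import os
import tempfile


def resolve_before_chdir(path):
    """Return an absolute version of path, resolved against the current directory."""
    return os.path.abspath(path)


def demo():
    """Show how a relative path stops resolving after os.chdir()."""
    start = os.getcwd()
    with tempfile.TemporaryDirectory() as work:
        try:
            os.chdir(work)
            os.mkdir("train")
            rel_csv = os.path.join(".", "train", "trainingSetFeatures.csv")
            with open(rel_csv, "w") as handle:
                handle.write("placeholder\n")

            # Captured while the working directory is still the project root.
            abs_csv = resolve_before_chdir(rel_csv)

            # Hypothetical stand-in for the directory change seen in the log.
            os.chdir("train")
            return os.path.exists(rel_csv), os.path.exists(abs_csv)
        finally:
            # Restore the original directory so the temp dir can be removed.
            os.chdir(start)


rel_ok, abs_ok = demo()
print("relative path still found:", rel_ok)
print("absolute path still found:", abs_ok)
```

If this is what happens, passing absolute paths for --trainingSetDir and --testingSetDir would sidestep the problem, although the thread does not confirm the cause.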
The problem is that no features are extracted (not sure why). Have you tried extracting features using the "file" vs. "dir" option? I'll be uploading an update in the next few days.
(In reply to cnjr2's message of Sep 9, 2015, 4:08 PM.)
— Reply to this email directly or view it on GitHub https://github.com/ddofer/ProFET/issues/2.
I have tried the following:
python pipeline.py --trainingSetDir ./train --testingSetDir ./test --trainFeatures True --testFeatures True --classType file
with the same result: './train/trainingSetFeatures.csv' does not exist.
Thanks for the update, I am looking forward to it.
What OS are you using? Try using the absolute file path. The update has been implemented.
Also, the tail command outputs a fixed number of lines, not whole records, so it could have messed up the FASTA-formatted files (for example, by cutting a record in the middle). https://en.wikipedia.org/wiki/Tail_(Unix)
Dan Ofer - דן עופר Publications http://scholar.google.co.il/citations?hl=en&user=uDx2ItYAAAAJ
Photography http://picasaweb.google.com/ddofer http://500px.com/DanOfer
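The concern about tail and head working on lines can be avoided by splitting on whole FASTA records instead. The following is a minimal sketch in plain Python, not part of ProFET: the function names and the tiny example sequences are made up for illustration. It groups lines into (header, sequence) records, drops any sequence lines that appear before the first header, and splits the record list into two non-overlapping parts.

```python
def read_fasta_records(lines):
    """Group FASTA lines into (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
        # Sequence lines seen before any header are orphaned and dropped.
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def format_fasta(records, width=60):
    """Serialize records back to FASTA text, wrapping sequences at width."""
    out = []
    for header, seq in records:
        out.append(header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


def split_records(records, n_test):
    """First n_test records form the test set, the rest the training set."""
    return records[:n_test], records[n_test:]


# Made-up example data, for illustration only.
example = """>seq1 demo
MKTAYIAK
QRQISFVK
>seq2 demo
MSLLTEVE
>seq3 demo
MADEEKLP
""".splitlines()

records = read_fasta_records(example)
test_set, train_set = split_records(records, n_test=1)
print(len(test_set), "test record(s),", len(train_set), "training record(s)")
```

Because the split is done on the record list, the two sets cannot share a record, whereas head -n 1000 and tail -n 1000 on the same file overlap whenever the file has fewer than 2000 lines.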
Thanks Dan for your reply and the recent update.
I have now fixed the .fasta files and I have rerun ProFET with the same instructions as before:
python pipeline.py --trainingSetDir ./train --testingSetDir ./test --trainFeatures True --testFeatures True --classType dir
which gives this now:
<cProfile.Profile object at 0x1070c3590>
Starting to extract features from training set
Multiclass fasta_files list found: ['./train/B/train_sequences_B.fasta', './train/A/train_sequences_A.fasta']
Getting features from a single fasta file- dict_keys(['./train/B/train_sequences_B.fasta'])
Getting features from a single fasta file- dict_keys(['./train/A/train_sequences_A.fasta'])
Features generated
Removing any all zero features
df.shape: (254, 1170)
df_cleaned shape: (254, 1170)
Done
Extracted training data features
Training predictive model
Features files does not contains labels
Traceback (most recent call last):
File "/Users/charles/Downloads/feat_extract/Model_trainer.py", line 49, in load_data
df.set_index(keys = ['accession', 'classname'], inplace=True)
File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/frame.py", line 2607, in set_index
level = frame[col].values
File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/frame.py", line 1797, in __getitem__
return self._getitem_column(key)
File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/frame.py", line 1804, in _getitem_column
return self._get_item_cache(key)
File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/generic.py", line 1084, in _get_item_cache
values = self._data.get(item)
File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/internals.py", line 2851, in get
loc = self.items.get_loc(item)
File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/index.py", line 1572, in get_loc
return self._engine.get_loc(_values_from_object(key))
File "pandas/index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas/index.c:3824)
File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:3704)
File "pandas/hashtable.pyx", line 686, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12280)
File "pandas/hashtable.pyx", line 694, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12231)
KeyError: 'accession'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "pipeline.py", line 171, in <module>
res = profiler.runcall(pipeline)
File "/Users/charles/anaconda/envs/py34/lib/python3.4/cProfile.py", line 109, in runcall
return func(*args, **kw)
File "pipeline.py", line 90, in pipeline
model, lb_encoder = trainClassifier(filename=trainingDir+'/trainingSetFeatures.csv',normFlag= False,classifierType= classifierType,kbest= 0,alpha= False,optimalFlag= False) #Win
File "/Users/charles/Downloads/feat_extract/Model_trainer.py", line 133, in trainClassifier
features, labels, label_encoder, featureNames = load_data(filename)
File "/Users/charles/Downloads/feat_extract/Model_trainer.py", line 56, in load_data
df.set_index(keys = 'accession', inplace=True)
File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/frame.py", line 2607, in set_index
level = frame[col].values
File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/frame.py", line 1797, in __getitem__
return self._getitem_column(key)
File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/frame.py", line 1804, in _getitem_column
return self._get_item_cache(key)
File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/generic.py", line 1084, in _get_item_cache
values = self._data.get(item)
File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/internals.py", line 2851, in get
loc = self.items.get_loc(item)
File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/index.py", line 1572, in get_loc
return self._engine.get_loc(_values_from_object(key))
File "pandas/index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas/index.c:3824)
File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:3704)
File "pandas/hashtable.pyx", line 686, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12280)
File "pandas/hashtable.pyx", line 694, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12231)
KeyError: 'accession'
So it now seems to produce the features (i.e. the files trainingSetFeatures.csv and trainingSetNormParams.csv are generated with some contents). This time the file is generated in the working directory. However, it now complains: "Features files does not contains labels". Where am I going wrong?
p.s.: I am on a Mac. p.p.s: I have also run the command with full paths.
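Both tracebacks above end in KeyError: 'accession' from set_index, which indicates that the loaded table has no column with that name. A quick way to confirm this is to look at the header row of the generated CSV. The sketch below is a diagnostic aid only, not ProFET code: the column names are taken from the traceback, while the function name and the two miniature tables are hypothetical.

```python
import csv
import io

# Column names that load_data passes to set_index, per the traceback.
REQUIRED = ("accession", "classname")


def missing_index_columns(csv_file, required=REQUIRED):
    """Return the required column names that are absent from the CSV header row."""
    reader = csv.reader(csv_file)
    header = next(reader, [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Hypothetical miniature feature tables, for illustration only.
labelled = io.StringIO("accession,classname,feat_1\nP12345,A,0.5\n")
unlabelled = io.StringIO(",feat_1,feat_2\n0,0.5,0.1\n")

print(missing_index_columns(labelled))
print(missing_index_columns(unlabelled))
```

To check a real file, open it with open("trainingSetFeatures.csv", newline="") and pass the handle to missing_index_columns. A non-empty result would mean the feature file was written without the columns that the training step expects.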
There might be an issue with the "dir" option (we didn't use it while writing the articles). The program is complaining that it isn't getting labels/classes for the sequences. Possibly the recent update broke how "--classType dir" assigns labels according to directories, but I'm only guessing.
Try running a test case with one of the other labeling options, i.e. "--classType file". Tell me if it still isn't working then; that will narrow it down.
(In reply to cnjr2's message of Sep 20, 2015, 1:57 PM.)
I have now tried changing to the --classType file whilst keeping my folder structure the same:
python pipeline.py --trainingSetDir ./train --testingSetDir ./test --trainFeatures True --testFeatures True --classType file
that gives:
<cProfile.Profile object at 0x106f4ed48>
Starting to extract features from training set
Traceback (most recent call last):
File "pipeline.py", line 171, in <module>
res = profiler.runcall(pipeline)
File "/Users/charles/anaconda/envs/py34/lib/python3.4/cProfile.py", line 109, in runcall
return func(*args, **kw)
File "pipeline.py", line 83, in pipeline
classType=classType, normParams='.')
File "/Users/charles/Downloads/feat_extract/FeatureGen.py", line 602, in featExt
multiClass=True, Dirr = directory)
File "/Users/charles/Downloads/feat_extract/FeatureGen.py", line 444, in get_features
features = get_MultiClass_features(trainingSetFlag, classType)
File "/Users/charles/Downloads/feat_extract/FeatureGen.py", line 397, in get_MultiClass_features
fasta_files_dict = Get_Dirr_All_Fasta (classType,Dirr)
File "/Users/charles/Downloads/feat_extract/FeatureGen.py", line 308, in Get_Dirr_All_Fasta
files_dict[os.path.join(root, name)] = className
UnboundLocalError: local variable 'className' referenced before assignment
With --classType file, your classes should be in 2 separate multi-FASTA files, each containing all the sequences belonging to one class (without overlaps/duplicates between the files).
E.g., if you have 2 classes, the directory "Train" would contain:
Train/Secreted.fasta
Train/NegSecreted.fasta
Then use this directory as the trainingSetDir.
(In reply to cnjr2's message of Sep 20, 2015, 2:17 PM.)
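The two layouts discussed in this thread differ only in where the class label comes from: the sub-directory name for "dir" (train/A/, train/B/) or the file name for "file" (Train/Secreted.fasta, Train/NegSecreted.fasta). The sketch below illustrates that idea under stated assumptions; it is not ProFET's implementation, and label_for, collect_labels and the list of accepted extensions are hypothetical.

```python
import os
import tempfile

FASTA_EXTENSIONS = (".fasta", ".fa")  # assumption: which files count as FASTA


def label_for(path, class_type):
    """Derive a class label from a FASTA path.

    'dir'  -> name of the directory containing the file
    'file' -> file name without its extension
    """
    if class_type == "dir":
        return os.path.basename(os.path.dirname(path))
    if class_type == "file":
        return os.path.splitext(os.path.basename(path))[0]
    raise ValueError("unsupported class_type: %r" % (class_type,))


def collect_labels(root_dir, class_type):
    """Map every FASTA file under root_dir to its class label."""
    labels = {}
    for root, _dirs, names in os.walk(root_dir):
        for name in sorted(names):
            if name.lower().endswith(FASTA_EXTENSIONS):
                path = os.path.join(root, name)
                labels[path] = label_for(path, class_type)
    return labels


# Build the "file" layout from the reply above in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    train = os.path.join(tmp, "Train")
    os.mkdir(train)
    for name in ("Secreted.fasta", "NegSecreted.fasta"):
        with open(os.path.join(train, name), "w") as handle:
            handle.write(">demo\nMKT\n")
    file_labels = sorted(collect_labels(train, "file").values())

print(file_labels)  # ['NegSecreted', 'Secreted']
```

With the "dir" layout, label_for(os.path.join("train", "A", "train_sequences_A.fasta"), "dir") gives "A", so each class directory name becomes the label.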
I now tried:
python pipeline.py --trainingSetDir ./train --testingSetDir ./test --trainFeatures True --testFeatures True --classType file
with feat_extract being the working directory and the following folder structure:
I still get:
<cProfile.Profile object at 0x108148d48>
Starting to extract features from training set
Traceback (most recent call last):
File "pipeline.py", line 171, in <module>
res = profiler.runcall(pipeline)
File "/Users/charles/anaconda/envs/py34/lib/python3.4/cProfile.py", line 109, in runcall
return func(*args, **kw)
File "pipeline.py", line 83, in pipeline
classType=classType, normParams='.')
File "/Users/charles/Downloads/feat_extract/FeatureGen.py", line 602, in featExt
multiClass=True, Dirr = directory)
File "/Users/charles/Downloads/feat_extract/FeatureGen.py", line 444, in get_features
features = get_MultiClass_features(trainingSetFlag, classType)
File "/Users/charles/Downloads/feat_extract/FeatureGen.py", line 397, in get_MultiClass_features
fasta_files_dict = Get_Dirr_All_Fasta (classType,Dirr)
File "/Users/charles/Downloads/feat_extract/FeatureGen.py", line 308, in Get_Dirr_All_Fasta
files_dict[os.path.join(root, name)] = className
UnboundLocalError: local variable 'className' referenced before assignment
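An UnboundLocalError of this kind usually means that a local variable is assigned only inside some conditional branch and then read on a path where that branch was not taken. The source of Get_Dirr_All_Fasta is not shown in this thread, so the following is only an illustration of the general pattern and one way to guard against it; both functions and their labels are hypothetical.

```python
def label_files_buggy(names, class_type):
    """Assigns class_name only for 'dir', then reads it unconditionally."""
    labels = {}
    for name in names:
        if class_type == "dir":
            class_name = "from_directory"
        # No branch assigns class_name for any other class_type value.
        labels[name] = class_name
    return labels


def label_files_fixed(names, class_type):
    """Every branch either assigns class_name or raises a clear error."""
    labels = {}
    for name in names:
        if class_type == "dir":
            class_name = "from_directory"
        elif class_type == "file":
            class_name = name.rsplit(".", 1)[0]
        else:
            raise ValueError("unsupported class_type: %r" % (class_type,))
        labels[name] = class_name
    return labels


try:
    label_files_buggy(["Secreted.fasta"], "file")
except UnboundLocalError as exc:
    print("reproduced:", exc)

print(label_files_fixed(["Secreted.fasta"], "file"))
```

If the real function follows a similar shape, the error would appear whenever --classType takes a value, or the directory walk reaches a file, that none of the assigning branches handles.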
Dear Dan, are there any updates? Thanks for your help!
Hi, I'm afraid that I'll be unable to debug the issue, as I'll be unavailable for the next month. In the meantime, I suggest forking from the earliest commit. I really apologize! Good luck.
(In reply to cnjr2's message of Oct 1, 2015, 5:47 PM.)
Thanks for the info. I will give it a shot!
Worst case, just use the feature generation methods (FeatureGen.py). Good luck.
(In reply to cnjr2's message of Oct 1, 2015, 7:27 PM.)
Hi, I am also getting exactly the same error message, i.e. "Features files does not contains labels", along with the associated errors shown above. Has anyone found a solution? Thanks