trouble getting started

Status: open. cnjr2 opened this issue 10 years ago (14 comments).

Thank you for developing ProFET!

I wanted to try it out but I ran into some trouble. It would be great if you could point me towards where I am going wrong.

I am using Python 3.4 and have installed all the dependencies mentioned in the README.md. I have the following folder structure, where feat_extract is my working directory:

feat_extract/
|_pipeline.py
|_other ProFET files...
|_test_seq/...
|_train/
| |_A/
| | |_train_sequences_A.fasta
| |_B/
|   |_train_sequences_B.fasta
|_test
  |_A/
  | |_test_sequences_A.fasta
  |_B/
    |_test_sequences_B.fasta

The FASTA files were created with the following set of commands:

    cd ./test_seq/Extracellular/
    tail -n 1000 location-secreted_keyword-AKW-0964_reviewed_taxon-Tetrapoda_fragment-no_id-0.9.fasta > ../../train/A/train_sequences_A.fasta
    tail -n 1000 NOT-secreted_NOT-extracellular_reviewed_taxon-Tetrapoda_fragment-no_id-0.5.fasta > ../../train/B/train_sequences_B.fasta
    head -n 1000 location-secreted_keyword-AKW-0964_reviewed_taxon-Tetrapoda_fragment-no_id-0.9.fasta > ../../test/A/test_sequences_A.fasta
    head -n 1000 NOT-secreted_NOT-extracellular_reviewed_taxon-Tetrapoda_fragment-no_id-0.5.fasta > ../../test/B/test_sequences_B.fasta
    cd ../../

When running the command:

python pipeline.py --trainingSetDir ./train --testingSetDir ./test --trainFeatures True --testFeatures True --classType dir

I get the following error message:

<cProfile.Profile object at 0x107745db0>
Starting to extract features from training set
dirr change to: ./train
Multiclass fasta_files list found: []
Features generated
Removing any all zero features
df.shape:  (0, 0)
df_cleaned shape:  (0, 0)
Done
Extracted training data features
Training predictive model
Traceback (most recent call last):
  File "pipeline.py", line 171, in <module>
    res = profiler.runcall(pipeline)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/cProfile.py", line 109, in runcall
    return func(*args, **kw)
  File "pipeline.py", line 90, in pipeline
    model, lb_encoder = trainClassifier(filename=trainingDir+'/trainingSetFeatures.csv',normFlag= False,classifierType= classifierType,kbest= 0,alpha= False,optimalFlag= False) #Win
  File "/Users/charles/Downloads/feat_extract/Model_trainer.py", line 114, in trainClassifier
    features, labels, lb_encoder,featureNames = load_data(filename, 'file')
  File "/Users/charles/Downloads/feat_extract/Model_trainer.py", line 36, in load_data
    df = pd.read_csv(dataFrame, index_col=[0,1]) # is index column 0 in multiindex as well?
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py", line 474, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py", line 250, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py", line 566, in __init__
    self._make_engine(self.engine)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py", line 705, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py", line 1072, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 350, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:3173)
  File "pandas/parser.pyx", line 594, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:5912)
OSError: File b'./train/trainingSetFeatures.csv' does not exist

It complains that "./train/trainingSetFeatures.csv" does not exist. I can see that a file with this name is being created in the train folder; however, it is a table with only column names (no rows).

Thank you for your help.

cnjr2, Sep 09 '15 13:09

The problem is that no features are being extracted (not sure why). Have you tried extracting features using the "file" option instead of "dir"? I'll be uploading an update in the next few days.

ddofer, Sep 09 '15 17:09
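Dan's diagnosis matches the log line "Multiclass fasta_files list found: []": the pipeline found no FASTA files under ./train, so the feature table ended up with shape (0, 0). A quick, independent check is to list what a recursive search actually finds. This is a minimal standard-library sketch; the function name and the accepted extensions are assumptions, not ProFET's own file-discovery logic.

```python
import os

# Extensions treated as FASTA; an assumption, not ProFET's own filter.
FASTA_EXTENSIONS = (".fasta", ".fa", ".faa")


def find_fasta_files(root, extensions=FASTA_EXTENSIONS):
    """Return the sorted paths of FASTA-like files anywhere under root."""
    found = []
    # os.walk yields nothing (and raises nothing) when root does not exist,
    # which would also produce an empty file list.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(extensions):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


if __name__ == "__main__":
    for directory in ("./train", "./test"):
        files = find_fasta_files(directory)
        print(directory, "->", files if files else "no FASTA files found")
```

If this lists the expected files but the pipeline still reports an empty list, the problem is more likely in how the pipeline resolves or filters paths than in the data layout.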

I have tried the following:

python pipeline.py --trainingSetDir ./train --testingSetDir ./test --trainFeatures True --testFeatures True --classType file

with the same result: "./train/trainingSetFeatures.csv" does not exist.

Thanks for the update, I am looking forward to it.

cnjr2, Sep 09 '15 17:09

What OS are you using? Try using the absolute file path. The update has been implemented.

ddofer, Sep 14 '15 13:09
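Dan's suggestion to use absolute paths is easy to script. The log line "dirr change to: ./train" hints that the pipeline may change its working directory, in which case a relative path such as ./train would no longer point where you expect; that is a guess, not something confirmed in this thread. A minimal sketch that builds the same pipeline.py command with absolute paths (the flags are copied from the commands above; the helper name is hypothetical):

```python
import os
import sys


def build_pipeline_command(train_dir, test_dir, class_type="dir"):
    """Return a pipeline.py argument list with absolute training/testing dirs.

    Hypothetical helper: the flags mirror the commands used in this thread.
    """
    return [
        sys.executable, "pipeline.py",
        "--trainingSetDir", os.path.abspath(train_dir),
        "--testingSetDir", os.path.abspath(test_dir),
        "--trainFeatures", "True",
        "--testFeatures", "True",
        "--classType", class_type,
    ]


if __name__ == "__main__":
    cmd = build_pipeline_command("./train", "./test")
    # Print the command; to run it, pass cmd to subprocess.check_call
    # (available on Python 3.4, unlike subprocess.run).
    print(" ".join(cmd))
```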

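Also, the tail command outputs lines, not FASTA records, so it could have messed up the FASTA-formatted files: tail -n 1000 can start partway through a record (leaving sequence lines with no header), and head -n 1000 can cut the last record short. https://en.wikipedia.org/wiki/Tail_(Unix)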

Dan Ofer - דן עופר Publications http://scholar.google.co.il/citations?hl=en&user=uDx2ItYAAAAJ

Photography http://picasaweb.google.com/ddofer http://500px.com/DanOfer

ddofer, Sep 15 '15 08:09
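Dan's point about tail is that head and tail count lines, not records. A minimal, dependency-free sketch that works on whole FASTA records instead is below; the function names are hypothetical and not part of ProFET (Biopython's SeqIO would be an alternative if it is installed). Sequence lines that appear before the first ">" header, which is exactly what a line-based tail can leave behind, are dropped.

```python
def read_fasta_records(path):
    """Parse a FASTA file into a list of (header, sequence) tuples."""
    records = []
    header, seq_lines = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(seq_lines)))
                header, seq_lines = line, []
            elif line and header is not None:
                # Residue lines before the first header are orphans; skip them.
                seq_lines.append(line.strip())
    if header is not None:
        records.append((header, "".join(seq_lines)))
    return records


def head_tail_split(records, n):
    """Return (first n records, remaining records); the two never overlap."""
    return records[:n], records[n:]


def write_fasta_records(records, path, width=60):
    """Write (header, sequence) tuples as FASTA, wrapping sequences at width."""
    with open(path, "w") as handle:
        for header, seq in records:
            handle.write(header + "\n")
            for i in range(0, len(seq), width):
                handle.write(seq[i:i + width] + "\n")
```

Unlike separate head/tail calls on the same file, head_tail_split guarantees that no record lands in both the test and training sets, even when the file holds fewer than 2 × n records.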

Thanks, Dan, for your reply and the recent update.

I have now fixed the .fasta files and rerun ProFET with the same command as before:

python pipeline.py --trainingSetDir ./train --testingSetDir ./test --trainFeatures True --testFeatures True --classType dir

which gives this now:

<cProfile.Profile object at 0x1070c3590>
Starting to extract features from training set
Multiclass fasta_files list found: ['./train/B/train_sequences_B.fasta', './train/A/train_sequences_A.fasta']
Getting features from a single fasta file- dict_keys(['./train/B/train_sequences_B.fasta'])
Getting features from a single fasta file- dict_keys(['./train/A/train_sequences_A.fasta'])
Features generated
Removing any all zero features
df.shape:  (254, 1170)
df_cleaned shape:  (254, 1170)
Done
Extracted training data features
Training predictive model
Features files does not contains labels
Traceback (most recent call last):
  File "/Users/charles/Downloads/feat_extract/Model_trainer.py", line 49, in load_data
    df.set_index(keys = ['accession', 'classname'], inplace=True)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/frame.py", line 2607, in set_index
    level = frame[col].values
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/frame.py", line 1797, in __getitem__
    return self._getitem_column(key)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/frame.py", line 1804, in _getitem_column
    return self._get_item_cache(key)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/generic.py", line 1084, in _get_item_cache
    values = self._data.get(item)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/internals.py", line 2851, in get
    loc = self.items.get_loc(item)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/index.py", line 1572, in get_loc
    return self._engine.get_loc(_values_from_object(key))
  File "pandas/index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas/index.c:3824)
  File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:3704)
  File "pandas/hashtable.pyx", line 686, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12280)
  File "pandas/hashtable.pyx", line 694, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12231)
KeyError: 'accession'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pipeline.py", line 171, in <module>
    res = profiler.runcall(pipeline)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/cProfile.py", line 109, in runcall
    return func(*args, **kw)
  File "pipeline.py", line 90, in pipeline
    model, lb_encoder = trainClassifier(filename=trainingDir+'/trainingSetFeatures.csv',normFlag= False,classifierType= classifierType,kbest= 0,alpha= False,optimalFlag= False) #Win
  File "/Users/charles/Downloads/feat_extract/Model_trainer.py", line 133, in trainClassifier
    features, labels, label_encoder, featureNames = load_data(filename)
  File "/Users/charles/Downloads/feat_extract/Model_trainer.py", line 56, in load_data
    df.set_index(keys = 'accession', inplace=True)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/frame.py", line 2607, in set_index
    level = frame[col].values
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/frame.py", line 1797, in __getitem__
    return self._getitem_column(key)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/frame.py", line 1804, in _getitem_column
    return self._get_item_cache(key)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/generic.py", line 1084, in _get_item_cache
    values = self._data.get(item)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/internals.py", line 2851, in get
    loc = self.items.get_loc(item)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/site-packages/pandas/core/index.py", line 1572, in get_loc
    return self._engine.get_loc(_values_from_object(key))
  File "pandas/index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas/index.c:3824)
  File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:3704)
  File "pandas/hashtable.pyx", line 686, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12280)
  File "pandas/hashtable.pyx", line 694, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12231)
KeyError: 'accession'

So it now does seem to produce the features (i.e. the files trainingSetFeatures.csv and trainingSetNormParams.csv are now generated with some content). This time, however, the file is generated in the working directory, and now it complains "Features files does not contains labels". Where am I going wrong?

P.S. I am on a Mac. P.P.S. I have also run the command with full paths.

cnjr2, Sep 20 '15 10:09
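Both tracebacks above end in KeyError: 'accession'. load_data first tries set_index(keys=['accession', 'classname']) and then falls back to set_index(keys='accession'), so the generated trainingSetFeatures.csv apparently has no accession column at all. A small standard-library check of the CSV header can confirm this before training. The column names come from the traceback; the helper name is hypothetical.

```python
import csv

# Column names taken from the set_index calls in the traceback above.
EXPECTED_COLUMNS = ("accession", "classname")


def missing_label_columns(csv_path, expected=EXPECTED_COLUMNS):
    """Return the expected index/label columns absent from the CSV header row."""
    with open(csv_path, newline="") as handle:
        # An empty file has no header row at all; treat it as an empty header.
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return [col for col in expected if col not in present]
```

For example, print(missing_label_columns("trainingSetFeatures.csv")) prints an empty list when both columns are present, and lists the missing names otherwise.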

There might be an issue with the "dir" option (we didn't use it while writing the articles). The program is complaining that it's not getting labels/classes for the sequences. Possibly the recent update broke how the "dir" option assigns labels according to directories, but I'm only guessing.

Try running a "test case" with one of the other labeling options, i.e. "file". Tell me if it still isn't working then; that will narrow it down.

ddofer, Sep 20 '15 11:09
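Dan's guess concerns how the "dir" option derives labels from directories. As an illustration of the idea only (this is not ProFET's implementation), the sketch below labels each FASTA file by its parent directory (train/A/... becomes class A) and pairs each record's accession with that class, i.e. the kind of (accession, classname) pairs load_data looks for. Taking the accession to be the first word of each ">" header is an assumption, and the helper names are hypothetical.

```python
import os


def label_by_directory(root, extensions=(".fasta", ".fa")):
    """Map each FASTA file under root to its parent directory's name.

    Files sitting directly in root get root's own name as their label.
    """
    labels = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(extensions):
                path = os.path.join(dirpath, name)
                labels[path] = os.path.basename(os.path.normpath(dirpath))
    return labels


def label_rows(labels):
    """Yield (accession, classname) pairs, one per '>' header line."""
    for path, classname in sorted(labels.items()):
        with open(path) as handle:
            for line in handle:
                if line.startswith(">"):
                    parts = line[1:].split()
                    # Assumption: the accession is the first word of the header.
                    yield (parts[0] if parts else ""), classname
```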

I have now tried changing to the --classType file whilst keeping my folder structure the same:

python pipeline.py --trainingSetDir ./train --testingSetDir ./test --trainFeatures True --testFeatures True --classType file

that gives:

<cProfile.Profile object at 0x106f4ed48>
Starting to extract features from training set
Traceback (most recent call last):
  File "pipeline.py", line 171, in <module>
    res = profiler.runcall(pipeline)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/cProfile.py", line 109, in runcall
    return func(*args, **kw)
  File "pipeline.py", line 83, in pipeline
    classType=classType, normParams='.')
  File "/Users/charles/Downloads/feat_extract/FeatureGen.py", line 602, in featExt
    multiClass=True, Dirr = directory)
  File "/Users/charles/Downloads/feat_extract/FeatureGen.py", line 444, in get_features
    features = get_MultiClass_features(trainingSetFlag, classType)
  File "/Users/charles/Downloads/feat_extract/FeatureGen.py", line 397, in get_MultiClass_features
    fasta_files_dict = Get_Dirr_All_Fasta (classType,Dirr)
  File "/Users/charles/Downloads/feat_extract/FeatureGen.py", line 308, in Get_Dirr_All_Fasta
    files_dict[os.path.join(root, name)] = className
UnboundLocalError: local variable 'className' referenced before assignment

cnjr2, Sep 20 '15 11:09
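The UnboundLocalError in Get_Dirr_All_Fasta means that, on the code path taken, className was read before any assignment to it had run, typically because it is only assigned inside a branch whose condition was false. FeatureGen.py's source is not shown in this thread, so the following is a generic reproduction of that Python error pattern and one way to guard against it; the function names and the dir/file labeling rules are hypothetical.

```python
import os


def buggy_label(path, class_type):
    if class_type == "dir":
        className = os.path.basename(os.path.dirname(path))
    # For any other class_type, className is never bound, so the return
    # below raises UnboundLocalError (a subclass of NameError).
    return className


def fixed_label(path, class_type):
    """Return a class label for path, failing loudly on unknown class types."""
    if class_type == "dir":
        # Label = name of the parent directory, e.g. train/A/x.fasta -> "A".
        return os.path.basename(os.path.dirname(path))
    if class_type == "file":
        # Label = file name without extension, e.g. Secreted.fasta -> "Secreted".
        return os.path.splitext(os.path.basename(path))[0]
    raise ValueError("unsupported class_type: %r" % (class_type,))
```

Raising a ValueError with the offending value makes a misconfigured option obvious, instead of surfacing later as a confusing UnboundLocalError.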

With --classType file, your classes should be in 2 separate multi-FASTA files, each containing all the sequences belonging to one class (without overlapping sequences or duplicates).

For example, with 2 classes, the Train directory would contain:

    Train/Secreted.fasta
    Train/NegSecreted.fasta

and you would use this directory as the trainingSetDir.

ddofer, Sep 20 '15 11:09
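Dan's layout for --classType file can be checked before running the pipeline. A minimal sketch, assuming the class name is simply the file name without its extension (Train/Secreted.fasta becomes Secreted) and that only .fasta/.fa files directly inside the directory count; it also flags sequences that appear in more than one class file, which Dan says should not happen. The helper names are hypothetical, not ProFET functions.

```python
import os


def label_by_filename(train_dir, extensions=(".fasta", ".fa")):
    """Map each FASTA file directly inside train_dir to its file stem."""
    labels = {}
    for name in sorted(os.listdir(train_dir)):
        path = os.path.join(train_dir, name)
        stem, ext = os.path.splitext(name)
        if os.path.isfile(path) and ext.lower() in extensions:
            labels[path] = stem
    return labels


def sequences_in(path):
    """Return the sequences in a FASTA file (headers dropped)."""
    seqs, current = [], []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current:
                    seqs.append("".join(current))
                current = []
            elif line:
                current.append(line)
    if current:
        seqs.append("".join(current))
    return seqs


def duplicated_sequences(labels):
    """Return the set of sequences that occur in more than one class file."""
    seen = {}
    for path, classname in labels.items():
        for seq in sequences_in(path):
            seen.setdefault(seq, set()).add(classname)
    return {seq for seq, classes in seen.items() if len(classes) > 1}
```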

I now tried:

python pipeline.py --trainingSetDir ./train --testingSetDir ./test --trainFeatures True --testFeatures True --classType file

with feat_extract being the working directory and the following folder structure:

[Screenshot of the folder structure (screenshot_20_09_2015_15_41); image not preserved]

I still get:

<cProfile.Profile object at 0x108148d48>
Starting to extract features from training set
Traceback (most recent call last):
  File "pipeline.py", line 171, in <module>
    res = profiler.runcall(pipeline)
  File "/Users/charles/anaconda/envs/py34/lib/python3.4/cProfile.py", line 109, in runcall
    return func(*args, **kw)
  File "pipeline.py", line 83, in pipeline
    classType=classType, normParams='.')
  File "/Users/charles/Downloads/feat_extract/FeatureGen.py", line 602, in featExt
    multiClass=True, Dirr = directory)
  File "/Users/charles/Downloads/feat_extract/FeatureGen.py", line 444, in get_features
    features = get_MultiClass_features(trainingSetFlag, classType)
  File "/Users/charles/Downloads/feat_extract/FeatureGen.py", line 397, in get_MultiClass_features
    fasta_files_dict = Get_Dirr_All_Fasta (classType,Dirr)
  File "/Users/charles/Downloads/feat_extract/FeatureGen.py", line 308, in Get_Dirr_All_Fasta
    files_dict[os.path.join(root, name)] = className
UnboundLocalError: local variable 'className' referenced before assignment

cnjr2, Sep 20 '15 14:09

Dear Dan, are there any updates? Thanks for your help!

cnjr2, Oct 01 '15 14:10

Hi, I'm afraid I'll be unable to debug the issue, as I'll be unavailable for the next month. In the meantime, I suggest forking from the earliest commit. I really apologize! Good luck.

ddofer, Oct 01 '15 16:10

Thanks for the info. I will give it a shot!

cnjr2, Oct 01 '15 16:10

Worst case, just use the feature generation methods in FeatureGen.py.

Good luck.

ddofer, Oct 01 '15 16:10

Hi, I am also getting exactly the same error message, i.e. "Features files does not contains labels", along with the associated errors shown above. Has anyone found a solution? Thanks

ChalaTuro, Nov 08 '16 15:11