termite-data-server

Issue running demos on Ubuntu 14

Status: Open · olivermueller opened this issue 10 years ago · 5 comments

Hi there,

I get the following error message on all demos:

Importing a file into MALLET: [data/demo/20newsgroups/corpus/corpus.txt] --> [data/demo/20newsgroups/model-mallet/corpus.mallet]
Traceback (most recent call last):
  File "bin/train_mallet.py", line 42, in <module>
    main()
  File "bin/train_mallet.py", line 39, in main
    TrainMallet( args.corpus_path, args.model_path, args.token_regex, args.topics, args.iters, args.quiet, args.overwrite )
  File "bin/train_mallet.py", line 25, in TrainMallet
    BuildLDA( corpus_filename, model_path, tokenRegex = token_regex, numTopics = num_topics, numIters = num_iters )
  File "/home/oliver/Desktop/termite-data-server-master/bin/modellers/MalletLDA.py", line 31, in __init__
    importer.ImportFileOrFolder( tokenRegex )
  File "/home/oliver/Desktop/termite-data-server-master/bin/modellers/MalletLDA.py", line 76, in ImportFileOrFolder
    self.Shell( command )
  File "/home/oliver/Desktop/termite-data-server-master/bin/modellers/MalletLDA.py", line 44, in Shell
    p = subprocess.Popen( command, stdout = subprocess.PIPE, stderr = subprocess.STDOUT )
  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

Best, Oliver

olivermueller · May 02 '14 09:05
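Note that the traceback ends inside subprocess.Popen. Popen raises OSError [Errno 2] when it cannot find the program it was asked to launch (the first element of the command), not when one of that program's input files is missing; a missing input file lets the child process start and then exit with an error code instead. So the missing item here may be an executable rather than corpus.txt. A minimal Python 3 sketch of the difference (try_popen is a hypothetical helper, not part of termite-data-server):

```python
import errno
import subprocess
import sys


def try_popen(command):
    """Launch `command` with the same Popen arguments as the Shell() helpers
    in the traceback.

    Returns "ENOENT" if Popen itself could not find the executable
    (command[0]); otherwise waits for the child and returns its exit code.
    """
    try:
        p = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    except OSError as e:  # Python 3 raises FileNotFoundError, a subclass of OSError
        if e.errno == errno.ENOENT:
            return "ENOENT"
        raise
    p.communicate()  # drain the merged stdout/stderr pipe and wait for exit
    return p.returncode


if __name__ == "__main__":
    # Missing executable: fails inside Popen, like the traceback above.
    print(try_popen(["no-such-program-xyz"]))
    # Missing input file: the interpreter starts, then exits with a non-zero code.
    print(try_popen([sys.executable, "/no/such/corpus.txt"]))
```

If the first pattern is what you see, a reasonable next step is to check that the program each Shell() call launches is installed and reachable from the working directory or PATH.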

The file corpus.txt should have been extracted automatically from corpus.db when you downloaded the dataset.

Run the following command to generate the file:

bin/export_corpus.py data/demo/20newsgroups/corpus data/demo/20newsgroups/corpus/corpus.txt
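If the export fails or produces an empty corpus.txt, it can help to confirm that corpus.db itself is intact. Assuming corpus.db is a SQLite database (an assumption; the thread does not state its format), this Python 3 sketch lists its tables using only the standard sqlite3 module; list_tables is a hypothetical helper name.

```python
import sqlite3
import sys
from pathlib import Path


def list_tables(db_path):
    """Return the table names in a SQLite file.

    Raises sqlite3.DatabaseError (or its subclass OperationalError) if the
    file is missing or is not a valid SQLite database.
    """
    # Open read-only via a URI, so a wrong path raises an error instead of
    # silently creating a new, empty database file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # Default path taken from the thread; pass another path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "data/demo/20newsgroups/corpus/corpus.db"
    print(list_tables(path))
```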

You might want to remove the following folder so that you start from a clean state when you train an LDA model in MALLET:

rm -rf data/demo/20newsgroups/model-mallet

Then try running the demo again:

./demo.py 20newsgroups mallet
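The clean-restart steps above (remove the stale model folder, then rerun the demo) can also be scripted. A minimal Python 3 sketch, assuming it is run from the repository root; clean_and_rerun is a hypothetical helper that only wraps shutil.rmtree and subprocess.run:

```python
import shutil
import subprocess
from pathlib import Path


def clean_and_rerun(paths_to_remove, command):
    """Delete stale output folders, then run `command` and return its exit code.

    Mirrors the manual steps: `rm -rf <folder>` followed by the demo command.
    """
    for p in paths_to_remove:
        path = Path(p)
        if path.is_dir() and not path.is_symlink():
            shutil.rmtree(path)  # same effect as rm -rf on a real directory
        elif path.exists() or path.is_symlink():
            path.unlink()  # a stray file or symlink; rm -rf removes the link only
        # paths that do not exist are skipped, as rm -rf would do
    return subprocess.run(command).returncode


if __name__ == "__main__":
    rc = clean_and_rerun(
        ["data/demo/20newsgroups/model-mallet"],
        ["./demo.py", "20newsgroups", "mallet"],
    )
    print("demo.py exited with", rc)
```

The later suggestion in this thread fits the same helper: clean_and_rerun(["data/demo/infovis", "apps/infovis_mallet"], ["./demo.py", "infovis", "mallet"]).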

jcchuang · May 03 '14 01:05

The file corpus.txt does get created. Here is a longer excerpt of the error message:

Copying [data/demo/infovis/corpus/corpus.db] --> [apps/temp_20140503_190729_997992_3269/data/corpus.db]
Copying [data/demo/infovis/corpus/corpus.txt] --> [apps/temp_20140503_190729_997992_3269/data/corpus.txt]
Extracting [data/demo/infovis/corpus/corpus.txt] --> [apps/temp_20140503_190729_997992_3269/data/sentences.txt]
An error occured while creating app: infovis_mallet [apps/infovis_mallet]
Traceback (most recent call last):
  File "bin/read_mallet.py", line 87, in <module>
    main()
  File "bin/read_mallet.py", line 84, in main
    ImportMalletLDA( args.app_name, args.model_path, args.corpus_path, args.database_path, args.quiet, args.overwrite )
  File "bin/read_mallet.py", line 50, in ImportMalletLDA
    SplitSentences( corpus_filename, app_sentences_filename )
  File "/home/oliver/termite-data-server-master/bin/apps/SplitSentences.py", line 14, in __init__
    self.Shell( command )
  File "/home/oliver/termite-data-server-master/bin/apps/SplitSentences.py", line 17, in Shell
    p = subprocess.Popen( command, stdout = subprocess.PIPE, stderr = subprocess.STDOUT )
  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

The error occurs with both mallet and gensim.

olivermueller · May 03 '14 17:05

Something else must be causing these files to go missing; regenerating them doesn't fix the root problem. Could you remove the data/demo/infovis and apps/infovis_mallet folders, run "./demo.py infovis mallet", and post the full console output?

jcchuang · May 06 '14 21:05

Here is the full console output:

oliver@ubuntu:~$ cd termite-data-server-master/
oliver@ubuntu:~/termite-data-server-master$ sudo python demo.py 20newsgroups mallet
[sudo] password for oliver:

Build a topic model (mallet) using a demo dataset (20newsgroups)
database = data/demo/20newsgroups/corpus
corpus = data/demo/20newsgroups/corpus
model = data/demo/20newsgroups/model-mallet
app = 20newsgroups_mallet

Setting up the 20newsgroups dataset...
Creating folder 'data/demo/20newsgroups'...
Downloading...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 38.1M  100 38.1M    0     0   279k      0  0:02:19  0:02:19 --:--:--  291k
Uncompressing...
Extracting corpus.txt from corpus.db...
Exporting database [data/demo/20newsgroups/corpus/corpus.db] to file [data/demo/20newsgroups/corpus/corpus.txt]
Corpus available: data/demo/20newsgroups/corpus
Available: tools/mallet-2.0.7
Available: tools/mallet-2.0.7
Available: tools/corenlp-3.3.1

Training an LDA topic model using MALLET...
corpus = data/demo/20newsgroups/corpus/corpus.txt
model = data/demo/20newsgroups/model-mallet
token_regex = \w{3,}
topics = 20
iters = 1000

Importing a file into MALLET: [data/demo/20newsgroups/corpus/corpus.txt] --> [data/demo/20newsgroups/model-mallet/corpus.mallet]
Traceback (most recent call last):
  File "bin/train_mallet.py", line 42, in <module>
    main()
  File "bin/train_mallet.py", line 39, in main
    TrainMallet( args.corpus_path, args.model_path, args.token_regex, args.topics, args.iters, args.quiet, args.overwrite )
  File "bin/train_mallet.py", line 25, in TrainMallet
    BuildLDA( corpus_filename, model_path, tokenRegex = token_regex, numTopics = num_topics, numIters = num_iters )
  File "/home/oliver/termite-data-server-master/bin/modellers/MalletLDA.py", line 31, in __init__
    importer.ImportFileOrFolder( tokenRegex )
  File "/home/oliver/termite-data-server-master/bin/modellers/MalletLDA.py", line 76, in ImportFileOrFolder
    self.Shell( command )
  File "/home/oliver/termite-data-server-master/bin/modellers/MalletLDA.py", line 44, in Shell
    p = subprocess.Popen( command, stdout = subprocess.PIPE, stderr = subprocess.STDOUT )
  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

Import a MALLET LDA topic model as a web2py application...
app_name = 20newsgroups_mallet
app_path = apps/20newsgroups_mallet
model_path = data/demo/20newsgroups/model-mallet
corpus_filename = data/demo/20newsgroups/corpus/corpus.txt
database_filename = data/demo/20newsgroups/corpus/corpus.db

Creating app: 20newsgroups_mallet [apps/temp_20140518_102959_056298_2262]
Creating folder: [apps/temp_20140518_102959_056298_2262/data]
Creating folder: [apps/temp_20140518_102959_056298_2262/databases]
Linking folder: [apps/temp_20140518_102959_056298_2262/models]
Linking folder: [apps/temp_20140518_102959_056298_2262/views]
Linking folder: [apps/temp_20140518_102959_056298_2262/controllers]
Linking folder: [apps/temp_20140518_102959_056298_2262/static]
Linking folder: [apps/temp_20140518_102959_056298_2262/modules]
Creating file: [apps/temp_20140518_102959_056298_2262/__init__.py]
Copying [data/demo/20newsgroups/corpus/corpus.db] --> [apps/temp_20140518_102959_056298_2262/data/corpus.db]
Copying [data/demo/20newsgroups/corpus/corpus.txt] --> [apps/temp_20140518_102959_056298_2262/data/corpus.txt]
Extracting [data/demo/20newsgroups/corpus/corpus.txt] --> [apps/temp_20140518_102959_056298_2262/data/sentences.txt]
An error occured while creating app: 20newsgroups_mallet [apps/20newsgroups_mallet]
Traceback (most recent call last):
  File "bin/read_mallet.py", line 87, in <module>
    main()
  File "bin/read_mallet.py", line 84, in main
    ImportMalletLDA( args.app_name, args.model_path, args.corpus_path, args.database_path, args.quiet, args.overwrite )
  File "bin/read_mallet.py", line 50, in ImportMalletLDA
    SplitSentences( corpus_filename, app_sentences_filename )
  File "/home/oliver/termite-data-server-master/bin/apps/SplitSentences.py", line 14, in __init__
    self.Shell( command )
  File "/home/oliver/termite-data-server-master/bin/apps/SplitSentences.py", line 17, in Shell
    p = subprocess.Popen( command, stdout = subprocess.PIPE, stderr = subprocess.STDOUT )
  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
oliver@ubuntu:~/termite-data-server-master$

olivermueller · May 18 '14 08:05

Ran into some of the same problems here.

  1. Make sure curl is installed on your distro (e.g. sudo apt-get install curl). Otherwise, this line in fetch_dataset.sh cannot execute:

     curl --insecure --location http://homes.cs.washington.edu/~jcchuang/termite-datasets/$DEMO.zip > $DOWNLOAD_PATH/$DEMO.zip

  2. Make sure MALLET, CoreNLP and gensim have been fully downloaded into utils/tools. CoreNLP is quite large, so this may take a while.
  3. Check that the downloaded demo .zip file ($DEMO.zip) actually opens and that you can see its contents. Part of the problem I ran into was exactly that: the .db file wasn't being unzipped, so the .txt file couldn't be created.
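The three checks above can be bundled into a quick pre-flight script. A minimal Python 3 sketch: the helper names are hypothetical, the tool folder names are copied from the console output earlier in this thread and assumed to sit under the repository root, and java is included as an assumed prerequisite because MALLET and CoreNLP are Java programs. gensim is not checked here.

```python
import shutil
import zipfile
from pathlib import Path


def missing_programs(names=("curl", "java")):
    """Return the programs from `names` that cannot be found on PATH."""
    return [n for n in names if shutil.which(n) is None]


def missing_tool_dirs(root=".", dirs=("tools/mallet-2.0.7", "tools/corenlp-3.3.1")):
    """Return the expected tool folders (relative to `root`) that do not exist."""
    return [d for d in dirs if not (Path(root) / d).is_dir()]


def zip_problem(zip_path):
    """Return None if the archive opens and every member's CRC checks out,
    otherwise a short description of the problem."""
    if not zipfile.is_zipfile(zip_path):  # also False for a missing file
        return "not a valid zip file (download may have failed or been truncated)"
    with zipfile.ZipFile(zip_path) as zf:
        bad = zf.testzip()  # name of the first corrupt member, or None
    return None if bad is None else "corrupt member: " + bad


if __name__ == "__main__":
    print("missing programs:", missing_programs())
    print("missing tool folders:", missing_tool_dirs())
```

For item 3, call zip_problem() on the downloaded $DEMO.zip; a None result means the archive itself is readable, so an extraction failure would lie elsewhere.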

jsbarry · Jun 05 '14 21:06