termite-data-server
Issue running demos on Ubuntu 14
Hi there,
I get the following error message on all demos:
Importing a file into MALLET: [data/demo/20newsgroups/corpus/corpus.txt] --> [data/demo/20newsgroups/model-mallet/corpus.mallet]
Traceback (most recent call last):
File "bin/train_mallet.py", line 42, in
Best, Oliver
The file corpus.txt should have been automatically extracted from corpus.db when you download the dataset.
Run the following command to generate the file:
bin/export_corpus.py data/demo/20newsgroups/corpus data/demo/20newsgroups/corpus/corpus.txt
You might want to remove the following folder, so that you have a clean start when you train an LDA in MALLET.
rm -rf data/demo/20newsgroups/model-mallet
Then, try running the following again.
./demo.py 20newsgroups mallet
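Before re-running the demo, it may help to confirm that both corpus files are actually present and non-empty. The following is only a minimal sketch, not part of termite-data-server: the helper name `corpus_status` is hypothetical, and it assumes the corpus folder layout shown in the paths above.

```python
import os


def corpus_status(corpus_dir):
    """Report whether corpus.db and corpus.txt exist and are non-empty.

    Hypothetical helper for illustration; not part of termite-data-server.
    """
    status = {}
    for name in ("corpus.db", "corpus.txt"):
        path = os.path.join(corpus_dir, name)
        # A file counts as usable only if it exists and has content.
        status[name] = os.path.isfile(path) and os.path.getsize(path) > 0
    return status


if __name__ == "__main__":
    # Path taken from the commands above.
    print(corpus_status("data/demo/20newsgroups/corpus"))
```

If corpus.db is reported as missing or empty, the export step cannot succeed, which would point to a problem with the download rather than with the export script.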
The file corpus.txt is being created. Here is a longer extract of the error message:
Copying [data/demo/infovis/corpus/corpus.db] --> [apps/temp_20140503_190729_997992_3269/data/corpus.db]
Copying [data/demo/infovis/corpus/corpus.txt] --> [apps/temp_20140503_190729_997992_3269/data/corpus.txt]
Extracting [data/demo/infovis/corpus/corpus.txt] --> [apps/temp_20140503_190729_997992_3269/data/sentences.txt]
An error occured while creating app: infovis_mallet [apps/infovis_mallet]
Traceback (most recent call last):
File "bin/read_mallet.py", line 87, in
The error occurs for mallet and gensim.
There must be other issues that cause these files to be missing. Regenerating these files doesn't fix the root problem. Could you remove the data/demo/infovis and apps/infovis_mallet folders, and run "./demo.py infovis mallet"? What is the full console output?
Here is the full console output:
oliver@ubuntu:~$ cd termite-data-server-master/
oliver@ubuntu:~/termite-data-server-master$ sudo python demo.py 20newsgroups mallet
[sudo] password for oliver:
Build a topic model (mallet) using a demo dataset (20newsgroups)
database = data/demo/20newsgroups/corpus
corpus = data/demo/20newsgroups/corpus
model = data/demo/20newsgroups/model-mallet
app = 20newsgroups_mallet
Setting up the 20newsgroups dataset...
Creating folder 'data/demo/20newsgroups'...
Downloading...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 38.1M 100 38.1M 0 0 279k 0 0:02:19 0:02:19 --:--:-- 291k
Uncompressing...
Extracting corpus.txt from corpus.db...
Exporting database [data/demo/20newsgroups/corpus/corpus.db] to file [data/demo/20newsgroups/corpus/corpus.txt]
Corpus available: data/demo/20newsgroups/corpus
Available: tools/mallet-2.0.7
Available: tools/mallet-2.0.7
Available: tools/corenlp-3.3.1
Training an LDA topic model using MALLET...
corpus = data/demo/20newsgroups/corpus/corpus.txt
model = data/demo/20newsgroups/model-mallet
token_regex = \w{3,}
topics = 20
iters = 1000
Importing a file into MALLET: [data/demo/20newsgroups/corpus/corpus.txt] --> [data/demo/20newsgroups/model-mallet/corpus.mallet]
Traceback (most recent call last):
File "bin/train_mallet.py", line 42, in
Import a MALLET LDA topic model as a web2py application...
app_name = 20newsgroups_mallet
app_path = apps/20newsgroups_mallet
model_path = data/demo/20newsgroups/model-mallet
corpus_filename = data/demo/20newsgroups/corpus/corpus.txt
database_filename = data/demo/20newsgroups/corpus/corpus.db
Creating app: 20newsgroups_mallet [apps/temp_20140518_102959_056298_2262]
Creating folder: [apps/temp_20140518_102959_056298_2262/data]
Creating folder: [apps/temp_20140518_102959_056298_2262/databases]
Linking folder: [apps/temp_20140518_102959_056298_2262/models]
Linking folder: [apps/temp_20140518_102959_056298_2262/views]
Linking folder: [apps/temp_20140518_102959_056298_2262/controllers]
Linking folder: [apps/temp_20140518_102959_056298_2262/static]
Linking folder: [apps/temp_20140518_102959_056298_2262/modules]
Creating file: [apps/temp_20140518_102959_056298_2262/__init__.py]
Copying [data/demo/20newsgroups/corpus/corpus.db] --> [apps/temp_20140518_102959_056298_2262/data/corpus.db]
Copying [data/demo/20newsgroups/corpus/corpus.txt] --> [apps/temp_20140518_102959_056298_2262/data/corpus.txt]
Extracting [data/demo/20newsgroups/corpus/corpus.txt] --> [apps/temp_20140518_102959_056298_2262/data/sentences.txt]
An error occured while creating app: 20newsgroups_mallet [apps/20newsgroups_mallet]
Traceback (most recent call last):
File "bin/read_mallet.py", line 87, in
I ran into some of the same problems here.
- Make sure curl is installed on your distro (for example, sudo apt-get install curl). If it isn't, the line
curl --insecure --location http://homes.cs.washington.edu/~jcchuang/termite-datasets/$DEMO.zip > $DOWNLOAD_PATH/$DEMO.zip
in fetch_dataset.sh cannot execute.
- Make sure MALLET, CoreNLP and gensim have been fully downloaded into utils/tools. CoreNLP is quite large, so it may take a while.
- Check that the demo.zip actually opens and that you can see its contents. Part of the problem I ran into was exactly that: the db wasn't being unzipped, so the .txt file couldn't be created.
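The curl and archive checks above can be sketched in a few lines of Python. This is an illustration under assumptions, not part of termite-data-server: the function names are hypothetical, the script assumes Python 3.3 or later (for `shutil.which`), and the path to the downloaded .zip must be supplied by you, since it depends on how fetch_dataset.sh sets $DOWNLOAD_PATH.

```python
import shutil
import sys
import zipfile


def curl_available():
    """Return True if a curl executable is found on the PATH."""
    return shutil.which("curl") is not None


def inspect_archive(zip_path):
    """Return (ok, names) for the archive at zip_path.

    ok is False if the file is missing, is not a zip archive,
    or contains a member that fails its CRC check.
    """
    if not zipfile.is_zipfile(zip_path):
        return False, []
    with zipfile.ZipFile(zip_path) as archive:
        # testzip() returns the name of the first corrupt member, or None.
        first_bad = archive.testzip()
        return first_bad is None, archive.namelist()


if __name__ == "__main__":
    print("curl available:", curl_available())
    if len(sys.argv) > 1:
        # Pass the path of the downloaded demo archive as the first argument.
        ok, names = inspect_archive(sys.argv[1])
        print("archive ok:", ok)
        for name in names:
            print("  ", name)
```

If the archive is reported as not ok, or the listing does not include the corpus database, the download probably failed or was truncated, and the later export and training steps will fail as a consequence.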