lda2vec

the result of preprocess

Open • MelvinZang opened this issue 6 years ago • 18 comments

When I run preprocess.py in twenty_newsgroups, I get results like these:

2 <SKIP> --> SKIP
4 , --> ÉÏ
5 . --> ÉÏ
13 - --> ÉÏ
15 ) --> ÉÏ
16 " --> ÉÏ
17 ( --> ÉÏ
19 : --> ÉÏ
24 ? --> ÉÏ
36 ' --> ÉÏ
43 / --> ÉÏ
49 ! --> ÉÏ
51 ; --> ÉÏ
61 < --> ÉÏ
76 ... --> §£.§£.
79 -- --> -4
90 ] --> ÉÏ
100 max>'ax>'ax>'ax>'ax>'ax>'ax>'ax>'ax>'ax>'ax>'ax>'ax>'ax>'ax --> [email protected]
108 [ --> ÉÏ
126 | --> ÉÏ
226 } --> ÉÏ
231 10 --> -0

I don't know what I should do to fix it, or whether these are the correct results.

MelvinZang • Apr 13 '18

I used these results to run lda2vec_run.py.

At first I got this result:

Top words in topic 0 x11 sci.crypt pixels copyright pixel meg siggraph moncton phil rpm
Top words in topic 1 muslims steam christians communist filter playoffs terrorists indians filters macintosh
Top words in topic 2 nuclear revolver housing mike galley cabin ulf sf braking argic
Top words in topic 3 rbi reno ss canada bath apartment housing martin obey lindros
Top words in topic 4 mph pitchers hitter modems cubs braking telescope velocity blues brakes
Top words in topic 5 login dept militia customers 105 bombing abortion minorities workers americans
Top words in topic 6 ill tip puck jersey updates tips reply offensive archives guard
Top words in topic 7 sponsored rating inherently mode modes recommend voted p.o. p.m. participated
Top words in topic 8 0.333 manager logo subscribe stats secretary dec detector archives saves
Top words in topic 9 olwm dec gentiles azerbaijanis homosexuals liberal gays corps libertarians armenians
Top words in topic 10 firearm revolver knife atrocities bullock suicide accidents snow flyers handgun
Top words in topic 11 patents vs coverage v xv patent deals due warranty industry
Top words in topic 12 los nowhere shift distinguish gulf direction movement massacre channel slaughter
Top words in topic 13 microsoft msdos macintosh injection startup ken unix chinese cell pilot
Top words in topic 14 homicides madison dec iraq murders msdos wolverine refugees archive obfuscated
Top words in topic 15 edit login writers nejm moderator comics expressed msg copyright conclusions
Top words in topic 16 whalers syndrome gods note gotten orbiter subscription rf march cds
Top words in topic 17 chi subscribe noise digest ears section horn iron flow criteria
Top words in topic 18 pm p.m. p.o. ss deletion microsoft edm verse powerpc disable
Top words in topic 19 became ran rose grew stood pulled relations jumped fell remained

Traceback (most recent call last):
  File "examples/twenty_newsgroups/lda2vec/lda2vec_run.py", line 107, in <module>
    optimizer.zero_grads()
AttributeError: 'Adam' object has no attribute 'zero_grads'

This is because my Chainer version is 3.5.0, and the zero_grads attribute only exists in versions below 2.0.0. So I changed it to optimizer.use_cleargrads() (I'm not sure whether that is right). Then I got this:

J:00561 E:00000 L:nan P:nan R:2.184e+04
J:00562 E:00000 L:nan P:nan R:1.826e+04
J:00563 E:00000 L:nan P:nan R:1.489e+04

Traceback (most recent call last):
  File "examples/twenty_newsgroups/lda2vec/lda2vec_run.py", line 94, in <module>
    words)
  File "/media/data/users/master/2018/zangmingzhe/lda2vec/lda2vec/topics.py", line 76, in prepare_topics
    assert np.allclose(np.sum(topic_to_word, axis=1), 1), msg
AssertionError: Not all rows in topic_to_word sum to 1

Does anybody know where the problem is?

MelvinZang • Apr 15 '18
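For readers hitting the same AssertionError: the traceback shows prepare_topics in topics.py asserting np.allclose(np.sum(topic_to_word, axis=1), 1), and the training log just before it already reports L:nan. The sketch below is plain Python and shows why NaN weights trip that check. It assumes, and this is not confirmed from the lda2vec source, that each topic_to_word row is a softmax over word scores. Once any score is NaN, the whole row becomes NaN, and its sum can never be close to 1. The helpers softmax, rows_sum_to_one and has_nan are hypothetical names used only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rows_sum_to_one(matrix, rtol=1e-5, atol=1e-8):
    """Roughly mirrors np.allclose(np.sum(matrix, axis=1), 1).

    Uses numpy's allclose criterion |a - b| <= atol + rtol * |b| with b = 1.
    A NaN sum makes the comparison False, as it does in np.allclose.
    """
    return all(abs(sum(row) - 1.0) <= atol + rtol for row in matrix)


def has_nan(matrix):
    """True if any entry is NaN; worth checking before the assert fires."""
    return any(math.isnan(x) for row in matrix for x in row)


healthy = [softmax([0.1, 2.0, -1.0]), softmax([3.0, 3.0])]
broken = [softmax([0.1, float("nan"), -1.0])]  # one diverged score poisons the row

print(rows_sum_to_one(healthy), has_nan(healthy))  # True False
print(rows_sum_to_one(broken), has_nan(broken))    # False True
```

If this matches what you see, the assertion is only a symptom. The real question is why the loss became NaN during training, so checking the weights for NaN after each update (for example with np.isnan(...).any()) can help pinpoint when training diverged.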

@MelvinZang The problem is due to the Chainer version. Change optimizer.use_cleargrads.

ali3assi • May 04 '18

@TamouzeAssi Thanks, that's right.

MelvinZang • May 06 '18

@MelvinZang I am also running into this assert error:

Traceback (most recent call last):
  File "examples/hacker_news/lda2vec/lda2vec_run.py", line 87, in <module>
    words)
  File "build/bdist.linux-x86_64/egg/lda2vec/topics.py", line 76, in prepare_topics
AssertionError: Not all rows in topic_to_word sum to 1

I also had to switch to use_cleargrads() instead of zero_grads() because of my Chainer version.

Were you able to fix the assert error (AssertionError: Not all rows in topic_to_word sum to 1)?

anupamme • May 09 '18

@anupamme Replace use_cleargrads() with model.cleargrads().

ali3assi • May 15 '18
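Why the exact replacement matters, as far as I understand the Chainer docs: Link.cleargrads() is documented as something to call before the backward computation at every iteration, because backward() accumulates into existing gradients. Optimizer.use_cleargrads(), by contrast, only selects which clearing method optimizer.update() uses when it is passed a loss function. In a loop that calls loss.backward() and then optimizer.update() with no arguments, it would clear nothing. The toy below is plain Python, not Chainer. The hypothetical train() minimizes 0.5 * w**2 with SGD and shows that when the gradient buffer is never reset, the parameter keeps oscillating instead of converging. Whether this alone explains the NaN losses reported earlier in the thread is not certain.

```python
def train(steps=200, lr=0.1, clear_grads=True):
    """Toy SGD on f(w) = 0.5 * w**2, whose gradient is simply w.

    `grad` plays the role of a parameter's gradient buffer: the
    "backward" step adds into it, so it must be reset every iteration.
    Returns the final w and the trajectory of w.
    """
    w = 1.0
    grad = 0.0
    trajectory = []
    for _ in range(steps):
        if clear_grads:
            grad = 0.0        # analogous to model.cleargrads()
        grad += w             # analogous to loss.backward() accumulating
        w -= lr * grad        # analogous to optimizer.update()
        trajectory.append(w)
    return w, trajectory


w_clear, _ = train(clear_grads=True)
w_stale, history = train(clear_grads=False)
print(abs(w_clear) < 1e-6)                        # True: converges to 0
print(max(abs(v) for v in history[-50:]) > 0.5)  # True: still oscillating
```

With stale gradients, each step effectively applies the sum of all past gradients. The update then behaves like momentum with no decay, so w keeps swinging around the minimum instead of settling on it.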

@MelvinZang Hi! I'm wondering how long it took you to run preprocess.py and run.py. Thanks!

lovedatatiff • May 17 '18

@lovedatatiff It took me nearly a whole day to run preprocess.py, but only a few hours to run lda2vec_run.py on a GPU.

MelvinZang • May 22 '18

@anupamme Another simple way is to downgrade Chainer to version 1.9.0.

MelvinZang • May 22 '18

Hello @MelvinZang, when I run lda2vec.py on my dataset, I get results like these:

;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õ;²õž&»ÖLI»Íóõ»Yø<ªŒà‚<ß9;|•È»ÀšÓ»lë:œSs;ã@ºIæÖº“u»×¹±Ÿ°»ã$1:˜V"Œ.[£»dc Œj-!Œ¥KŒK<€òîºdqý;8x».蔻Ua,»p†ºm»†Ç;þeÀ;¶¡E»ý¶E¹(*׻ۇ›º‘ŠÊ:k±ŒÏ £;H÷’º[Ä;bbž:x:

Please tell me what is going wrong here; I'm stuck. My dataset is an abstract.txt file (abstracts of research papers).

stalhaa • Dec 29 '18

I am also getting the same error: AttributeError: 'Adam' object has no attribute 'zero_grads'

Has anyone been able to resolve this lately?

pip show chainer
Name: chainer
Version: 6.0.0b1
Summary: A flexible framework of neural networks
Home-page: https://chainer.org/

Edit: Solved by installing chainer==1.9.0.

ranjeetkgupta • Dec 30 '18
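An alternative to pinning chainer==1.9.0 is to have the training loop call whichever gradient-clearing method the installed version provides. The sketch below uses clear_gradients, a hypothetical helper that is not part of lda2vec or Chainer, meant to replace the optimizer.zero_grads() line. My understanding is that Link.cleargrads() was added in a later 1.x release than 1.9.0 and that Optimizer.zero_grads() was removed in 2.0, but please verify this against your installed version. The helper only handles gradient clearing. It does not address other API changes between Chainer 1.x and later releases, which may be behind the AssertionError that others still see with newer versions.

```python
def clear_gradients(model, optimizer):
    """Reset accumulated gradients using whichever API this Chainer version has.

    Hypothetical helper: call it where lda2vec_run.py called
    optimizer.zero_grads(), i.e. before the backward pass.
    """
    if hasattr(model, "cleargrads"):
        model.cleargrads()          # newer Chainer: Link.cleargrads()
    elif hasattr(optimizer, "zero_grads"):
        optimizer.zero_grads()      # old Chainer (e.g. 1.9.0)
    else:
        raise AttributeError(
            "neither model.cleargrads() nor optimizer.zero_grads() exists")


# Stand-in objects so the helper can be exercised without Chainer installed.
class FakeNewModel:
    def __init__(self):
        self.cleared = False

    def cleargrads(self):
        self.cleared = True


class FakeOldOptimizer:
    def __init__(self):
        self.zeroed = False

    def zero_grads(self):
        self.zeroed = True


model, optimizer = FakeNewModel(), FakeOldOptimizer()
clear_gradients(model, optimizer)
print(model.cleared, optimizer.zeroed)  # True False
```

Checking with hasattr rather than parsing version strings keeps the helper working on in-between releases where both methods exist.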

@stalhaa The results I mentioned in the question are not wrong. They come from a conversion step: the most frequent tokens in the articles are punctuation marks, and the model maps them to something else. As the process continues, the output becomes normal; for example, you can see plurals turned into singulars, among other cases.

I don't understand the results you pasted. Maybe you can format them so that I can tell when and why the model outputs things like that.

Hope it helps.

MelvinZang • Jan 02 '19

Please send me your email address, @MelvinZang.

stalhaa • Jan 02 '19

@MelvinZang Did you get around the issue by installing Chainer 1.9.0? It does solve the issue for me on my Mac, but I am trying to set this up in a Colab notebook (for GPU support) and I am unable to install Chainer 1.9.0:

/tmp/tmpQdaB_J/a.cpp:1:10: fatal error: cudnn.h: No such file or directory
#include <cudnn.h>
         ^~~~~~~~~
compilation terminated.
**************************************************
*** WARNING: Include files not found: ['cudnn.h']
*** WARNING: Skip installing cudnn support
*** WARNING: Check your CPATH environment variable
**************************************************
cython path:/usr/local/lib/python2.7/dist-packages
error: Command '/usr/bin/python2' failed:

  command: /usr/bin/python2 /usr/local/lib/python2.7/dist-packages/cython.py --fast-fail --verbose --cplus --directive profile=False --directive linetrace=False cupy/core/core.pyx
  return code: 1
  output:

Compiling /tmp/pip-build-tkCtKZ/chainer/cupy/core/core.pyx
/usr/local/lib/python2.7/dist-packages/Cython/Compiler/Main.py:367: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /tmp/pip-build-tkCtKZ/chainer/cupy/core/core.pyx
  tree = Parsing.p_module(s, pxd, full_module_name)

Error compiling Cython file:
------------------------------------------------------------
...
    void* data
    int size
    int shape_and_strides[MAX_NDIM * 2]


cdef class CArray(cupy.cuda.function.CPointer):
                                   ^
------------------------------------------------------------

cupy/core/carray.pxi:14:36: First base of 'CArray' is not an extension type

And with the latest version of Chainer, I get this error: AssertionError: Not all rows in topic_to_word sum to 1

Really appreciate any insights here !

ranjeetkgupta • Jan 07 '19
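The warning in that log says cudnn.h was not found and suggests checking CPATH. The sketch below checks this from Python before you retry the install. find_header is a hypothetical helper, and the fallback directories /usr/local/cuda/include and /usr/include are common install locations I am assuming, not paths taken from this log. It only addresses the cuDNN warning. The build actually aborts on the separate Cython error ("First base of 'CArray' is not an extension type"), which this check does not fix.

```python
import os


def find_header(name="cudnn.h",
                extra_dirs=("/usr/local/cuda/include", "/usr/include")):
    """Return the first directory containing `name`, or None.

    Searches the directories listed in CPATH first (the variable the
    install warning refers to), then the assumed fallback locations.
    """
    cpath = os.environ.get("CPATH", "")
    candidates = [d for d in cpath.split(os.pathsep) if d] + list(extra_dirs)
    for directory in candidates:
        if os.path.isfile(os.path.join(directory, name)):
            return directory
    return None


if __name__ == "__main__":
    found = find_header()
    if found is None:
        print("cudnn.h not found; add its directory to CPATH before installing")
    else:
        print("cudnn.h found in", found)
```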

@MelvinZang ??

stalhaa • Jan 16 '19

@stalhaa Sorry, I forgot. [email protected]

MelvinZang • Jan 17 '19

@MelvinZang Could you please run your lda2vec.py code on my dataset instead of twenty_newsgroups and share the results afterwards? I want the top words for 100 topics. Kindly help me with this. Thanks.

stalhaa • Jan 21 '19
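For reference, the "Top words in topic N ..." lines earlier in the thread come from ranking each topic's word distribution. The sketch below shows that ranking in plain Python. It assumes topic_to_word is a list of per-topic probability rows aligned with a vocab list; both names are borrowed from the traceback, and top_words_per_topic itself is a hypothetical helper, not lda2vec's own code. With a model trained for 100 topics, topic_to_word has 100 rows and the same loop yields 100 lines.

```python
def top_words_per_topic(topic_to_word, vocab, top_n=10):
    """Format the top_n most probable words of each topic, one line per topic.

    topic_to_word: list of rows, one per topic, each a list of word
    probabilities aligned with `vocab`.
    """
    lines = []
    for topic, row in enumerate(topic_to_word):
        # Indices of the highest-probability words, best first
        # (sorted() is stable, so ties keep vocabulary order).
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        words = " ".join(vocab[i] for i in ranked[:top_n])
        lines.append("Top words in topic {} {}".format(topic, words))
    return lines


vocab = ["pixel", "nuclear", "puck", "msdos"]
topic_to_word = [
    [0.10, 0.40, 0.30, 0.20],
    [0.70, 0.10, 0.10, 0.10],
]
for line in top_words_per_topic(topic_to_word, vocab, top_n=2):
    print(line)
# Top words in topic 0 nuclear puck
# Top words in topic 1 pixel nuclear
```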

@stalhaa Let me give it a try.

MelvinZang • Jan 21 '19

My problem is that every time I install Chainer 1.9.0 in place of a later version, my code can't

import cupy.cudnn

and this causes

UserWarning: cuDNN is not enabled.

But if I don't switch to 1.9.0 and use the latest version,

AttributeError: 'Adam' object has no attribute 'zero_grads'

happens. If zero_grads is replaced with any of use_cleargrads(use=False), use_cleargrads(use=True), use_cleargrads(), or model.cleargrads(),

AssertionError: Not all rows in topic_to_word sum to 1

shows up.

ghost • Feb 14 '19