GuidedLDA

getting 'too many indices for array' error when trying to print out topic results

Open • tgrover2 opened this issue 5 years ago • 3 comments

Hi there,

I'm running this program on my own data. The guided topic model fits as expected, but I run into a problem when I use your code to print out the resulting seeded topics:

```python
n_top_words = 10
topic_word = model.topic_word_
for i, topic_dist in enumerate(topic_word):
    topic_words = np.array(vocab)[np.argsort(topic_dist)][:-(n_top_words+1):-1]
    print('Topic {}: {}'.format(i, ' '.join(topic_words)))
```

The line `topic_words = np.array(vocab)[np.argsort(topic_dist)][:-(n_top_words+1):-1]` raises `IndexError: too many indices for array`.

My `vocab` object is a Python dictionary, as expected, with each word as a key and its ID as the value, like in your tutorial.

{'level': 23949, 'nationalsozialistische': 27680, 'boyish': 4847, 'uprising': 44406, 'reached': 34053, 'infinitesimal': 20852, 'humiliated': 19720, 'fundraise': 16348, 'reprogram': 35089, 'nwf': 28830, 'impolite': 20381, 'upmu': 44393, 'stomp': 40042, 'reassertion': 34162, 'matthjews': 25541, 'kokesh': 23156, 'seize': 37167, 'proven': 32956, 'rted': 36093, 'streams': 40190, 'jvx': 22572, 'deformation': 10161, 'schoolkids': 36798, 'agonising': 865, 'skellington': 38332, 'xvideos': 46943, 'hills': 19027, 'francoist': 15947, 'hitters': 19140, 'urination': 44472, 'crowdfund': 9114, 'fivethirtyeight': 15321, 'flagbearers': 15362, 'shoah': 37862, 'uncritically': 43738, 'heretics': 18837, 'congressional': 8097, 'slayin': 38487, 'kickerdaily': 22901, 'blogging': 4382, 'riot': 35685, 'consciously': 8154, 'attention': 2656, 'tik': 42227, 'pfft': 31040, 'steppe': 39913, 'eigene': 12762, 'drag': 12040, 'insectivore': 21073, 'premiere': 32308, 'outing': 29750, 'citizenry': 6985, 'repute': 35126, 'savvy': 36620, 'artfag': 2289, 'twinkies': 43330, 'supporting': 40785, 'escaped': 13642, 'shhiiiieeeetttt': 37692, 'yellow': 47058, 'rationality': 33954, 'sighting': 38107, 'negotiation': 27908, 'adults': 612, 'overflowing': 29884 etc, etc...
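A minimal sketch of the likely cause, assuming `vocab` is a word-to-ID `dict` like the one above (the four-word `word2id` and the weights below are made up for illustration). `np.array()` does not turn a dict into a list of its keys; it wraps the whole dict in a zero-dimensional object array, and indexing that with the 1-D output of `np.argsort` raises this same `IndexError`. The sketch also shows one way to rebuild an ID-ordered word array, which assumes the IDs run from 0 to `len(vocab) - 1` and match the column order of the matrix passed to `fit()`.

```python
import numpy as np

# Hypothetical word -> ID mapping shaped like the vocab dict above.
word2id = {'level': 2, 'boyish': 0, 'uprising': 3, 'reached': 1}
topic_dist = np.array([0.1, 0.4, 0.2, 0.3])  # made-up weights for one topic

# np.array() wraps the dict as a single object: the result has shape ().
wrapped = np.array(word2id)
print(wrapped.shape)  # ()

caught = None
try:
    wrapped[np.argsort(topic_dist)]  # a 0-d array has no axis to index
except IndexError as err:
    caught = err
    print('IndexError:', err)

# Fix (assumes IDs are 0..V-1): list the words in ID order so that
# vocab[j] is the word for column j of topic_word.
vocab = np.array(sorted(word2id, key=word2id.get))
top_words = vocab[np.argsort(topic_dist)][::-1]
print(top_words.tolist())  # ['reached', 'uprising', 'level', 'boyish']
```

Replacing `[::-1]` with `[:-(n_top_words+1):-1]` keeps only the top `n_top_words` entries, as in the loop above.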

Any insight into what I might be missing or doing wrong would be greatly appreciated. I am more experienced with R than Python, so I'm not used to all of Python's nuances.

Thanks in advance!

tgrover2 • Oct 10 '18 08:10

It worked for me using `vocab = cv.get_feature_names()`:

```python
model = guidedlda.GuidedLDA(n_topics=10, n_iter=500, random_state=7, refresh=20)
model.fit(X)

topic_word = model.topic_word_
n_top_words = 20
for i, topic_dist in enumerate(topic_word):
    topic_words = np.array(cv.get_feature_names())[np.argsort(topic_dist)][:-(n_top_words+1):-1]
    print('Topic {}: {}'.format(i, ' '.join(topic_words)))
```
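This version works because, assuming `cv` is a fitted scikit-learn `CountVectorizer` (the comment does not say), its feature names form a list in which position j is the word for column j of `X`, and therefore of `model.topic_word_`. A minimal sketch of the same printing loop as a reusable function, with a made-up two-topic matrix standing in for `model.topic_word_`; `top_words_per_topic` is a hypothetical helper, not part of GuidedLDA:

```python
import numpy as np

def top_words_per_topic(topic_word, vocab, n_top_words=10):
    """Return the n_top_words highest-weighted words for each topic.

    Assumes vocab is index-aligned: vocab[j] is the word for column j
    of topic_word (and of the document-term matrix passed to fit()).
    """
    vocab = np.asarray(vocab)
    result = []
    for topic_dist in topic_word:
        # argsort is ascending; this slice walks backwards from the end,
        # keeping the n_top_words largest weights in descending order.
        order = np.argsort(topic_dist)[:-(n_top_words + 1):-1]
        result.append(vocab[order].tolist())
    return result

# Made-up 2-topic x 4-word matrix standing in for model.topic_word_.
topic_word = np.array([[0.05, 0.60, 0.25, 0.10],
                       [0.50, 0.10, 0.15, 0.25]])
vocab = ['apple', 'banana', 'cherry', 'date']

topics = top_words_per_topic(topic_word, vocab, n_top_words=2)
for i, words in enumerate(topics):
    print('Topic {}: {}'.format(i, ' '.join(words)))
# Topic 0: banana cherry
# Topic 1: apple date
```

Slicing the index array before looking up words gives the same result as sorting the whole vocabulary first, without building a reordered copy of it for every topic.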

deepakkumar98355 • Jul 29 '19 07:07

Hi @deepakkumar98355, what does `cv` represent here? I know `cv` doesn't have a `get_feature_names` function.
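`cv` is never defined in the comment above; it is most likely a fitted scikit-learn `CountVectorizer` used to build `X`, but that is an assumption. If so, the missing method may be a version issue: scikit-learn 1.0 deprecated `get_feature_names()` in favour of `get_feature_names_out()`, and 1.2 removed it. A minimal sketch of a hypothetical helper, `vocab_from_vectorizer`, that works with either:

```python
import numpy as np

def vocab_from_vectorizer(vectorizer):
    """Return the fitted vocabulary as an array aligned with the columns of X.

    Assumes vectorizer is a fitted scikit-learn vectorizer such as
    CountVectorizer. scikit-learn >= 1.0 provides get_feature_names_out();
    older releases only provide get_feature_names().
    """
    if hasattr(vectorizer, 'get_feature_names_out'):
        return np.asarray(vectorizer.get_feature_names_out())
    return np.asarray(vectorizer.get_feature_names())
```

With that helper, `vocab = vocab_from_vectorizer(cv)` can replace `np.array(cv.get_feature_names())` in the loop, computed once before it rather than on every iteration.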

arthi-rajendran24 • Jan 23 '22 18:01

Hi @tgrover2, were you able to fix your issue? I am getting the same error. Please share your solution if you have one.

arthi-rajendran24 • Jan 23 '22 18:01