GuidedLDA
getting 'too many indices for array' error when trying to print out topic results
Hi there,
I'm trying to run this program on my own data. The guided topic model itself fit as expected, but I hit an error when I use your code to print out the resulting seeded topics:
n_top_words = 10
topic_word = model.topic_word_
for i, topic_dist in enumerate(topic_word):
    topic_words = np.array(vocab)[np.argsort(topic_dist)][:-(n_top_words+1):-1]
    print('Topic {}: {}'.format(i, ' '.join(topic_words)))
The error is raised on this line:
    topic_words = np.array(vocab)[np.argsort(topic_dist)][:-(n_top_words+1):-1]
and the message is:
    IndexError: too many indices for array
My vocab object is a Python dictionary, as expected, with the word as the key and the ID as the value, like in your tutorial.
{'level': 23949, 'nationalsozialistische': 27680, 'boyish': 4847, 'uprising': 44406, 'reached': 34053, 'infinitesimal': 20852, 'humiliated': 19720, 'fundraise': 16348, 'reprogram': 35089, 'nwf': 28830, 'impolite': 20381, 'upmu': 44393, 'stomp': 40042, 'reassertion': 34162, 'matthjews': 25541, 'kokesh': 23156, 'seize': 37167, 'proven': 32956, 'rted': 36093, 'streams': 40190, 'jvx': 22572, 'deformation': 10161, 'schoolkids': 36798, 'agonising': 865, 'skellington': 38332, 'xvideos': 46943, 'hills': 19027, 'francoist': 15947, 'hitters': 19140, 'urination': 44472, 'crowdfund': 9114, 'fivethirtyeight': 15321, 'flagbearers': 15362, 'shoah': 37862, 'uncritically': 43738, 'heretics': 18837, 'congressional': 8097, 'slayin': 38487, 'kickerdaily': 22901, 'blogging': 4382, 'riot': 35685, 'consciously': 8154, 'attention': 2656, 'tik': 42227, 'pfft': 31040, 'steppe': 39913, 'eigene': 12762, 'drag': 12040, 'insectivore': 21073, 'premiere': 32308, 'outing': 29750, 'citizenry': 6985, 'repute': 35126, 'savvy': 36620, 'artfag': 2289, 'twinkies': 43330, 'supporting': 40785, 'escaped': 13642, 'shhiiiieeeetttt': 37692, 'yellow': 47058, 'rationality': 33954, 'sighting': 38107, 'negotiation': 27908, 'adults': 612, 'overflowing': 29884
etc, etc...
Any insight into what I might be missing or doing wrong would be greatly appreciated. I am more experienced with R than Python, so I'm not used to all the nuances of Python.
Thanks in advance!
It worked for me when I used vocab = cv.get_feature_names():
model = guidedlda.GuidedLDA(n_topics=10, n_iter=500, random_state=7, refresh=20)
model.fit(X)

topic_word = model.topic_word_
n_top_words = 20
for i, topic_dist in enumerate(topic_word):
    topic_words = np.array(cv.get_feature_names())[np.argsort(topic_dist)][:-(n_top_words+1):-1]
    print('Topic {}: {}'.format(i, ' '.join(topic_words)))
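A likely reason this helps: the printing loop indexes np.array(vocab) with an array of positions, which only works if vocab is an ordered sequence of words. Wrapping a dict in np.array gives a 0-dimensional object array, and indexing that raises an IndexError. The sketch below reproduces this with a made-up four-word vocabulary and a made-up topic distribution (both hypothetical), then rebuilds an ordered word list from the word-to-ID dict. It assumes the IDs are the contiguous column indices 0..V-1 of the matrix passed to fit.

```python
import numpy as np

# Hypothetical toy vocabulary in the word -> ID form described in the question.
word2id = {"riot": 2, "level": 0, "yellow": 3, "proven": 1}

# np.array() on a dict does not produce an array of words; it produces a
# 0-dimensional object array that merely wraps the dict.
as_array = np.array(word2id)
print(as_array.ndim)

# Indexing a 0-dimensional array with an array of positions fails.
try:
    as_array[np.array([0, 1])]
    raised = False
except IndexError as err:
    raised = True
    print("IndexError:", err)

# Rebuild an ordered word list: position i holds the word whose ID is i.
# Assumption: IDs are contiguous and match the columns of the fitted matrix.
vocab = [word for word, idx in sorted(word2id.items(), key=lambda kv: kv[1])]

# Hypothetical topic-word distribution for a single topic (one value per word).
topic_dist = np.array([0.05, 0.15, 0.60, 0.20])

n_top_words = 2
top_words = np.array(vocab)[np.argsort(topic_dist)][:-(n_top_words + 1):-1]
print(" ".join(top_words))
```

With the ordered list in place, the original printing loop can be used unchanged by passing this vocab instead of the dict.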
Hi @deepakkumar98355, what does "cv" represent here? As far as I know, cv doesn't have a function "get_feature_names".
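The reply above does not say what cv is. One plausible reading, and it is only an assumption, is that cv is a scikit-learn CountVectorizer fitted on the documents. In that case get_feature_names() returns the words in column order, while the word-to-ID dict lives in cv.vocabulary_. Note that get_feature_names() was removed in scikit-learn 1.2 in favour of get_feature_names_out(), so the sketch below tries the newer name first. The two documents are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus, only for illustration.
docs = [
    "the riot reached the hills",
    "attention reached the citizenry",
]

# Assumption: "cv" in the reply is a CountVectorizer like this one.
cv = CountVectorizer()
X = cv.fit_transform(docs)  # sparse document-term count matrix

# Newer scikit-learn releases expose get_feature_names_out();
# older ones only have get_feature_names().
if hasattr(cv, "get_feature_names_out"):
    vocab = list(cv.get_feature_names_out())
else:
    vocab = list(cv.get_feature_names())

# The word -> column-index dict is a separate attribute.
word2id = cv.vocabulary_

print(vocab)
print(X.shape)
```

Here vocab is the ordered sequence that the printing loop needs, and word2id is the dict form that is used when building seed topics.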
Hi @tgrover2, were you able to fix your issue? I am getting the same error. Please share your solution if you have one.