
IndexError

Open franck-nkolongo opened this issue 1 year ago • 2 comments

Hello, I have a problem. With this code:

    reviews = list(review_data[2])
    reviews = reviews[:5000]  # only consider the first 5k reviews

I get the following error:

IndexError: boolean index did not match indexed array along dimension 0; dimension is 5000 but corresponding boolean dimension is 1000.

It works with `reviews = reviews[:1000]`.

franck-nkolongo, Sep 20 '24 02:09

Same here:

    File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/topicgpt/TopicRepresentation.py:310, in extract_topics_no_new_vocab_computation(corpus, vocab, document_embeddings, clusterer, vocab_embeddings, n_topwords, topword_extraction_methods, consider_outliers)
        306 dim_red_centroids = umap_mapper.transform(np.array(list(centroid_dict.values()))) # map the centroids to low dimensional space
        308 dim_red_centroid_dict = {label: centroid for label, centroid in zip(centroid_dict.keys(), dim_red_centroids)}
    --> 310 word_topic_mat = extractor.compute_word_topic_mat(corpus, vocab, labels, consider_outliers = consider_outliers) # compute the word-topic matrix of the corpus
        311 if "tfidf" in topword_extraction_methods:
        312     tfidf_topwords, tfidf_dict = extractor.extract_topwords_tfidf(word_topic_mat = word_topic_mat, vocab = vocab, labels = labels, top_n_words = n_topwords) # extract the top-words according to tfidf

    File ~/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/topicgpt/ExtractTopWords.py:308, in ExtractTopWords.compute_word_topic_mat(self, corpus, vocab, labels, consider_outliers)
        305 word_topic_mat = np.zeros((len(vocab), len((np.unique(labels)))))
        307 for i, label in tqdm(enumerate(np.unique(labels)), desc="Computing word-topic matrix", total=len(np.unique(labels))):
    --> 308     topic_docs = corpus_arr[labels == label]
        309     topic_doc_string = " ".join(topic_docs)
        310     topic_doc_words = word_tokenize(topic_doc_string)

    IndexError: boolean index did not match indexed array along dimension 0; dimension is 6969 but corresponding boolean dimension is 4999
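For anyone trying to understand the message itself: the failing line `corpus_arr[labels == label]` uses a boolean mask, and NumPy requires the mask to have the same length as the array it indexes. Below is a minimal sketch that reproduces the same kind of error outside TopicGPT. The helper `docs_for_label` and the toy data are hypothetical, made up for illustration, and the exact wording of the message varies between NumPy versions.

```python
import numpy as np

# Toy stand-ins (hypothetical): 6 documents, but only 4 labels,
# as if the labels came from an earlier, smaller run.
corpus_arr = np.array([f"doc {i}" for i in range(6)])
stale_labels = np.array([0, 1, 0, 1])
fresh_labels = np.array([0, 1, 0, 1, 0, 1])


def docs_for_label(corpus_arr, labels, label):
    """Select the documents of one topic with a boolean mask."""
    return corpus_arr[labels == label]


# Matching lengths: the mask selects documents 0, 2 and 4.
selected = docs_for_label(corpus_arr, fresh_labels, 0)
print(list(selected))

# Mismatched lengths: NumPy raises IndexError.
try:
    docs_for_label(corpus_arr, stale_labels, 0)
    error_message = None
except IndexError as exc:
    error_message = str(exc)
print(error_message)
```

So the two numbers in the error are the size of the current corpus and the size of whatever the labels were computed from.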

deepbot86, Sep 22 '24 04:09

> 4999

I've found the solution: first, delete the `SavedEmbeddings` directory, which contains the `embeddings.pkl` file. That file is left over from an earlier run. In my case it was created from 1000 documents, so it no longer matched the 5000 I was passing in. In your case, you must have first run with a data set of 4999 documents.
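If you prefer to do the cleanup from Python rather than by hand, here is a small sketch. The function `clear_embedding_cache` is a hypothetical helper, not part of TopicGPT, and the default directory name is an assumption taken from this comment, so adjust it to wherever the embeddings were actually saved on your machine.

```python
import shutil
from pathlib import Path


def clear_embedding_cache(cache_dir="SavedEmbeddings"):
    """Delete the saved-embeddings directory if it exists.

    The default name is an assumption; pass your own path if it differs.
    Returns True if a directory was removed, False if there was nothing to remove.
    """
    path = Path(cache_dir)
    if path.is_dir():
        shutil.rmtree(path)  # removes the directory and everything in it
        return True
    return False


if __name__ == "__main__":
    removed = clear_embedding_cache()
    print("cache removed" if removed else "no cache directory found")
```

Run it once before fitting again on a data set of a different size, so the embeddings are recomputed for the new documents.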

franck-nkolongo, Sep 23 '24 02:09