pubtrends
Division by zero
To reproduce, run the predefined "brain computer interface" search on PubMed.
[2021-10-14 08:34:35,747: INFO/ForkPoolWorker-1] Generating evolution topics descriptions
[2021-10-14 08:34:35,833: WARNING/ForkPoolWorker-1] /home/user/pysrc/papers/analysis/topics.py:116: RuntimeWarning: invalid value encountered in true_divide
  tokens_freqs_per_comp = tokens_freqs_per_comp / tokens_freqs_norm
[2021-10-14 08:34:35,833: WARNING/ForkPoolWorker-1] /home/user/pysrc/papers/analysis/topics.py:123: RuntimeWarning: divide by zero encountered in log
  adjusted_distance = distance.T * np.log(tokens_freqs_total)
@ctrltz is it possible to use np.log1p to avoid this problem?
Sure, but if tokens_freqs_total equals 0, I think it means that the whole corpus_counts contains only zeros, so one might also handle this case explicitly, like:
    if not corpus_counts.sum():
        return *empty descriptions here*
I did not have evolution in mind when I worked on the topic descriptions, thanks for pointing it out.
Also, tokens_freqs_norm may be zero. What is the correct fix for this?
As far as I understand, it means that some of the components have no corpus terms to be analyzed, so it would be correct to return an empty description for the respective components.
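A minimal sketch of that per-component handling, not the actual topics.py code. It assumes tokens_freqs_per_comp has shape (n_comps, n_tokens) and tokens_freqs_norm has shape (n_comps, 1); the helper name safe_normalize is hypothetical. Components with a zero norm keep all-zero frequencies and are flagged, so the caller can return an empty description for them:

```python
import numpy as np


def safe_normalize(tokens_freqs_per_comp, tokens_freqs_norm):
    """Divide each component's row by its norm, skipping zero norms.

    Assumed shapes (not confirmed by the thread):
      tokens_freqs_per_comp: (n_comps, n_tokens)
      tokens_freqs_norm:     (n_comps, 1)
    Returns the normalized array and a boolean mask of empty components.
    """
    freqs = np.asarray(tokens_freqs_per_comp, dtype=float)
    norm = np.asarray(tokens_freqs_norm, dtype=float)

    # Rows whose norm is zero are left as zeros instead of becoming NaN.
    normalized = np.zeros_like(freqs)
    np.divide(freqs, norm, out=normalized, where=norm != 0)

    # Components with no corpus terms; these would get empty descriptions.
    empty_comps = (norm == 0).ravel()
    return normalized, empty_comps


freqs = np.array([[1.0, 3.0], [0.0, 0.0]])
norm = freqs.sum(axis=1, keepdims=True)
normalized, empty_comps = safe_normalize(freqs, norm)
```

With this, no RuntimeWarning is raised for the second component, and empty_comps marks it as the one to skip.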
It might be simpler to plug in np.log1p for now to ensure stability, and I can think about it a bit more in the coming days.
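For reference, a small illustration of what the swap does, using made-up values for tokens_freqs_total. np.log1p(x) computes log(1 + x), so a zero count maps to 0 instead of -inf, but every non-zero value is shifted as well, which is why it is a stopgap rather than an equivalent replacement:

```python
import numpy as np

# Made-up totals for illustration; the first token never occurs.
tokens_freqs_total = np.array([0.0, 1.0, 9.0])

# np.log gives -inf at zero (and emits the "divide by zero" warning,
# silenced here only to keep the example quiet).
with np.errstate(divide="ignore"):
    plain = np.log(tokens_freqs_total)

# np.log1p(x) == log(1 + x): finite everywhere for x >= 0.
shifted = np.log1p(tokens_freqs_total)
```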
NB: I have also fixed the previous comment in case you have used it already.