Andrei Andreev


> Can you create a tiny example with identical Ubuntu-server.iso filenames and try to get *numeric* clusters? With 18.04 and 18.10 together, plus 22.04 and 22.10. It seems the number signal is thrown...

In the previous example, elements were grouped fairly well, but there was a cluster containing elements close to noise (Cluster 4). To identify such a cluster (and filter it out...

Another metric that can be used for filtering clusters is the silhouette coefficient (its values range from -1 to 1). This metric provides insight into the distance between clusters and...
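A minimal sketch of how such silhouette-based filtering could look, assuming a feature matrix `X` and cluster labels `labels` from an earlier step; the threshold value is an illustrative assumption, not taken from the original comment:

```python
# Hypothetical sketch: keep only clusters whose mean silhouette score is high enough.
# `X` and `labels` are assumed to come from an earlier vectorization/clustering step.
import numpy as np
from sklearn.metrics import silhouette_samples

def filter_low_silhouette_clusters(X, labels, threshold=0.1):
    """Return the set of cluster ids whose mean silhouette coefficient
    is at least `threshold` (-1 is the worst possible value, 1 the best)."""
    scores = silhouette_samples(X, labels)
    kept = set()
    for cluster_id in np.unique(labels):
        if cluster_id == -1:                 # HDBSCAN marks noise points with -1
            continue
        mask = labels == cluster_id
        if scores[mask].mean() >= threshold:
            kept.add(cluster_id)
    return kept
```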

The best part of any job is the visualization:

![3d_all_clusters](https://github.com/Tribler/tribler/assets/13798583/f28ec73f-7cf0-43e1-9782-6c82125bbc89)
![3d_each_cluster](https://github.com/Tribler/tribler/assets/13798583/954f74d5-966c-4956-bb7f-ef8df0debb4e)
![2d_each_cluster](https://github.com/Tribler/tribler/assets/13798583/92bd8253-fb02-48d9-a790-947fd26cfb7b)

The script:

```python
# This script performs cluster analysis using the K-Means algorithm, applied to a multi-dimensional dataset....
```
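Since only the first comment line of that script survives in this listing, here is a minimal self-contained sketch of the same idea; the file names, cluster count, vectorizer settings, and the PCA projection used for plotting are all illustrative assumptions, not the original script:

```python
# Illustrative sketch: cluster file names with K-Means and reduce the
# TF-IDF vectors to 3 dimensions purely for visualization.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

titles = [
    "ubuntu-18.04-server-amd64.iso",
    "ubuntu-18.10-server-amd64.iso",
    "ubuntu-22.04-live-server-amd64.iso",
    "ubuntu-22.10-live-server-amd64.iso",
]

# Vectorize the names and run K-Means (the cluster count is an assumption).
X = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)).fit_transform(titles)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Project to 3 dimensions only to obtain plotting coordinates.
coords = PCA(n_components=3).fit_transform(X.toarray())

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(coords[:, 0], coords[:, 1], coords[:, 2], c=labels)
plt.show()
```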

Instead of integrating the current algorithm into Tribler, I decided to focus on its improvement and dedicate half of the current week to this task. I haven't yet focused on...

To achieve more specific clustering results, such as differentiating between clusters for "Ubuntu 20.04.X" instead of a more general "Ubuntu 20.04," the following HDBSCAN constructor parameters can be adjusted: `min_samples`...
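A sketch of what such tuning could look like with the `hdbscan` package follows; only `min_samples` is named in the original comment, and every other parameter and value here is an assumption chosen for illustration:

```python
# Illustrative sketch: tighten HDBSCAN so that "Ubuntu 20.04.1" and "Ubuntu 20.04.2"
# can fall into separate clusters.  Only `min_samples` comes from the original
# comment; the remaining parameters and their values are assumptions.
import hdbscan
from sklearn.feature_extraction.text import TfidfVectorizer

names = [
    "ubuntu-20.04.1-desktop-amd64.iso",
    "ubuntu-20.04.2-desktop-amd64.iso",
    "ubuntu-20.04.1-live-server-amd64.iso",
    "ubuntu-20.04.2-live-server-amd64.iso",
]
vectors = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)).fit_transform(names).toarray()

clusterer = hdbscan.HDBSCAN(
    min_cluster_size=2,               # smallest group still treated as a cluster
    min_samples=1,                    # lower values make the clustering less conservative
    cluster_selection_epsilon=0.0,    # 0.0 keeps fine-grained clusters instead of merging them
    cluster_selection_method="leaf",  # "leaf" prefers many small clusters over a few large ones
)
labels = clusterer.fit_predict(vectors)  # -1 marks points HDBSCAN considers noise
```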

The next step involved a deeper exploration of vectorization algorithms to determine whether there are more advanced options beyond TF-IDF that could better suit our needs. This exploration led us...
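For reference, the TF-IDF baseline being compared against can be as simple as the following sketch; the word/character analyzers, token pattern, and n-gram ranges are assumptions, not settings quoted from the original comment:

```python
# Baseline sketch: TF-IDF vectorization of torrent names before clustering.
# Both the analyzers and the n-gram ranges below are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer

word_vectorizer = TfidfVectorizer(analyzer="word", token_pattern=r"[\w.]+")
char_vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))

names = ["ubuntu-20.04.1-desktop-amd64.iso", "ubuntu-20.04.2-desktop-amd64.iso"]
X_word = word_vectorizer.fit_transform(names)   # whole-token features
X_char = char_vectorizer.fit_transform(names)   # character n-gram features
```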

This endeavor was an attempt to leverage transformers, specifically the **BERT** model, as a tokenizer in our clustering process. When utilizing BERT as a tokenizer, we observe that it delivers...
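A minimal sketch of plugging a pretrained BERT tokenizer into the existing TF-IDF pipeline is shown below; the `bert-base-uncased` checkpoint and the downstream vectorization step are assumptions, not details taken from the original comment:

```python
# Sketch: use a BERT (WordPiece) tokenizer to split names into sub-word tokens,
# then feed those tokens into the usual TF-IDF / clustering pipeline.
# The "bert-base-uncased" checkpoint is an assumed choice.
from transformers import AutoTokenizer
from sklearn.feature_extraction.text import TfidfVectorizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def bert_tokenize(text):
    # e.g. "ubuntu-20.04.1-desktop-amd64.iso" -> ["ubuntu", "-", "20", ".", "04", ...]
    return tokenizer.tokenize(text)

vectorizer = TfidfVectorizer(analyzer=bert_tokenize)
X = vectorizer.fit_transform([
    "ubuntu-20.04.1-desktop-amd64.iso",
    "ubuntu-20.04.2-desktop-amd64.iso",
])
```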

The final improvement in this iteration was an attempt to modify the standard **TF-IDF** algorithm to account for the position of tokens, which led to better results (comparable to N-Grams...
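The exact modification is not visible in the truncated comment; one simple way to make TF-IDF position-aware is to let each token's contribution to the term-frequency part decay with its position in the name, as in the sketch below. The `1 / (1 + position)` decay is an assumed choice, not the author's formula:

```python
# Illustrative sketch of a position-aware TF-IDF: a token's contribution to the
# term-frequency part decays with its position in the name, so tokens near the
# start of a filename carry more weight.  The decay function is an assumption.
import math
from collections import Counter, defaultdict

def positional_tfidf(documents):
    """documents: list of token lists; returns one {token: weight} dict per document."""
    # Document frequencies for the IDF part (standard smoothed IDF).
    df = Counter()
    for tokens in documents:
        df.update(set(tokens))
    n_docs = len(documents)

    vectors = []
    for tokens in documents:
        tf = defaultdict(float)
        for position, token in enumerate(tokens):
            tf[token] += 1.0 / (1.0 + position)   # earlier tokens weigh more
        vectors.append({
            token: weight * (math.log((1 + n_docs) / (1 + df[token])) + 1)
            for token, weight in tf.items()
        })
    return vectors

# Example usage on pre-tokenized names.
docs = [["ubuntu", "20.04.1", "desktop", "amd64", "iso"],
        ["ubuntu", "20.04.2", "desktop", "amd64", "iso"]]
weights = positional_tfidf(docs)
```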