Andrei Andreev


> Can you create a tiny example with identical Ubuntu-server.iso filenames and try to get *numeric* clusters? With 18.04 and 18.10 together, plus 22.04 and 22.10. It seems the number signal is thrown...

In the previous example, elements were grouped fairly well, but there was a cluster containing elements close to noise (Cluster 4). To identify such a cluster (and filter it out...

Another metric that can be used for filtering clusters is the silhouette coefficient (its values range from -1 to 1). This metric provides insight into the distance between clusters and...
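A minimal sketch of how such silhouette-based filtering could look, assuming a feature matrix `X` and cluster labels `labels` from an earlier step; the threshold value is an illustrative assumption, not taken from the original comment:

```python
# Hypothetical sketch: keep only clusters whose mean silhouette score is high enough.
# `X` and `labels` are assumed to come from an earlier vectorization/clustering step.
import numpy as np
from sklearn.metrics import silhouette_samples

def filter_low_silhouette_clusters(X, labels, threshold=0.1):
    """Return the set of cluster ids whose mean silhouette coefficient
    is at least `threshold` (-1 is the worst possible value, 1 the best)."""
    scores = silhouette_samples(X, labels)
    kept = set()
    for cluster_id in np.unique(labels):
        if cluster_id == -1:                 # HDBSCAN marks noise points with -1
            continue
        mask = labels == cluster_id
        if scores[mask].mean() >= threshold:
            kept.add(cluster_id)
    return kept
```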

The best part of any job is the visualization:

![3d_all_clusters](https://github.com/Tribler/tribler/assets/13798583/f28ec73f-7cf0-43e1-9782-6c82125bbc89)
![3d_each_cluster](https://github.com/Tribler/tribler/assets/13798583/954f74d5-966c-4956-bb7f-ef8df0debb4e)
![2d_each_cluster](https://github.com/Tribler/tribler/assets/13798583/92bd8253-fb02-48d9-a790-947fd26cfb7b)

The script:

```python
# This script performs cluster analysis using the K-Means algorithm, applied to a multi-dimensional dataset....
```
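Since only the first comment line of that script survives in this listing, here is a minimal self-contained sketch of the same idea; the file names, cluster count, vectorizer settings, and the PCA projection used for plotting are all illustrative assumptions, not the original script:

```python
# Illustrative sketch: cluster file names with K-Means and reduce the
# TF-IDF vectors to 3 dimensions purely for visualization.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

titles = [
    "ubuntu-18.04-server-amd64.iso",
    "ubuntu-18.10-server-amd64.iso",
    "ubuntu-22.04-live-server-amd64.iso",
    "ubuntu-22.10-live-server-amd64.iso",
]

# Vectorize the names and run K-Means (the cluster count is an assumption).
X = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)).fit_transform(titles)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Project to 3 dimensions only to obtain plotting coordinates.
coords = PCA(n_components=3).fit_transform(X.toarray())

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(coords[:, 0], coords[:, 1], coords[:, 2], c=labels)
plt.show()
```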

Instead of integrating the current algorithm into Tribler, I decided to focus on its improvement and dedicate half of the current week to this task. I haven't yet focused on...

To achieve more specific clustering results, such as differentiating between clusters for "Ubuntu 20.04.X" instead of a more general "Ubuntu 20.04," the following HDBSCAN constructor parameters can be adjusted: `min_samples`...
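A sketch of what such tuning could look like with the `hdbscan` package follows; only `min_samples` is named in the original comment, and every other parameter and value here is an assumption chosen for illustration:

```python
# Illustrative sketch: tighten HDBSCAN so that "Ubuntu 20.04.1" and "Ubuntu 20.04.2"
# can fall into separate clusters.  Only `min_samples` comes from the original
# comment; the remaining parameters and their values are assumptions.
import hdbscan
from sklearn.feature_extraction.text import TfidfVectorizer

names = [
    "ubuntu-20.04.1-desktop-amd64.iso",
    "ubuntu-20.04.2-desktop-amd64.iso",
    "ubuntu-20.04.1-live-server-amd64.iso",
    "ubuntu-20.04.2-live-server-amd64.iso",
]
vectors = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)).fit_transform(names).toarray()

clusterer = hdbscan.HDBSCAN(
    min_cluster_size=2,               # smallest group still treated as a cluster
    min_samples=1,                    # lower values make the clustering less conservative
    cluster_selection_epsilon=0.0,    # 0.0 keeps fine-grained clusters instead of merging them
    cluster_selection_method="leaf",  # "leaf" prefers many small clusters over a few large ones
)
labels = clusterer.fit_predict(vectors)  # -1 marks points HDBSCAN considers noise
```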

The next step involved a deeper exploration of vectorization algorithms to determine whether there are more advanced options beyond TF-IDF that could better suit our needs. This exploration led us...
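For reference, the TF-IDF baseline being compared against can be as simple as the following sketch; the word/character analyzers, token pattern, and n-gram ranges are assumptions, not settings quoted from the original comment:

```python
# Baseline sketch: TF-IDF vectorization of torrent names before clustering.
# Both the analyzers and the n-gram ranges below are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer

word_vectorizer = TfidfVectorizer(analyzer="word", token_pattern=r"[\w.]+")
char_vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))

names = ["ubuntu-20.04.1-desktop-amd64.iso", "ubuntu-20.04.2-desktop-amd64.iso"]
X_word = word_vectorizer.fit_transform(names)   # whole-token features
X_char = char_vectorizer.fit_transform(names)   # character n-gram features
```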

This endeavor was an attempt to leverage transformers, specifically the **BERT** model, as a tokenizer in our clustering process. When utilizing BERT as a tokenizer, we observe that it delivers...
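A minimal sketch of plugging a pretrained BERT tokenizer into the existing TF-IDF pipeline is shown below; the `bert-base-uncased` checkpoint and the downstream vectorization step are assumptions, not details taken from the original comment:

```python
# Sketch: use a BERT (WordPiece) tokenizer to split names into sub-word tokens,
# then feed those tokens into the usual TF-IDF / clustering pipeline.
# The "bert-base-uncased" checkpoint is an assumed choice.
from transformers import AutoTokenizer
from sklearn.feature_extraction.text import TfidfVectorizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def bert_tokenize(text):
    # e.g. "ubuntu-20.04.1-desktop-amd64.iso" -> ["ubuntu", "-", "20", ".", "04", ...]
    return tokenizer.tokenize(text)

vectorizer = TfidfVectorizer(analyzer=bert_tokenize)
X = vectorizer.fit_transform([
    "ubuntu-20.04.1-desktop-amd64.iso",
    "ubuntu-20.04.2-desktop-amd64.iso",
])
```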

The final improvement in this iteration was an attempt to modify the standard **TF-IDF** algorithm to account for the position of tokens, which led to better results (comparable to N-Grams...
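The exact modification is not visible in the truncated comment; one simple way to make TF-IDF position-aware is to let each token's contribution to the term-frequency part decay with its position in the name, as in the sketch below. The `1 / (1 + position)` decay is an assumed choice, not the author's formula:

```python
# Illustrative sketch of a position-aware TF-IDF: a token's contribution to the
# term-frequency part decays with its position in the name, so tokens near the
# start of a filename carry more weight.  The decay function is an assumption.
import math
from collections import Counter, defaultdict

def positional_tfidf(documents):
    """documents: list of token lists; returns one {token: weight} dict per document."""
    # Document frequencies for the IDF part (standard smoothed IDF).
    df = Counter()
    for tokens in documents:
        df.update(set(tokens))
    n_docs = len(documents)

    vectors = []
    for tokens in documents:
        tf = defaultdict(float)
        for position, token in enumerate(tokens):
            tf[token] += 1.0 / (1.0 + position)   # earlier tokens weigh more
        vectors.append({
            token: weight * (math.log((1 + n_docs) / (1 + df[token])) + 1)
            for token, weight in tf.items()
        })
    return vectors

# Example usage on pre-tokenized names.
docs = [["ubuntu", "20.04.1", "desktop", "amd64", "iso"],
        ["ubuntu", "20.04.2", "desktop", "amd64", "iso"]]
weights = positional_tfidf(docs)
```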