2021-11-21, by **Ted Jackman**, Independent Financial Adviser

#**Dunn Index** | #**Clustering** | #**Data Analytics**

The Dunn Index (DI), introduced by J. C. Dunn in 1974, is a metric for evaluating clustering algorithms. It belongs to the family of internal validity indices, alongside the Davies-Bouldin index and the Silhouette index: the score is computed from the clustered data itself.
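As an illustration, two of the sibling indices mentioned above are available in scikit-learn (assumed installed here); a minimal sketch on toy data:

```python
import numpy as np
from sklearn.metrics import silhouette_score, davies_bouldin_score

# Two compact, well-separated toy clusters and their labels.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
labels = np.array([0, 0, 0, 1, 1, 1])

# Silhouette index: higher is better, values lie in [-1, 1].
print(silhouette_score(X, labels))
# Davies-Bouldin index: lower is better, values are >= 0.
print(davies_bouldin_score(X, labels))
```

Both functions need only the data and the cluster assignments, which is exactly what makes them internal indices.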

The problem of assessing quality in clustering is hard for at least two reasons:

- Kleinberg's impossibility theorem: no clustering function can simultaneously satisfy scale invariance, richness, and consistency, so there is no single optimal clustering algorithm.
- Many clustering algorithms cannot determine the true number of clusters in the data. Most often the number of clusters is supplied as an input parameter and is chosen by running the algorithm several times.
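The "several runs" approach can be sketched with scikit-learn's KMeans (assumed installed): run the algorithm for a range of candidate cluster counts and compare the within-cluster sum of squares (`inertia_`); the "elbow" in the curve suggests a reasonable number of clusters.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three synthetic blobs, so the "true" number of clusters is 3.
X = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in (0.0, 4.0, 8.0)])

# Run KMeans for several candidate k and record the inertia
# (within-cluster sum of squares); look for the elbow.
for k in range(1, 6):
    inertia = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    print(k, round(inertia, 1))
```

The inertia always decreases as k grows, so it cannot be used blindly; the sharp drop at the true k is what one looks for.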

Various performance metrics are used to evaluate machine learning algorithms. For classification there are well-established metrics to gauge how good a model is; for cluster analysis the analogous question is how to evaluate the "quality" of the resulting clusters.

Why do we need cluster validity indices?

- Compare clustering algorithms.
- Compare two sets of clusters.
- Compare two clusters, that is, decide which one is better in terms of compactness and connectedness.
- Determine whether apparent structure in the data is real or merely an artifact of noise.

As a rule, cluster validity measures are subdivided into three classes:

- Internal cluster validation: the clustering result is evaluated using only the clustered data itself (internal information), with no reference to external information.
- External cluster validation: the clustering result is evaluated against some externally known result, such as externally provided class labels.
- Relative cluster validation: the clustering result is evaluated by varying the parameters of the same algorithm (for example, changing the number of clusters).

In addition to the term "cluster validity index", we need two quantities: the inter-cluster distance d(a, b) between two clusters a and b, and the intra-cluster index D(a) of a cluster a.
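Using these two quantities, the Dunn Index is the smallest inter-cluster distance divided by the largest intra-cluster index. A self-contained sketch, assuming the common single-linkage choice for d(a, b) and the cluster diameter for D(a) (the exact definitions vary in the literature):

```python
from itertools import combinations
from math import dist  # Euclidean distance (Python 3.8+)

def d(a, b):
    """Inter-cluster distance: closest pair of points, one from each cluster."""
    return min(dist(p, q) for p in a for q in b)

def D(a):
    """Intra-cluster index (diameter): farthest pair of points within a cluster."""
    return max(dist(p, q) for p, q in combinations(a, 2))

def dunn_index(clusters):
    """DI = min inter-cluster distance / max intra-cluster diameter."""
    min_sep = min(d(a, b) for a, b in combinations(clusters, 2))
    max_diam = max(D(a) for a in clusters)
    return min_sep / max_diam

# Two compact, well-separated clusters -> large Dunn Index (higher is better).
c1 = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
c2 = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(dunn_index([c1, c2]))
```

Because the numerator rewards separation and the denominator penalizes spread, compact and well-separated clusterings score higher.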

In unsupervised learning, the labeling information for training samples is unknown, and the goal is to uncover the nature and patterns of the data by examining the unlabeled training samples and to provide a basis for further data analysis. Clustering is the most widely used of these techniques.

Clustering attempts to split the samples in a dataset into several, generally disjoint, subsets; each subset is called a cluster.

As a standalone process, clustering is used to discover the internal structure of the data distribution; it can also serve as a precursor to other learning tasks such as classification.

Clustering algorithms face two main problems: measuring performance and calculating distance.

A clustering performance metric is also called a validity metric, by analogy with performance metrics in supervised learning. To assess the quality of a clustering result, some effectiveness indicator is needed. Conversely, once the performance metric to be used is known, it can serve directly as the objective of the clustering process, yielding results that best meet the requirements.

A good clustering result has high intra-cluster similarity and low inter-cluster similarity.

There are roughly two types of clustering performance metrics: external indicators, which compare the clustering result to a given "reference model" (such as known class labels), and internal indicators, which examine the clustering result directly without using any reference model.
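A concrete contrast, again using scikit-learn (assumed installed): the adjusted Rand index is an external indicator, since it needs ground-truth labels, while the silhouette score is internal, since it needs only the data and the predicted clusters.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, silhouette_score

X = np.array([[0.0], [0.2], [0.1], [5.0], [5.2], [5.1]])
true_labels = [0, 0, 0, 1, 1, 1]   # reference model (ground truth)
pred_labels = [1, 1, 1, 0, 0, 0]   # clustering output (cluster names permuted)

# External indicator: compares the prediction to the reference model.
# It is invariant to relabeling, so a permuted but correct partition scores 1.0.
print(adjusted_rand_score(true_labels, pred_labels))  # 1.0
# Internal indicator: uses only X and the predicted clusters.
print(silhouette_score(X, pred_labels))
```

When no ground truth exists, which is the usual situation in clustering, only internal indicators such as the silhouette score or the Dunn Index are applicable.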