Binary classifiers accompany us every day. Tests that detect disease give us an answer of positive or negative, spam filters say spam or not spam, and smartphones that authenticate us by face scan or fingerprint make a known/unknown decision. The question of how to evaluate the performance of such a classifier does not seem extremely complicated: just choose the one that predicts the most cases correctly. As many of us have already realized, the actual evaluation of a binary classifier requires somewhat more sophisticated means. But we’ll talk about that in a moment.
Continue reading “Meet P4 metric – new way to evaluate binary classifiers”
This article explores an extension of the well-known F1 score used for assessing the performance of binary classifiers. We propose a new metric based on a probabilistic interpretation of precision, recall, specificity, and negative predictive value. We describe its properties and compare it to common metrics. Then we demonstrate its behavior in edge cases of the confusion matrix. Finally, the properties of the metric are tested on a binary classifier trained on a real dataset.
Keywords: machine learning, binary classifier, F1, MCC, precision, recall
Continue reading “Extending F1 metric, probabilistic approach”
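As a rough illustration of the idea in the excerpt above, the sketch below computes F1 as the harmonic mean of precision and recall, and then takes the harmonic mean of all four quantities the abstract names. Treating the new metric as this four-way harmonic mean is an assumption made here for illustration (the excerpt does not spell out the formula), and the function names are hypothetical.

```python
def rates(tp, tn, fp, fn):
    """Return precision, recall, specificity and negative predictive value."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    npv = tn / (tn + fn) if tn + fn else 0.0
    return precision, recall, specificity, npv


def harmonic_mean(values):
    """Harmonic mean, taken to be 0.0 when any value is 0."""
    values = list(values)
    if any(v == 0 for v in values):
        return 0.0
    return len(values) / sum(1.0 / v for v in values)


def f1_score(tp, tn, fp, fn):
    """Classic F1: harmonic mean of precision and recall (ignores tn)."""
    precision, recall, _, _ = rates(tp, tn, fp, fn)
    return harmonic_mean([precision, recall])


def four_rate_score(tp, tn, fp, fn):
    """Assumed extension: harmonic mean of all four rates."""
    return harmonic_mean(rates(tp, tn, fp, fn))


if __name__ == "__main__":
    # Hypothetical confusion matrix, chosen only for the demo.
    print("F1:       ", f1_score(tp=90, tn=80, fp=20, fn=10))
    print("four-rate:", four_rate_score(tp=90, tn=80, fp=20, fn=10))
```

Under this assumption the four-rate score drops to zero as soon as either true positives or true negatives vanish, whereas F1 does not depend on true negatives at all.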
Have you ever wanted to develop a better intuition for measuring the performance of a binary classifier? Precision, recall, accuracy, specificity, F1… Now you have all these metrics at your fingertips in the Performance Metrics Playground. You can control the population parameters (the number of positive and negative samples) as well as the simulated classifier parameters (the number of true positives and true negatives).
Continue reading “Binary classifier metrics”
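The parameters listed in the excerpt above are enough to determine the whole confusion matrix, so the metrics can be recomputed by hand. The following is a minimal sketch of that calculation, not the playground's actual code; the function name and argument names are hypothetical.

```python
def metrics_from_population(positives, negatives, true_positives, true_negatives):
    """Derive common metrics from population sizes and correct predictions."""
    if not 0 <= true_positives <= positives:
        raise ValueError("true_positives must lie between 0 and positives")
    if not 0 <= true_negatives <= negatives:
        raise ValueError("true_negatives must lie between 0 and negatives")

    # The remaining two cells of the confusion matrix follow by subtraction.
    false_negatives = positives - true_positives
    false_positives = negatives - true_negatives

    def ratio(numerator, denominator):
        # Undefined ratios are reported as NaN rather than raising.
        return numerator / denominator if denominator else float("nan")

    return {
        "precision": ratio(true_positives, true_positives + false_positives),
        "recall": ratio(true_positives, positives),
        "specificity": ratio(true_negatives, negatives),
        "accuracy": ratio(true_positives + true_negatives, positives + negatives),
        "f1": ratio(2 * true_positives,
                    2 * true_positives + false_positives + false_negatives),
    }


if __name__ == "__main__":
    # Hypothetical imbalanced population: 100 positives, 900 negatives.
    for name, value in metrics_from_population(100, 900, 80, 855).items():
        print(f"{name:12s} {value:.4f}")
```

With an imbalanced population like the one in the demo, accuracy stays high even though precision is noticeably lower, which is the kind of effect such a playground makes easy to explore.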
In the following article, we will look at image recognition using linear regression. We realize that this idea may seem quite unusual. However, we will show with a simple example that, for a certain class of images and under quite strictly defined circumstances, linear regression can achieve surprisingly fair results.
Continue reading “Image Recognition and Linear Regression”
Reverse Polish Notation is a way of writing mathematical expressions that allows calculations to be performed without brackets, thanks to the use of a stack. The method was popularized by Hewlett-Packard, which has used it successfully in its calculators for many years.
Continue reading “RPN Calculator”
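The stack-based evaluation mentioned in the excerpt above can be sketched in a few lines. This is a minimal illustration, not the calculator from the post: it assumes whitespace-separated tokens and only the four basic operators, and `evaluate_rpn` is a hypothetical name.

```python
import operator

# Supported binary operators (an assumption; a real calculator has more).
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate_rpn(expression):
    """Evaluate a whitespace-separated RPN expression and return a float."""
    stack = []
    for token in expression.split():
        if token in OPERATORS:
            if len(stack) < 2:
                raise ValueError(f"not enough operands for {token!r}")
            # The operand pushed last is the right-hand side.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        else:
            # Anything that is not an operator must parse as a number.
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


if __name__ == "__main__":
    # (3 + 4) * 2 written without brackets.
    print(evaluate_rpn("3 4 + 2 *"))
```

Because each operator consumes the two most recent values on the stack, the order of operations is fully determined by token order and no brackets are needed.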
When we want to know the standard deviation of a large population, we usually take a sample from the whole and then calculate the value of an estimator. However, it is not always clear which estimator we should use. People sometimes argue about whether the biased or the unbiased standard deviation estimator is better. Below we explore this question and present the results of a numerical simulation.
Continue reading “Biased and unbiased estimators”
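A simulation of the kind the excerpt above describes might look like the sketch below. It is not the post's code: it assumes a normal population with known standard deviation and compares the two usual estimators, one dividing the sum of squared deviations by n and one by n - 1. The function name and parameters are hypothetical.

```python
import math
import random


def average_estimates(sample_size, trials, sigma=1.0, seed=0):
    """Average the n-divisor and (n-1)-divisor estimates over many samples."""
    rng = random.Random(seed)
    total_n = 0.0
    total_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(sample_size)]
        mean = sum(sample) / sample_size
        squared_deviations = sum((x - mean) ** 2 for x in sample)
        total_n += math.sqrt(squared_deviations / sample_size)
        total_n_minus_1 += math.sqrt(squared_deviations / (sample_size - 1))
    return total_n / trials, total_n_minus_1 / trials


if __name__ == "__main__":
    # Small samples make the difference between the estimators visible.
    by_n, by_n_minus_1 = average_estimates(sample_size=5, trials=20000)
    print("true sigma:          1.0")
    print("divide by n:        ", round(by_n, 4))
    print("divide by n - 1:    ", round(by_n_minus_1, 4))
```

Note that dividing by n - 1 makes the variance estimator unbiased, but its square root still underestimates the standard deviation on average, so for small samples both averages should come out below the true value.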
Some time ago, while tidying up a closet, I came across some of my old papers: notes from quantum mechanics lectures that I had diligently kept as a student in the 1990s. Once I had enjoyed the memories, I started wondering whether their appearance could be improved a little and cleaned of unnecessary elements. Every page shows a pale blue grid, and ink from the other side of the sheet bleeds through. The holes punched for a ring binder are visible as well.
Continue reading “Does artificial intelligence know its way around quantum mechanics?”
In the previous part we looked at the distribution of sample means for three distributions: Uniform, Cauchy, and Petersburg. The Cauchy and Petersburg distributions do not satisfy the conditions of the Central Limit Theorem, since they have infinite variance (and, in the “Petersburg” case, an infinite expected value). Now we will look at the numerical results for the standard deviation of sample means. As in the previous part, we use the Uniform distribution only as a reference, since it satisfies the CLT, and we use the same pseudo-random number generator (Mersenne Twister).
Continue reading “The limits of central limit theorem – part 2”
The power of the Central Limit Theorem is widely known. In the following post we explore the areas outside its scope, where the CLT does not apply. We present the results of numerical simulations for three distributions: Uniform, Cauchy, and a certain “naughty” distribution, later called the “Petersburg distribution”.
Continue reading “The limits of central limit theorem”
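The contrast between the Uniform and Cauchy cases in the excerpts above can be reproduced with a short experiment. The sketch below is an assumption-laden illustration rather than the posts' own simulation: it draws Cauchy values by the inverse-CDF method, measures the spread of sample means with the interquartile range (chosen here because the Cauchy variance is not finite), and leaves out the Petersburg distribution because its definition is not given in the excerpts. All names are hypothetical. Python's `random` module uses the Mersenne Twister generator.

```python
import math
import random
import statistics


def draw_uniform(rng):
    """One draw from the uniform distribution on [0, 1)."""
    return rng.random()


def draw_cauchy(rng):
    """One draw from the standard Cauchy distribution via inverse CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))


def iqr_of_sample_means(draw, sample_size, trials=2000, seed=1):
    """Interquartile range of `trials` sample means of the given size."""
    rng = random.Random(seed)
    means = [
        sum(draw(rng) for _ in range(sample_size)) / sample_size
        for _ in range(trials)
    ]
    first_quartile, _, third_quartile = statistics.quantiles(means, n=4)
    return third_quartile - first_quartile


if __name__ == "__main__":
    for size in (10, 100):
        print(
            f"n={size:4d}",
            "uniform IQR:", round(iqr_of_sample_means(draw_uniform, size), 4),
            "cauchy IQR:", round(iqr_of_sample_means(draw_cauchy, size), 4),
        )
```

For the uniform case the spread of the sample means should shrink as the sample size grows, while for the Cauchy case it should stay roughly the same, since the mean of independent standard Cauchy variables is again standard Cauchy.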