
György Kovács works in the machine learning group at Luleå University of Technology and will give a talk at IceLab about his research on Tuesday (Nov 20) at 11:00.

Despite the steady improvement of machine learning algorithms, the old adage remains true: your models are only as good as your data. This is hardly surprising. After all, even our brains (among the most powerful learning tools we know) can be misled if presented with the wrong data. This does not necessarily mean false information; many other issues can plague a data set. One such issue is class imbalance, which occurs when the numbers of examples available for the different classes differ greatly. Unfortunately, most real-life classification tasks exhibit this phenomenon to some degree, and as we will see, it affects both the training and the evaluation (see: accuracy paradox) of our models. In my talk, after a brief introduction to machine learning, I will address this issue: first by discussing alternative evaluation metrics to the widely used accuracy score, then by briefly describing some techniques for combating the effect of imbalanced data during training, focusing on one in particular, namely probabilistic sampling. I will conclude by highlighting some examples where my colleagues and I successfully applied probabilistic sampling to machine learning tasks.
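To make the two ideas in the abstract concrete, here is a minimal Python sketch. The first part illustrates the accuracy paradox on a made-up 1%-positive data set, where a baseline that always predicts the majority class scores 99% accuracy yet only chance-level balanced accuracy. The second part sketches one common formulation of probabilistic sampling: a class is selected with probability that mixes a uniform class distribution with the empirical class prior via a parameter `lam`, then an example is drawn uniformly from the chosen class. All numbers, names, and the exact mixing formula are illustrative assumptions, not necessarily the formulation used in the talk.

```python
import random

# --- Part 1: the accuracy paradox -----------------------------------
# Hypothetical imbalanced data: 10 positives among 1000 examples (1%).
y_true = [1] * 10 + [0] * 990
# A baseline that always predicts the majority (negative) class:
y_pred = [0] * 1000

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Per-class recall and balanced accuracy expose the problem:
recall_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / y_true.count(1)
recall_neg = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred)) / y_true.count(0)
balanced_accuracy = (recall_pos + recall_neg) / 2

print(f"accuracy          = {accuracy:.2f}")           # 0.99 -- looks great
print(f"balanced accuracy = {balanced_accuracy:.2f}")  # 0.50 -- chance level

# --- Part 2: probabilistic sampling (sketch) ------------------------
# Assumed formulation: select class c with probability
#   lam * (1 / K) + (1 - lam) * P(c),
# i.e. a mixture of uniform class selection (lam = 1) and the
# empirical class prior (lam = 0), then draw an example uniformly
# from the selected class.
def probabilistic_sample(examples_by_class, lam, rng=random):
    classes = sorted(examples_by_class)
    k = len(classes)
    n = sum(len(v) for v in examples_by_class.values())
    weights = [lam / k + (1 - lam) * len(examples_by_class[c]) / n
               for c in classes]
    chosen = rng.choices(classes, weights=weights)[0]
    return chosen, rng.choice(examples_by_class[chosen])
```

With `lam = 1` the minority class is drawn as often as the majority class during training; with `lam = 0` the original (imbalanced) distribution is preserved, and intermediate values interpolate between the two.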
