Title: Imbalanced Data: When It Becomes a "Pain" and How to "Ease" It

Speaker: Charles Ling, PhD, Professor
Director, Data Mining and e-Business Lab
Department of Computer Science
University of Western Ontario, Canada
http://cling.csd.uwo.ca
cling@csd.uwo.ca

Time: April 23, 3:00-4:00 PM

Venue: Meeting Room 404, Mong Man Wai Building (蒙民伟楼)

Abstract:
In this talk I will first discuss and survey different situations where data imbalance becomes a serious problem ("pain"). There are two main types of situation. In the first, the misclassification cost is explicitly or implicitly assumed to be different: the cost of the rare class is often much higher than that of the majority class. In this case, cost-sensitive learning, when used properly, can handle the problem. In the second, the misclassification cost is assumed to be equal, and the goal is an accurate classifier, one more accurate than the default classifier that predicts the majority class for all examples. This can be difficult or impossible for highly imbalanced data. Various methods have been proposed, but what are the capacities and limitations of various learning algorithms? We use PAC-learning to study these issues. We derive several bounds on the sample size that guarantee the overall error rate and the error rate of the rare class.

Biography:
Charles X. Ling earned his dual BSc from Shanghai Jiao Tong University in China, and both his MSc and PhD in Computer and Information Science from the University of Pennsylvania (Ivy League) within four years. Since then he has been a faculty member in Computer Science at the University of Western Ontario, Canada, where he is currently a Professor. His main research areas include machine learning and data mining, cognitive modeling, and child education. He has published over 100 research papers in peer-reviewed journals and conferences. He is an Associate Editor for IEEE TKDE and the Computational Intelligence Journal, and an IEEE Senior Member. He is the Director of the Data Mining and E-Business Lab, leading data mining development in CRM, Bioinformatics, and the Internet.