Title: Learning from Imbalanced Data
Speaker: Nitesh Chawla, Assistant Professor, Computer Science and Engineering Department, University of Notre Dame, USA
Time: October 29 (Thursday), 14:30-15:30
Venue: Meeting Room 404, Mong Man Wai Building (蒙民伟楼)
Abstract: Recent years have brought increased interest in applying data mining techniques to difficult "real-world" problems, many of which are characterized by imbalanced learning data, where at least one class is under-represented relative to others. Examples include (but are not limited to) fraud/intrusion detection, risk management, medical diagnosis/monitoring, bioinformatics, text categorization, and personalization of information. The problem of imbalanced data is also often associated with asymmetric costs of misclassifying elements of different classes. In this talk, I will present our work on finding problems in, proposing solutions to, and performing analysis on imbalanced data.
Bio: Nitesh Chawla is an Assistant Professor in the Department of Computer Science and Engineering at the University of Notre Dame. He directs the Data Inference Analysis and Learning Lab (DIAL) and co-directs the Interdisciplinary Center for Network Science and Applications (iCeNSA) at Notre Dame. His research is primarily focused on machine learning, data mining, and complex networks. His work has led to applications in various domains, including climate data sciences, biology, medicine, finance, security, and social science. He is on the editorial board of IEEE Transactions on Systems, Man and Cybernetics Part B, and has served or is serving on the program and organizational committees of a number of top-tier conferences. He has received various awards and honors, including best dissertation, best paper, and outstanding undergraduate teacher awards, and the NAE New Faculty Fellowship. His current research is supported by NSF, DOD, NWICG, NIJ, and industry sponsors.

Title: A framework for monitoring classifiers' performance: when and why failure occurs?
Speaker: Nitesh Chawla, Assistant Professor, Computer Science and Engineering Department, University of Notre Dame, USA
Time: October 30 (Friday), 14:30-15:30
Venue: Meeting Room 404, Mong Man Wai Building (蒙民伟楼)
Abstract: Classifier error is the product of model bias and data variance. While it is important to understand the bias involved in selecting a given learning algorithm, it is similarly important to understand the variability in data over time, since even the One True Model might perform poorly when training and evaluation samples diverge. Thus, the ability to identify distributional divergence is critical to pinpointing when fracture points in classifier performance will occur. Contemporary evaluation methods do not take into account the impact of distribution shifts on the quality of classifiers’ predictions. In this talk, I present a comprehensive framework to proactively detect breakpoints in classifiers’ predictions and shifts in data distributions through a series of statistical tests. I outline and utilize three scenarios under which data changes: sample selection bias, covariate shift, and shifting class priors.