Title: Optimal Subsampling for Big Data: from ‘Static Data’ to ‘Data Streams’
Speaker: Prof. 艾明要 (Ai Mingyao), Peking University
Venue: Room 101, 致远楼 (Zhiyuan Building)
Time: Friday, December 20, 2024, 14:00-15:30
Abstract: Subsampling methods are effective techniques for reducing the computational burden of big data while maintaining the efficiency of statistical inference. In this talk, we review subsampling techniques for handling different types of big data efficiently, covering inferential models ranging from linear models to generalized linear models and estimating equations, and data types ranging from static data to dynamic data streams. For settings in which the full data are stored in different blocks or at multiple locations, a distributed subsampling framework is developed, in which statistics are computed simultaneously on smaller partitions of the full data. Finally, the proposed strategies are illustrated and evaluated through numerical experiments on both simulated and real data sets.
All faculty and students are welcome to attend.