|Title||[Academic Seminar] [Special Seminar] Special Seminar Announcement: Friday, January 4, 11:00|
|Content||[Special Seminar] Special Seminar Announcement: Friday, January 4, 11:00
Title: Scalable and accurate analysis of big genetics and biomedical data
▪Speaker: Seunggeun (Shawn) Lee (University of Michigan)
▪Date and time: Friday, January 4, 2019, 11:00 AM – 12:00 PM
▪Venue: Building 25, Room 405
With advances in high-throughput technologies and the rapid digitalization of health records, extremely large-scale genetics and biomedical data are now available for health research. The analysis of these big data requires scalable and accurate statistical and computational methods. In addition, these high-dimensional data pose important theoretical and methodological questions about commonly used multivariate methods such as principal component analysis (PCA) and partial least squares (PLS). In this talk, I will introduce our recent work in this domain. I will first discuss theoretical results on PCA and their application to other approaches such as surrogate variable analysis and PLS. In particular, I will show that PLS can severely overfit high-dimensional data, but that a simple two-stage approach, motivated by the theoretical results on PCA, can address the resulting bias. In the second part, I will introduce a scalable and accurate method for biobank-scale data with > 100,000 samples, > 10 million genetic variants, and > 1,000 phenotypes. The new method, SAIGE, uses state-of-the-art optimization strategies to reduce the computation time and memory cost of fitting the generalized linear mixed model. In addition, it uses the saddlepoint approximation (SPA) to provide accurate p-values even when case-control ratios are extremely unbalanced. An analysis of UK Biobank data from 400,000 samples confirms the good performance of the proposed method.
Attachment: 세미나 안내_190104_Seunggeun Lee.hwp [15KB]