Faculty Research Labs

Yongdai Kim. Intellectual Data Exploration and Analysis Lab https://idea-stat.snu.ac.kr/

Introduction

The lab's mission is to study statistical methodologies for complex data, including high-dimensional data, big data and censored data. We investigate the theoretical properties of various learning methods for complex data and develop efficient computational algorithms. In particular, penalized approaches for high-dimensional data, and Bayesian models and computations for machine learning and survival analysis, are the current main research topics. In addition, we consider applications of statistical methodologies to data from cohort studies, document analysis, bioinformatics and marketing.

Fields of Interest

Statistical Learning and Data Mining
High-dimensional Data Analysis
Survival Analysis
Bayesian Statistics

Byeong U. Park. Non-Parametric Inference Lab https://sites.google.com/view/theostat/

Introduction

My current research areas include non- and semi-parametric inference on structured models for non-Euclidean data and its applications in various scientific domains. Non-Euclidean data are encountered everywhere owing to modern data-collection technology. Functional, compositional, spherical, shape and special-orthogonal-matrix-valued data, among others, are important examples of non-Euclidean data. Functional data refer to a collection of observed random functions that correspond to smooth realizations of an underlying stochastic process. Compositional data arise from numerous sources, such as elections; the compositions of the body, air, sea water and soil; and income-expenditure distributions. Spherical data, with circular data as a special case, emerge in earth science and astronomy, for example as the directions of wind and animal movement and the positions of sunspots and airplanes. A shape value is a finite set of points representing the shape of an artificial or natural object. Examples are the shapes of skulls, organs, faces, sand particles and lands. Examples of special-orthogonal-matrix-valued data include vectorcardiograms and alignments of crystals. Non- and semi-parametric structured models are extremely useful statistical tools that can be used in a variety of modern real-world applications. They possess the flexibility of nonparametric models and, at the same time, very efficiently circumvent the curse of dimensionality that affects nonparametric models. There have been only a few attempts to develop methodology and related theory for these models that are applicable to non-Euclidean data. Research on ultra-high-dimensional data in the framework of nonparametric structured models is also at an early stage. I am conducting comprehensive research on analyzing non-Euclidean and high-dimensional data based on nonparametric structured models.

Fields of Interest

Nonparametric structured models
Semiparametric inference
Non-Euclidean data analysis

Taesung Park. Bioinformatics and Biostatistics (BIBS) Lab http://bibs.snu.ac.kr

Introduction

The Bioinformatics and Biostatistics (BIBS) Lab is led by Professor Taesung Park. Prof. Park received his Ph.D. for research on missing data and has published many papers on repeated measurement analysis in the statistics literature. Building on this experience in statistical research, he founded the lab to extend his research into bioinformatics and biostatistics.
The lab has published many statistical methods, especially for the analysis of bioinformatics data, including quality control analysis of microarray gene expression data, gene-gene interaction analysis of single nucleotide polymorphism (SNP) chip data on a genome-wide scale (GWAS: Genome-Wide Association Study), and gene-based analysis of next-generation sequencing (NGS) data. As a specific example of microarray data analysis, Prof. Park participated in the selection of genes associated with breast cancer and the development of a diagnostic kit (Oncotype DX) for breast cancer recurrence. The corresponding paper was published in the New England Journal of Medicine (NEJM) (cited 3551 times), and the kit has been used for hundreds of thousands of patients in 70 countries. In GWA studies, Prof. Park has concentrated on gene-gene interaction analysis using multifactor dimensionality reduction (MDR), has published about 10 corresponding papers (4 papers in “Bioinformatics”, 2 papers in “BMC”-related journals, 1 paper in “Genetic Epidemiology”), and has extended the research into gene-environment interaction analysis. For NGS data analysis, the lab participates in the Type 2 Diabetes Genetic Exploration by Next-generation sequencing in Ethnic Samples (T2D GENES) consortium, which includes many leading research organizations such as the University of Michigan and the University of Oxford. In the consortium, our lab participates in developing statistical methods that can detect associations between T2D-related traits and genetic information; several papers from this work have been published or accepted, and others are under review. Moreover, the lab has conducted many joint research projects with domestic medical centers on liver, gastric, and pancreatic cancers.
As a result, the lab detected novel biological pathways related to gastric cancer (published in “Oncogene” and “Gut”), applied for a patent on a program that validates prediction performance using openly available web-based transcriptomic datasets (CANcer-specific multi-marker Evaluation System: CANES), and constructed several databases for systematic analysis. BIBS has successfully carried out many government-supervised projects. The most representative of these is the National Creative Research Initiatives (NCRI) program. On the strength of its performance in the National Research Lab (NRL) project from 2005 to 2010, during which Taesung Park published over 60 Science Citation Index (SCI) papers, the lab was selected as an executing organization of NCRI from 2012 to 2015. It was the first statistical research lab to be selected, and Taesung Park and his students have published about 40 SCI papers funded by the project. Among them, 8 papers are in the top 10% of the Journal Citation Reports (JCR) ranking, and 25 papers are in the top 20%. Building on this experience, the lab will continue its research in bioinformatics and biostatistics and, in particular, extend it to the integrated analysis of omics data.

Fields of Interest

Repeated Measures data analysis
Missing data analysis
Microarray data analysis
Statistical Genetics
Gene-Gene interaction model
Integrated Omics Data analysis

Hee-Seok Oh. Multiscale Methods in Statistics Lab https://sites.google.com/view/snumultiscale/

Introduction

Many of the phenomena and data observed in various fields of science and engineering are too complex to be handled by classical methods. Analyzing complex data with multiscale approaches can be expected to reduce this complexity and enhance interpretability. The ultimate goal of our research lab is to extend the scope of statistics by coupling multiscale methods with statistical modeling and inference.

Fields of Interest

Multiscale methods in statistics
Spatial-temporal data analysis
Statistical methods for climatology

Joong-Ho Won. Computational Statistics Lab https://won-j.github.io/

Introduction

We study computational methods for systemic investigation of interaction patterns emerging from high-throughput, large-scale data, which come largely from engineering, financial, and biomedical studies. We actively employ high-performance computing (HPC) methods to overcome the computational bottlenecks in analyzing these ever-growing data sets.

Fields of Interest

High-performance statistical computing
Optimization – MM algorithms, proximal algorithms
Machine learning
Image processing

Sangyeol Lee. Time Series & Predictive Analytics Lab https://sites.google.com/snu.ac.kr/tspa/

Introduction

This research laboratory studies in depth the development of theory for stochastic processes and time series analysis, together with its applications. Main subjects include financial time series analysis, financial engineering, risk management, change point analysis, empirical processes and goodness-of-fit tests, inference for stochastic differential equation models, extreme value theory and long memory processes, insurance data analysis, and sequential analysis. Recently, high-dimensional and big data analysis, together with social science data analysis, has received intensive emphasis. The laboratory runs workshops on two selected subjects per semester. Individual researchers and research teams of two or three members work on the aforementioned subjects and produce a number of research reports.

Fields of Interest

Financial time series analysis and risk management
Change point analysis and statistical process control
Environmental and healthcare statistics
Social science data and SNS analysis
Predictive analytics and machine learning

Jaeyong Lee. Bayesian Statistics Lab https://snubayes.wordpress.com/

Introduction

Bayesian inference originated from a paper that Thomas Bayes, an amateur mathematician and a priest, wrote with the intention of proving the existence of God. Bayesian inference is based on a probability distribution, called the posterior distribution, which represents information with uncertainty. Bayesian statistics is used in meteorology, medicine, engineering and artificial intelligence, to name just a few, because it can naturally combine information with uncertainty. In this lab, we study many aspects of Bayesian statistics. Our current interests lie in high-dimensional statistical models, differential equation models, Bayesian nonparametric models, and various application problems. In particular, we invent new statistical procedures and study their theoretical properties and efficient computational methods.

Fields of Interest

Bayesian Statistics
High-dimensional statistical inference
Differential equation models
Nonparametric Bayesian models

Johan Lim. Multivariate Statistics Lab https://sites.google.com/view/mvstat

Introduction

We study large-scale multivariate statistical analysis, which includes inference on high-dimensional covariance matrices, probabilistic graphical models, and various latent variable models. We also do research on order-related statistical inference.

Fields of Interest

(Large-scale) multivariate statistical analysis
Order-related statistical inference
Latent variable models

Chae Young Lim. Spatial Statistics Lab https://limcstat.github.io/

Introduction

The Spatial Statistics Lab leads various methodological and application-driven projects that deal with spatial and spatio-temporal data. Spatial statistics is a sub-area of statistics that studies spatial and spatio-temporal data while accounting for spatial and spatio-temporal uncertainty. Its applications span various areas, such as public health, climatology and environmetrics, neuroscience, biomedical engineering, and computer experiments.

Fields of Interest

Spatial Statistics
Biomedical Engineering Analysis
Spectral Analysis – fixed domain asymptotics
Spatial Epidemiology – disease mapping

Woncheol Jang. High-dimensional Massive Data Analysis Lab

Introduction

The power of modern technology is opening a new era of massive and high-dimensional data, common in astronomy and bioinformatics, that are beyond the capabilities of traditional statistical methods. The size of these datasets affords us the opportunity to answer many open scientific questions but also presents some interesting challenges. Our research interests focus on nonparametric statistical inference for solving complex scientific problems. On the theoretical side, we are interested in density estimation, functional data analysis, multiple testing, network models and variable selection. On the application side, our collaborations encompass such topics as astronomy, ecology, genomics and neuroscience.

Fields of Interest

Large-scale Inference
Multiple Testing
Social Network Analysis
Statistical Applications in Astronomy and Neuroscience

Sungkyu Jung. Statistical Learning Theory Lab https://statlet.github.io/

Introduction

We study theory and methods for high-dimensional and geometric data. For the analysis of high-dimension, low-sample-size (HDLSS) data, we develop methods for dimension reduction, classification and nonparametric inference under spike models and/or sparsity assumptions. HDLSS asymptotics reveal unique geometric properties of HDLSS data, and we attempt to extend the geometric representations of HDLSS data and use them in the development and evaluation of multivariate methods for such data. Moreover, for the analysis of geometric data (data that come with special structures such as directions or shapes, or, in general, data on manifolds), we generalize classical and modern multivariate methods, including those now dubbed statistical learning, that were originally devised for Euclidean geometry to non-Euclidean data, developing probability models, dimension reduction methods, and association analysis. This high-dimensional, structured, geometric/non-Euclidean data setting is becoming increasingly relevant in bioinformatics, neuroscience, genetics, computer vision, and medical imaging, and we strive to provide statistically sound methodologies for these application areas.

Fields of Interest

High-dimension, low-sample-size problems
Analysis of structured, geometric and non-Euclidean data
Statistical learning

Myunghee Cho Paik. Biostatistics Lab

Introduction

The Biostatistics Laboratory studies statistical methodologies for solving problems in the medical sciences and public health. Our interests include the design and analysis of longitudinal or clustered data using generalized estimating equations and mixed effects models, and inferential procedures for incomplete and missing data. We have recently started working in the areas of digital health, precision medicine, and evidence-based medicine.

Fields of Interest

Statistical methods for epidemiology
Clustered data analysis
Longitudinal data analysis
Incomplete or missing data analysis
Digital health
Precision medicine
Evidence-based medicine

Junyong Park. High-dimensional Multiple Testing Lab

Introduction

We aim to develop new methods and investigate their theoretical properties in the areas of statistical hypothesis testing and inference, multiple testing, high-dimensional classification and meta-analysis. We apply these methods to many scientific areas, such as bioinformatics and medical science.

Fields of Interest

Hypothesis testing in high dimension
Multiple testing
Classification in high dimension
Bioinformatics
Meta-analysis