Faculty Research Labs

Yongdai Kim. Intellectual Data Exploration and Analysis Lab https://idea.snu.ac.kr

Introduction

The Research Office’s mission is to study statistical methodologies for complex data including high dimensional data, big data and censored data. Theoretical properties of various learning methods for complex data are investigated and efficient computational algorithms are developed. In particular, penalized approaches for high dimensional data and Bayesian models and computations for machine learning and survival analysis are current main research topics. In addition, applications of statistical methodologies to data from cohort studies, document analysis, bioinformatics and marketing are considered.

Fields of Interest

Statistical Learning and Data Mining
High-dimensional Data Analysis
Survival Analysis
Bayesian Statistics

Jisu Kim. Statistics and Artificial Intelligence with Topology Lab https://jkim82133.github.io

Introduction

The Statistics and Artificial Intelligence with Topology Lab focuses its research on statistical inference for Topological Data Analysis (TDA) and its applications to machine learning and data analysis. Topology, a branch of mathematics, explores how local components are globally interconnected, and TDA refers to utilizing these topological properties during data analysis. We conduct research on statistical methods and relevant theories for inference in TDA from data under statistical models. To achieve this, we develop theories in computational topology that are suitable for inferring topological structures from data, and we also extend statistical theories to accommodate the geometrical and topological conditions required for TDA. In parallel, we explore practical applications of topological data analysis in machine learning and data analysis.

Fields of Interest

Topological Data Analysis
Statistical Inference on Topology and Geometry
Machine Learning Theory
Computational Topology
Clustering

Haeun Moon. Model-free Statistical Methods Lab https://sites.google.com/view/haeunmoon

Introduction

We study the development of model-free statistical procedures and their application to real data. Modern data often involve complex distributions that are difficult to specify in advance. We seek methods that can be applied without modeling the underlying relationships or the associated probability distributions. Topics include hypothesis testing, variable selection, predictive inference, threshold selection, and other important areas in statistics and data science. By applying these methods, we aim to make reliable and powerful discoveries from domain-specific datasets.

Fields of Interest

Independence Testing
Model-Free Analysis
Handling Missing Data
Omics Data Analysis

Gunwoong Park. Data Science & Machine Learning Lab https://sites.google.com/view/gwpark

Introduction

The DSML lab aims to develop statistical methodologies and tools applicable to various fields requiring decision-making. Hence, we study various topics such as reinforcement learning, recommendation systems, network analysis, graphical models, and causal inference. Particularly, we study their theoretical properties and optimal learning algorithms.

Fields of Interest

Causal inference
Graphical model learning
High-dimensional and robust learning
Network Analysis
Reinforcement learning

Byeong U. Park. Non-Parametric Inference Lab https://sites.google.com/view/theostat

Introduction

My current research areas include non- and semi-parametric inference on structured models for non-Euclidean data and its applications in various scientific domains. Non-Euclidean data are encountered everywhere owing to modern data collection technology. Functional, compositional, spherical, shape and special-orthogonal-matrix-valued data, among others, are important examples of non-Euclidean data.
Functional data refer to a collection of observed random functions that correspond to smooth realizations of an underlying stochastic process. Compositional data arise from numerous sources such as elections, the composition of the body, air, sea-water and soil, and income-expenditure distributions. Spherical data, with circular data as a special case, emerge in fields such as earth science and astronomy; examples include the directions of wind and animal movement and the positions of sunspots and airplanes. A shape value is a finite set of points representing the shape of an artificial or natural object. Examples are shapes of skulls, organs, faces, sand particles and lands. Examples of special-orthogonal-matrix-valued data include vector-cardiograms and alignments of crystals.
Non- and semi-parametric structured models are extremely useful statistical tools that can be used in a variety of modern real applications. They possess the flexibility of nonparametric models and at the same time very efficiently circumvent the curse of dimensionality that affects nonparametric models. There have been only a few attempts to develop methodology and related theory for these models that are applicable to non-Euclidean data. The case of ultra-high-dimensional data in the framework of nonparametric structured models is also at an early stage. I am conducting comprehensive research on analyzing non-Euclidean and high-dimensional data based on nonparametric structured models.

Fields of Interest

Nonparametric structured models
Semiparametric inference
Non-Euclidean data analysis

Taesung Park. Bioinformatics and Biostatistics (BIBS) Lab http://bibs.snu.ac.kr

Introduction

The Bioinformatics and Biostatistics (BIBS) Lab is managed by Professor Taesung Park. Prof. Park received his Ph.D. degree for research on missing data and has published many papers on repeated measures analysis in the field of statistics. Building on this experience in statistical research, he founded the lab to extend his research area to bioinformatics and biostatistics.
The lab has published many statistical methods papers, especially on the analysis of bioinformatics data, including quality control analysis of microarray gene expression data, gene-gene interaction analysis of single nucleotide polymorphism (SNP) chip data at genome-wide scale (GWAS: Genome-Wide Association Study), and gene-based analysis of next-generation sequencing (NGS) data. As a specific example of microarray data analysis, Prof. Park participated in the selection of genes associated with breast cancer and the development of a diagnostic kit (Oncotype DX) for breast cancer recurrence. The corresponding paper was published in the New England Journal of Medicine (NEJM) (cited 3551 times), and the kit has been used for hundreds of thousands of patients in 70 countries. In GWA studies, Prof. Park has concentrated on gene-gene interaction analysis using multifactor dimensionality reduction (MDR), has published about 10 corresponding papers (4 papers in “Bioinformatics”, 2 papers in “BMC”-related journals, 1 paper in “Genetic Epidemiology”), and has extended the research field into gene-environment interaction analysis. For NGS data analysis, the lab participates in the Type 2 Diabetes Genetic Exploration by Next-generation sequencing in Ethnic Samples (T2D GENES) consortium, which includes many leading research organizations such as the University of Michigan and Oxford. In the consortium, our lab participates in developing statistical methods that can detect associations between T2D-related traits and genetic information, and several papers have been published, accepted, or are under review. Moreover, the lab has conducted many joint research projects with domestic medical centers on liver, gastric, and pancreatic cancers.
As a result, the lab detected novel biological pathways related to gastric cancer (published in “Oncogene” and “Gut”), applied for a patent on a program that validates prediction performance using openly available online transcriptomic datasets (CANcer-specific multi-marker Evaluation System: CANES), and constructed several databases for systematic analysis. BIBS has carried out a series of projects supervised by the government. The most representative of these is the National Creative Research Initiatives (NCRI) program. Based on its successful performance in the National Research Lab (NRL) project from 2005 to 2010, during which Taesung Park published over 60 Science Citation Index (SCI) papers, the lab was selected as an executing organization of NCRI from 2012 to 2015. It was the first statistics research lab to be selected, and Taesung Park and his students have published about 40 SCI papers funded by the project. Among them, 8 papers are in the top 10% of the Journal Citation Reports (JCR) list, and 25 papers are in the top 20%. Building on this experience, the lab will continue its research in bioinformatics and biostatistics and, in particular, extend its research field into the integrated analysis of omics data.

Fields of Interest

Repeated Measures data analysis
Missing data analysis
Microarray data analysis
Statistical Genetics
Gene-Gene interaction model
Integrated Omics Data analysis

Yei Eun Shin. Prediction Model Lab https://sites.google.com/view/yeieunshin

Introduction

We develop statistical methods for analyzing the risk of cancer and other health outcomes. Our research interests are primarily in studying competing risks associated with complex factors such as physical activity, occupational and environmental exposures, and metabolic imbalance, which are challenging to handle with classical theory. We apply the risk models we develop to real-world epidemiological cohorts, with particular interest in improving the estimation or validation of risk prediction models in two-phase cohort studies such as case-cohort and nested case-control designs.

Fields of Interest

Biostatistics
Survival Analysis
Spatiotemporal Statistics
Missing Data
Survey Sampling

Hee-Seok Oh. Multiscale Methods in Statistics Lab https://sites.google.com/view/snumultiscale

Introduction

Many of the phenomena and data observed in various fields of science and engineering are too complex to be handled by classical methods. Analyzing complex data with multiscale approaches can be expected to reduce the complexity and enhance interpretability. The ultimate goal of our research lab is to extend the scope of statistics by coupling multiscale methods with statistical modeling and inference.

Fields of Interest

Multiscale methods in statistics
Spatial-temporal data analysis
Statistical methods for climatology

Joong-Ho Won. Computational Statistics Lab https://won-j.github.io

Introduction

We study computational methods for systemic investigation of interaction patterns emerging from high-throughput, large-scale data, which are largely coming from engineering, financial, and biomedical studies. We actively employ high-performance computing (HPC) methods in order to overcome the computational bottlenecks in analyzing these data of ever-increasing sizes.

Fields of Interest

High-performance statistical computing
Optimization – MM algorithms, proximal algorithms
Machine learning
Image processing

Kwonsang Lee. Causal Inference Lab https://www.kwonsanglee.com

Introduction

The aim of this research lab is to extend causal inference theory and methods. From design to analysis and interpretation, causal reasoning is essential at every stage of research. We develop causal inference methods and apply them to different fields to provide important insight into a variety of issues. For example, we focus on discovering effect modification (or heterogeneous treatment effect) in an interpretable form. This discovery enhances our understanding of the treatment effect and enables efficient estimation of this effect. Such methods can solve causal inference problems arising from many other research areas such as medicine, public health, and social sciences.

Fields of Interest

Causal inference
Design and analysis of observational studies
Effect modification/Heterogeneous treatment effect
Sensitivity analysis
Application in medicine, public health and social sciences

Sangyeol Lee. Time Series & Predictive Analytics Lab https://sites.google.com/snu.ac.kr/tspa

Introduction

This research laboratory studies in depth the development of theory and its applications to stochastic processes and time series analysis. Main subjects include financial time series analysis, financial engineering, risk management, change point analysis, empirical processes and goodness-of-fit tests, inference for stochastic differential equation models, extreme value theory and long memory processes, insurance data analysis, and sequential analysis. Recently, high-dimensional and big data analysis, together with social science data analysis, has been strongly emphasized. The laboratory runs workshops on two selected subjects per semester. Individual researchers and research teams consisting of two or three members work on the aforementioned subjects and produce a number of research reports.

Fields of Interest

Financial time series analysis and risk management
Change point analysis and statistical process control
Environmental and healthcare statistics
Social science data and SNS analysis
Predictive analytics and machine learning

Jaeyong Lee. Bayesian Statistics Lab https://snubayes.org

Introduction

Bayesian inference originated from a paper that Thomas Bayes, an amateur mathematician and a priest, wrote with the intention of proving the existence of God. Bayesian inference is based on a probability distribution, called the posterior distribution, which represents information with uncertainty. Bayesian statistics is used in meteorology, medicine, engineering and artificial intelligence, to name just a few, because it can naturally combine information with uncertainty. In this lab, we study many aspects of Bayesian statistics. Our current interests lie in high-dimensional statistical models, differential equation models, Bayesian nonparametric models, and various application problems. In particular, we invent new statistical procedures and study their theoretical properties and efficient computational methods.

Fields of Interest

Bayesian Statistics
High-dimensional statistical inference
Differential equation models
Nonparametric Bayesian models

Johan Lim. Multivariate Statistics Lab https://sites.google.com/view/mvstat

Introduction

We study large-scale multivariate statistical analysis, which includes inference on high-dimensional covariance matrices, probabilistic graphical models, and various latent variable models. We also do research on order-related statistical inference.

Fields of Interest

(Large-scale) multivariate statistical analysis
Order-related statistical inference
Latent variable models

Chae Young Lim. Spatial Statistics Lab https://limcstat.github.io

Introduction

The Spatial Statistics Lab leads various methodological and application-driven projects that deal with spatial and spatio-temporal data. Spatial statistics is a sub-area of statistics that studies spatial and spatio-temporal data by accounting for spatial and spatio-temporal uncertainty. It includes applications in various areas such as public health, climatology and environmetrics, neuroscience, biomedical engineering, and computer experiments.

Fields of Interest

Spatial Statistics
Biomedical Engineering Analysis
Spectral Analysis-fixed domain asymptotics
Spatial Epidemiology-disease mapping

Won Chang. Uncertainty Quantification Lab https://www.wonchang.net

Introduction

The effective use of complex mathematical and machine learning models requires robust uncertainty quantification. Our goal is to develop statistically sound and scientifically valuable uncertainty quantification methods by leveraging advanced machine learning techniques, such as Gaussian processes and deep neural networks. Our research spans methodological developments in Bayesian deep learning models using variational inference and nonparametric Bayesian approaches for large spatial and temporal datasets, including Gaussian processes and Dirichlet process-Gaussian mixtures. We apply these methods across various fields, including climate change, disease modeling, and gene clustering.

Fields of Interest

Uncertainty Quantification
Machine Learning for Environmental Research
Machine Learning for Biomedical Research
Spatio-Temporal Modeling

Woncheol Jang. High-dimensional Massive Data Analysis Lab

Introduction

The power of modern technology is opening a new era of massive and high-dimensional data, common in astronomy and bioinformatics, that are beyond the capabilities of traditional statistical methods. The size of the datasets affords us the opportunity to answer many open scientific questions but also presents some interesting challenges. Our research interests focus on nonparametric statistical inference for solving complex scientific problems. On the theoretical side, we are interested in density estimation, functional data analysis, multiple testing, network models and variable selection. On the application side, our collaborations encompass such topics as astronomy, ecology, genomics and neuroscience.

Fields of Interest

Large-scale Inference
Multiple Testing
Social Network Analysis
Statistical Applications in Astronomy and Neuroscience

Sungkyu Jung. Statistical Learning Theory Lab https://statlet.github.io

Introduction

We study theory and methods for high-dimensional and geometric data. For the analysis of high-dimension, low-sample-size (HDLSS) data, we develop methods for dimension reduction, classification and nonparametric inference under spike models and/or sparsity assumptions. HDLSS asymptotics reveal unique geometric properties of HDLSS data, and we attempt to extend the geometric representations of HDLSS data and use them in the development and evaluation of multivariate methods for such data. Moreover, for the analysis of geometric data (data that come with special structures such as directions, shapes, or, in general, data on manifolds), we generalize classical and modern multivariate methods, including those now dubbed statistical learning, originally devised with Euclidean geometry, to non-Euclidean data in developing probability models, dimension reduction methods, and association analysis. This high-dimensional, structured, geometric/non-Euclidean data setting is becoming increasingly relevant in bioinformatics, neuroscience, genetics, computer vision, and medical imaging, and we strive to provide statistically sound methodologies for these application areas.

Fields of Interest

High-dimension, low-sample-size problems
Analysis of structured, geometric and non-Euclidean data
Statistical learning

Junyong Park. High-dimensional Multiple Testing Lab https://hdmtlab.github.io

Introduction

We aim to develop new methods and investigate their theoretical properties in the areas of statistical hypothesis testing and inference, multiple testing, high-dimensional classification and meta-analysis. We apply these methods to many scientific areas such as bioinformatics and medical science.

Fields of Interest

Hypothesis testing in high dimension
Multiple testing
Classification in high dimension
Bioinformatics
Meta Analysis