ML-Based Therapeutic Target Discovery
A central focus of my research is building frameworks for therapeutic target discovery that bridge human genetics, regulatory genomics, and machine learning.
Transferring sequence-to-function models into disease-relevant cellular contexts lets us infer gene regulatory networks, identify disrupted pathways and regulatory mechanisms, and interpret disease-associated variants mechanistically. Integrating statistical genetics (fine-mapping, QTL colocalization) with sequence-based deep learning then allows us to prioritize therapeutic targets whose regulatory mechanisms are interpretable.
We have applied this framework across multiple disease areas to uncover regulatory genes and pathways with strong translational potential.
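As a rough illustration of how statistical-genetics evidence and model predictions might be combined for prioritization, the sketch below multiplies a fine-mapping posterior inclusion probability, a QTL colocalization posterior, and the magnitude of a model-predicted variant effect, then sums the result per gene. The scoring rule, the field names, and the example values are all assumptions made for illustration; they are not the framework's actual scoring scheme.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class VariantEvidence:
    """Evidence for one variant-gene pair (all fields are illustrative)."""
    variant_id: str
    gene: str
    pip: float               # fine-mapping posterior inclusion probability, in [0, 1]
    coloc_pp: float          # posterior probability of a shared causal signal with a QTL
    predicted_effect: float  # model-predicted effect of alt vs. ref allele (e.g. log fold change)


def variant_score(v: VariantEvidence) -> float:
    # Assumed scoring rule: a variant counts only if it is likely causal,
    # colocalizes with a QTL, and has a non-trivial predicted regulatory effect.
    return v.pip * v.coloc_pp * abs(v.predicted_effect)


def rank_genes(variants):
    """Sum variant scores per gene and return (gene, score) pairs, best first."""
    scores = defaultdict(float)
    for v in variants:
        scores[v.gene] += variant_score(v)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up example values, purely to exercise the functions.
variants = [
    VariantEvidence("var1", "GENE_A", pip=0.9, coloc_pp=0.8, predicted_effect=-1.0),
    VariantEvidence("var2", "GENE_A", pip=0.1, coloc_pp=0.8, predicted_effect=0.5),
    VariantEvidence("var3", "GENE_B", pip=0.5, coloc_pp=0.5, predicted_effect=2.0),
]
ranked = rank_genes(variants)
```

A multiplicative score is only one option; it has the property that weak evidence on any single axis suppresses the variant, which may or may not be desirable in a given application.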
Efficiency and Interpretability of Long-Context Models
As DNA sequence models adopt long-context architectures, computational demands have grown dramatically: training time has expanded from hours to days or months, and inference and interpretation have also become more expensive.
I am interested in developing more efficient approaches across these components, including architectural innovations, training techniques, transfer strategies, and scalable model interpretation methods. These advances make it feasible to apply foundation models to new biological contexts and enable mechanistic interpretation at scale.
Sequence-Based Models of Chromatin Accessibility
Chromatin accessibility is particularly well suited to short-context DNA sequence models. Unlike gene expression, accessibility can be accurately predicted from local sequence features. At the same time, it remains a central modality for understanding gene regulation and the broader regulatory landscape.
My work focuses on developing models that carefully account for assay biases and technical artifacts, capture true regulatory signals in chromatin accessibility data, and link these signals to gene expression. This enables effective gene regulatory network inference and interpretation of regulatory variants.
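To make the idea of interpreting regulatory variants with a sequence model concrete, the sketch below scores a single-nucleotide variant as the difference between the predicted accessibility of the alternate and reference sequences. The `toy_accessibility_model` function is a hypothetical stand-in (it simply counts occurrences of an arbitrary motif); in practice a trained, bias-corrected model would take its place.

```python
import math


def toy_accessibility_model(seq: str) -> float:
    """Hypothetical stand-in for a trained model.

    Returns log(1 + number of occurrences of an arbitrary motif), so that
    the example is deterministic and needs no external dependencies.
    """
    motif = "GATA"
    hits = sum(
        1
        for i in range(len(seq) - len(motif) + 1)
        if seq[i:i + len(motif)] == motif
    )
    return math.log1p(hits)


def variant_effect(ref_seq: str, pos: int, alt: str, model) -> float:
    """Predicted effect of substituting `alt` at 0-based position `pos`.

    The effect is model(alt sequence) - model(ref sequence); negative values
    mean the alternate allele is predicted to reduce accessibility.
    """
    if len(alt) != 1 or alt not in "ACGT":
        raise ValueError("alt must be a single nucleotide in ACGT")
    if not 0 <= pos < len(ref_seq):
        raise ValueError("pos is outside the reference sequence")
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return model(alt_seq) - model(ref_seq)


# The substitution at position 3 disrupts the only motif instance.
effect = variant_effect("CCGATACC", pos=3, alt="C", model=toy_accessibility_model)
```

Here the alternate allele removes the single motif match, so the predicted effect is negative under the toy model.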