TY - JOUR
T1 - A community effort to optimize sequence-based deep learning models of gene regulation
AU - Rafi, Abdul Muntakim
AU - Nogina, Daria
AU - Penzar, Dmitry
AU - Lee, Dohoon
AU - Lee, Danyeong
AU - Kim, Nayeon
AU - Kim, Sangyeup
AU - Kim, Dohyeon
AU - Shin, Yeojin
AU - Kwak, Il Youp
AU - Meshcheryakov, Georgy
AU - Lando, Andrey
AU - Zinkevich, Arsenii
AU - Kim, Byeong Chan
AU - Lee, Juhyun
AU - Kang, Taein
AU - Vaishnav, Eeshit Dhaval
AU - Yadollahpour, Payman
AU - Yang, AiWei
AU - Wang, JingZhe
AU - Chen, JiaXing
AU - Schmitz, Carl
AU - Salomone, Robert
AU - Perrin, Dimitri
AU - Bradford, Jake
AU - Kumar S, Prasanna
AU - Gupta, Krishnakant
AU - Palaniappan, Ashok
AU - Malousi, Andigoni
AU - Kyriakidis, Konstantinos
AU - Kardamiliotis, Konstantinos
AU - Emrich, Scott
AU - Babjac, Ashley
AU - Queen, Owen
AU - Lu, Zhixiu
AU - Madsen, Jesper
AU - Kavaliauskaite, Gabija
AU - Møller, Andreas
AU - Soylemez, Onuralp
AU - Schubach, Max
AU - Dash, Pyaree Mohan
AU - Röner, Sebastian
AU - Li, Yichao
AU - Brors, Benedikt
AU - Feuerbach, Lars
AU - Körner, Cindy
AU - Abad, Nicholas
AU - Wan, Cen
AU - Barton, Carl
AU - Greaves, Patrick
AU - Random Promoter DREAM Challenge Consortium
N1 - Publisher Copyright:
© The Author(s) 2024.
PY - 2024
Y1 - 2024
AB - A systematic evaluation of how model architectures and training strategies impact genomics model performance is needed. To address this gap, we held a DREAM Challenge where competitors trained models on a dataset of millions of random promoter DNA sequences and corresponding expression levels, experimentally determined in yeast. For a robust evaluation of the models, we designed a comprehensive suite of benchmarks encompassing various sequence types. All top-performing models used neural networks but diverged in architectures and training strategies. To dissect how architectural and training choices impact performance, we developed the Prix Fixe framework to divide models into modular building blocks. We tested all possible combinations for the top three models, further improving their performance. The DREAM Challenge models not only achieved state-of-the-art results on our comprehensive yeast dataset but also consistently surpassed existing benchmarks on Drosophila and human genomic datasets, demonstrating the progress that can be driven by gold-standard genomics datasets.
U2 - 10.1038/s41587-024-02414-w
DO - 10.1038/s41587-024-02414-w
M3 - Journal article
C2 - 39394483
AN - SCOPUS:85206852790
SN - 1087-0156
JO - Nature Biotechnology
JF - Nature Biotechnology
ER -