National Joint Conference on Statistics and Data Science

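Internationally renowned statistician, member of the U.S. National Academy of Engineering, and X.Q. Deng Presidential Chair Professor (校长学勤讲座教授) at the School of Data Science, The Chinese University of Hong Kong, Shenzhen. Formerly H. C. Carver Professor of Statistics at the University of Michigan, and Professor in the Department of Industrial and Systems Engineering and Coca-Cola Chair Professor of Statistics at the Georgia Institute of Technology. Received the COPSS Presidents' Award in 1987 and the R.A. Fisher Lectureship in 2011. In 1985, coined the term "data science" and advocated renaming statistics as data science and calling statisticians data scientists.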

https://sds.cuhk.edu.cn/en/teacher/1900

Time: July 11, 2025, 9:00-9:50

Venue: Golden Hall, Hangzhou Taixu Lake Holiday Hotel

Title: Physics-informed learning and uncertainty quantification

Abstract: Suppose a system or product can be described by its underlying physical knowledge, usually via a set of partial differential equations (PDEs). Then the corresponding AI tools must be informed by this knowledge. In this talk I will focus on using physical knowledge to “inform” the development of statistical/machine learning. I will use two recent works to illustrate how this can be done. In developing the injector design for rocket propulsion, the mixing of oxygen and fuel was critical for the stability of propulsion. Navier-Stokes equations were used to describe the mixing behavior. However, solving the equations could take six weeks on a computing cluster, so only a small number of numerical solutions could be computed. A surrogate model was instead developed by using tools in uncertainty quantification (UQ) and machine learning (ML). It could be computed within one hour and could mimic the mixing behavior. The success lies in using UQ-ML tools to incorporate known physical phenomena about mixing. The second work addresses the issue of incorporating knowledge of the PDEs, such as their boundary conditions. An efficient surrogate model must also satisfy the same boundary conditions. We have constructed a new class of Gaussian process (GP) models that incorporate the same boundary information. We develop a framework of GP models based on stochastic partial differential equations (SPDEs) with Dirichlet or Robin boundary conditions. For fast computation, we use a kernel regression approximation to accurately approximate the SPDE covariances. Real examples are used for illustration.

Keynote speaker 2：

James M. Robins, Harvard University

James M. Robins is an epidemiologist and biostatistician best known for advancing methods for drawing causal inferences from complex observational studies and randomized trials, particularly those in which the treatment varies with time. He is the 2013 recipient of the Nathan Mantel Award for lifetime achievement in statistics and epidemiology, and a recipient of the 2022 Rousseeuw Prize in Statistics, jointly with Miguel Hernán, Eric Tchetgen Tchetgen, Andrea Rotnitzky and Thomas Richardson.

He graduated in medicine from Washington University in St. Louis in 1976. He is currently the Mitchell L. and Robin LaFoley Dong Professor of Epidemiology at the Harvard T.H. Chan School of Public Health. He has published over 100 papers in academic journals and is an ISI highly cited researcher.

https://www.hsph.harvard.edu/profile/james-m-robins/

Time: July 11, 2025, 10:10-11:00

Venue: Golden Hall, Hangzhou Taixu Lake Holiday Hotel

Title: Causal Inference: History, Contributions, and Future

Abstract: Forty years ago, the following disciplines each had their own languages, opinions, and idiosyncrasies regarding causal inference: philosophy, computer science, sociology, psychology, statistics, epidemiology, political science, and economics. Today all speak a common language, so new methodologies rapidly cross-fertilize. Top journals have gone from knee-jerk rejection to active solicitation of articles in the area. The rapid development of the field has been driven by:

1. End of the historical suppression of causal language in statistics and medicine (aside from randomized clinical trials)

2. The internet making cross-disciplinary understanding and collaboration easy

3. The need for individualized treatment regimes in medicine

4. Tech companies realizing that optimizing profits depends on causal interventions rather than just prediction

5. The development of causal graphs by Spirtes, Glymour, Scheines and Pearl, which offer non-technical users the ability to validly reason about complex causal systems

6. The existence of huge data sets leading to data-driven science rather than hypothesis-driven science

In my lecture, I will give a history of statistical methods for causal inference, focusing on methods developed by myself and colleagues. I will explain why the causal methods we developed for the analysis of time-varying treatments have had such a large impact, for over 25 years now, on substantive areas in which confounding by time-varying covariates is very strong, as in studies of HIV-infected individuals. In addition, I will describe why these methods are an integral part of the target trial methodology introduced by Miguel Hernán and myself, a methodology that is altering the analytical paradigm for the estimation of causal effects from longitudinal observational data in medicine.

In more detail, I will review both (i) the role of marginal structural models, structural nested models, and the g-formula in modelling the effects of time-varying treatments and (ii) the development, joint with Andrea Rotnitzky, of doubly and multiply robust estimation of the model parameters. This will be followed by a brief review of ground-breaking causal methods developed by other researchers, centering on the development of proximal inference by Eric Tchetgen Tchetgen and Wang Miao and the contributions of Mark van der Laan. I will conclude with a discussion of the future of causal inference in the coming age of AI.

Keynote speaker 3：

Tianxi Cai, Harvard University

Tianxi Cai is a major player in developing analytical tools for mining EHR data and predictive modeling with biomedical data. She provides statistical leadership on several large-scale projects, including the NIH-funded Undiagnosed Diseases Network at DBMI. Cai's research lab develops novel statistical and machine learning methods for several areas including clinical trials, real world evidence, and personalized medicine using genomic and phenomic data. Cai received her ScD in Biostatistics at Harvard and was an assistant professor at the University of Washington before returning to Harvard as a faculty member in 2002.

https://www.hsph.harvard.edu/profile/tianxi-cai/

Time: July 11, 2025, 11:00-11:50

Venue: Golden Hall, Hangzhou Taixu Lake Holiday Hotel

Title: Robust Consensus Learning with Multi-Institutional EHR Data: Opportunities and Challenges

Abstract: The widespread adoption of electronic health records (EHR) has unlocked access to large-scale clinical datasets, creating valuable opportunities for discovery research and translational medicine. When integrated with biorepositories and other knowledge sources, EHR data can drive the development of real-world, data-driven models for precision healthcare. However, effectively utilizing multi-institutional EHR data presents considerable challenges, including data heterogeneity, privacy constraints, and temporal shifts in clinical practices, coding standards, and patient demographics. In this talk, I will explore consensus learning and representation learning strategies that harness knowledge sources and pre-trained language models to enhance robustness across diverse healthcare systems and evolving temporal trends. By integrating EHR data from multiple institutions, these approaches improve the reliability and generalizability of insights derived from distributed EHR data. I will illustrate these methods using datasets from institutions such as Mass General Brigham and the Veterans Health Administration, highlighting strategies to ensure the stability and applicability of real-world evidence in healthcare research.

Keynote speaker 4：

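Weinan E, Peking University

Internationally renowned mathematician; Academician of the Chinese Academy of Sciences; Chair Professor at Peking University; Director of the National Engineering Laboratory for Big Data Analysis and Applications; Director of the International Center for Machine Learning Research at Peking University; President of the AI for Science Institute, Beijing; President of the Beijing Institute of Big Data Research; Advisor to the China Society for Industrial and Applied Mathematics; and Chair of the Academic Committee of the Wuhan Institute of Mathematics and Intelligence. Formerly Professor in the Department of Mathematics and the Program in Applied and Computational Mathematics at Princeton University.

Research focuses mainly on computational mathematics, applied mathematics, machine learning, and their applications in mechanics, physics, chemistry, materials science, and related fields.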

https://www.math.pku.edu.cn/jsdw/js_20180628175159671361/e_20180628175159671361/138270.htm

Time: July 12, 2025, 8:30-9:20

Venue: Golden Hall, Hangzhou Taixu Lake Holiday Hotel

Title: Towards an understanding of the principles behind deep learning

Abstract: The field of deep learning is evolving rapidly, driven by the availability of vast amounts of data and computing resources. Deep learning techniques have also evolved in several different ways, including different formulations such as GANs and diffusion models, different architectures such as CNNs and transformers, and different training protocols such as BERT and GPT. This evolution has largely been empirical. Consequently, there are a lot of mysteries, surprises and “black magic” in this field. Is it possible to decipher some kind of guiding principles behind this? In this talk, we will discuss our thoughts along this line. Specifically, we will discuss how simple mathematical concepts such as symmetry and stability can be used as guiding principles for designing and understanding neural network models.

Keynote speaker 5：

Jian Huang, The Hong Kong Polytechnic University

Jian Huang is Chair Professor of Data Science and Analytics in the Departments of Data Science and AI, and Applied Mathematics at The Hong Kong Polytechnic University. He received his Ph.D. in Statistics from the University of Washington, Seattle.

His current research interests include deep learning, generative models, representation learning, leveraging large models for statistical analysis, and AI for science. He is recognized for his contributions to high-dimensional statistics, biostatistics, bioinformatics, and machine learning. He has been named as a highly cited researcher in Mathematics by Clarivate and is listed among the top 2% of the world’s most cited scientists by Stanford University. Professor Huang is a Fellow of the American Statistical Association and the Institute of Mathematical Statistics.

https://www.polyu.edu.hk/ama/people/academic-staff/prof-huang-jian/

Time: July 12, 2025, 9:20-10:10

Venue: Golden Hall, Hangzhou Taixu Lake Holiday Hotel

Title: Statistics in the Era of Large Models

Abstract: The emergence of large-scale machine learning models, such as deep neural networks and foundation models, is fundamentally reshaping the field of statistics. In this talk, we explore how these powerful models can be leveraged to address contemporary statistical challenges. We will discuss applications such as employing large language model-based data agents for coding-free data analysis, leveraging large models to enhance statistical analysis, and applying generative models to nonparametric statistical learning problems. Through these examples, we will highlight the opportunities and challenges that large models present for statistical analysis and outline promising directions for future research at this dynamic intersection.

Keynote speaker 6：

Will Cong, Cornell University

Will Cong is the Rudd Family Professor of Management, professor of finance, and founding director of the FinTech Initiative at Cornell University. He is also a finance editor at Management Science, faculty scientist at the Initiative for Cryptocurrencies & Contracts (IC3), research associate at the NBER, founder of multiple international research forums, a former Kauffman Junior Fellow, Poets & Quants World Best Business School Professor, and 2022 Top 10 Quant Professor.

Cong's research spans financial economics, information economics, fintech, digital economy, and entrepreneurship. He and his coauthors have pioneered the introduction of goal-oriented search and interpretable AI for finance, laid the foundations of tokenomics (covering categorization of tokens, cryptocurrency pricing, central bank digital currencies/payment systems, and optimal token monetary policy design), analyzed centralization issues and dynamic incentives in blockchains and DeFi, and developed data analytics for detecting market manipulation and improving fintech regulation, among others.

https://business.cornell.edu/faculty-research/faculty/lc898/

Time: July 12, 2025, 10:30-11:20

Venue: Golden Hall, Hangzhou Taixu Lake Holiday Hotel

Title: Generative AI for Economic Equilibrium Analyses and Financial Applications

Abstract: I give an overview of two core themes of modern AI: goal-oriented end-to-end optimization and generative modeling with deep learning. I then discuss non-text-based generative modeling involving transformer-based reinforcement learning or the novel panel trees for portfolio management, test asset creation, and detecting heterogeneity groups (e.g., asset clusters with differential return predictability). Finally, I introduce the concept of data-driven generative equilibrium for counterfactual analyses in economics, with an application to online lending markets.
