Chapman University Digital Commons


Computational and Data Sciences (PhD) Dissertations

Below is a selection of dissertations from the Doctor of Philosophy in Computational and Data Sciences program in Schmid College that have been included in Chapman University Digital Commons. Additional dissertations from years prior to 2019 are available through the Leatherby Libraries' print collection or in ProQuest's Dissertations and Theses database.

Dissertations from 2023

Computational Analysis of Antibody Binding Mechanisms to the Omicron RBD of SARS-CoV-2 Spike Protein: Identification of Epitopes and Hotspots for Developing Effective Therapeutic Strategies, Mohammed Alshahrani

Integration of Computer Algebra Systems and Machine Learning in the Authoring of the SANYMS Intelligent Tutoring System, Sam Ford

Voluntary Action and Conscious Intention, Jake Gavenas

Random Variable Spaces: Mathematical Properties and an Extension to Programming Computable Functions, Mohammed Kurd-Misto

Computational Modeling of Superconductivity from the Set of Time-Dependent Ginzburg-Landau Equations for Advancements in Theory and Applications, Iris Mowgood

Application of Machine Learning Algorithms for Elucidation of Biological Networks from Time Series Gene Expression Data, Krupa Nagori

Stochastic Processes and Multi-Resolution Analysis: A Trigonometric Moment Problem Approach and an Analysis of the Expenditure Trends for Diabetic Patients, Isaac Nwi-Mozu

Applications of Causal Inference Methods for the Estimation of Effects of Bone Marrow Transplant and Prescription Drugs on Survival of Aplastic Anemia Patients, Yesha M. Patel

Causal Inference and Machine Learning Methods in Parkinson's Disease Data Analysis, Albert Pierce

Causal Inference Methods for Estimation of Survival and General Health Status Measures of Alzheimer’s Disease Patients, Ehsan Yaghmaei

Dissertations from 2022

Computational Approaches to Facilitate Automated Interchange between Music and Art, Rao Hamza Ali

Causal Inference in Psychology and Neuroscience: From Association to Causation, Dehua Liang

Advances in NLP Algorithms on Unstructured Medical Notes Data and Approaches to Handling Class Imbalance Issues, Hanna Lu

Novel Techniques for Quantifying Secondhand Smoke Diffusion into Children's Bedroom, Sunil Ramchandani

Probing the Boundaries of Human Agency, Sook Mun Wong

Dissertations from 2021

Predicting Eye Movement and Fixation Patterns on Scenic Images Using Machine Learning for Children with Autism Spectrum Disorder, Raymond Anden

Forecasting the Prices of Cryptocurrencies using a Novel Parameter Optimization of VARIMA Models, Alexander Barrett

Applications of Machine Learning to Facilitate Software Engineering and Scientific Computing, Natalie Best

Exploring Behaviors of Software Developers and Their Code Through Computational and Statistical Methods, Elia Eiroa Lledo

Assessing the Re-Identification Risk in ECG Datasets and an Application of Privacy Preserving Techniques in ECG Analysis, Arin Ghazarian

Multi-Modal Data Fusion, Image Segmentation, and Object Identification using Unsupervised Machine Learning: Conception, Validation, Applications, and a Basis for Multi-Modal Object Detection and Tracking, Nicholas LaHaye

Machine-Learning-Based Approach to Decoding Physiological and Neural Signals, Elnaz Lashgari

Learning-Based Modeling of Weather and Climate Events Related To El Niño Phenomenon via Differentiable Programming and Empirical Decompositions, Justin Le

Quantum State Estimation and Tracking for Superconducting Processors Using Machine Learning, Shiva Lotfallahzadeh Barzili

Novel Applications of Statistical and Machine Learning Methods to Analyze Trial-Level Data from Cognitive Measures, Chelsea Parlett

Optimal Analytical Methods for High Accuracy Cardiac Disease Classification and Treatment Based on ECG Data, Jianwei Zheng

Dissertations from 2020

Development of Integrated Machine Learning and Data Science Approaches for the Prediction of Cancer Mutation and Autonomous Drug Discovery of Anti-Cancer Therapeutic Agents, Steven Agajanian

Allocation of Public Resources: Bringing Order to Chaos, Lance Clifner

A Novel Correction for the Adjusted Box-Pierce Test — New Risk Factors for Emergency Department Return Visits within 72 hours for Children with Respiratory Conditions — General Pediatric Model for Understanding and Predicting Prolonged Length of Stay, Sidy Danioko

A Computational and Experimental Examination of the FCC Incentive Auction, Logan Gantner

Exploring the Employment Landscape for Individuals with Autism Spectrum Disorders using Supervised and Unsupervised Machine Learning, Kayleigh Hyde

Integrated Machine Learning and Bioinformatics Approaches for Prediction of Cancer-Driving Gene Mutations, Oluyemi Odeyemi

On Quantum Effects of Vector Potentials and Generalizations of Functional Analysis, Ismael L. Paiva

Long Term Ground Based Precipitation Data Analysis: Spatial and Temporal Variability, Luciano Rodriguez

Gaining Computational Insight into Psychological Data: Applications of Machine Learning with Eating Disorders and Autism Spectrum Disorder, Natalia Rosenfield

Connecting the Dots for People with Autism: A Data-driven Approach to Designing and Evaluating a Global Filter, Viseth Sean

Novel Statistical and Machine Learning Methods for the Forecasting and Analysis of Major League Baseball Player Performance, Christopher Watkins

Dissertations from 2019

Contributions to Variable Selection in Complexly Sampled Case-control Models, Epidemiology of 72-hour Emergency Department Readmission, and Out-of-site Migration Rate Estimation Using Pseudo-tagged Longitudinal Data, Kyle Anderson

Bias Reduction in Machine Learning Classifiers for Spatiotemporal Analysis of Coral Reefs using Remote Sensing Images, Justin J. Gapper

Estimating Auction Equilibria using Individual Evolutionary Learning, Kevin James

Employing Earth Observations and Artificial Intelligence to Address Key Global Environmental Challenges in Service of the SDGs, Wenzhao Li

Image Restoration using Automatic Damaged Regions Detection and Machine Learning-Based Inpainting Technique, Chloe Martin-King

Theses from 2017

Optimized Forecasting of Dominant U.S. Stock Market Equities Using Univariate and Multivariate Time Series Analysis Methods, Michael Schwartz


ISSN 2572-1496


DigitalCommons@Kennesaw State University


Doctor of Data Science and Analytics Dissertations


The Ph.D. in Data Science and Analytics is an advanced degree with a dual focus on application and research: students engage in real-world business problems, which inform and guide their research interests.

We launched the first formal PhD program in Data Science in 2015. Our program sits at the intersection of computer science, statistics, mathematics, and business. Our students engage in relevant research with faculty from across our eleven colleges. As one of the institutions on the forefront of the development of data science as an academic discipline, we are committed to developing the next generation of Data Science leaders, researchers, and educators. Culturally, we are committed to the discipline of Data Science, through ethical practices, attention to fairness, to a diverse student body, to academic excellence, and research which makes positive contributions to our local, regional, and global community. -Sherry Ni, Director, Ph.D. in Data Science and Analytics

This degree trains individuals to facilitate new, innovative research and to translate complex structured and unstructured data into information that improves decision making. The curriculum places heavy emphasis on programming, data mining, statistical modeling, and the mathematical foundations that support these concepts. Importantly, the program also emphasizes communication skills, both oral and written, as well as applying results and tying them to business and research problems.

Dissertations from 2023

Quantification of Various Types of Biases in Large Language Models, Sudhashree Sayenju

Dissertations from 2022

Appley: Approximate Shapley Values for Model Explainability in Linear Time, Md Shafiul Alam

Ethical Analytics: A Framework for a Practically-Oriented Sub-Discipline of AI Ethics, Jonathan Boardman

Novel Instance-Level Weighted Loss Function for Imbalanced Learning, Trent Geisler

Debiasing Cyber Incidents – Correcting for Reporting Delays and Under-reporting, Seema Sangari

Dissertations from 2021

Integrated Machine Learning Approaches to Improve Classification performance and Feature Extraction Process for EEG Dataset, Mohammad Masum

A Distance-Based Clustering Framework for Categorical Time Series: A Case Study in Episodes of Care Healthcare Delivery System, Lauren Staples

Dissertations from 2020

A Credit Analysis of the Unbanked and Underbanked: An Argument for Alternative Data, Edwin Baidoo

Quantitatively Motivated Model Development Framework: Downstream Analysis Effects of Normalization Strategies, Jessica M. Rudd

Data-driven Investment Decisions in P2P Lending: Strategies of Integrating Credit Scoring and Profit Scoring, Yan Wang

A Novel Penalized Log-likelihood Function for Class Imbalance Problem, Lili Zhang

Attack and Defense in Security Analytics, Yiyun Zhou

Dissertations from 2019

One and Two-Step Estimation of Time Variant Parameters and Nonparametric Quantiles, Bogdan Gadidov

Biologically Interpretable, Integrative Deep Learning for Cancer Survival Analysis, Jie Hao

Deep Embedding Kernel, Linh Le

Ordinal HyperPlane Loss, Bob Vanderheyden


ISSN: 2576-6805


Getting a PhD in Data Science: What You Need to Know

A PhD in data science prepares you for some of the most cutting-edge research in the field and can advance your career. But whether you should pursue one depends on your own personal goals and resources. Learn more inside.


A Doctor of Philosophy (PhD) is the highest degree that a professional can obtain in the field of data science. Focused primarily on equipping students with the skills and knowledge required to conduct original research, a PhD prepares its graduates for advanced professional positions in both industry and academia.

But the path to a PhD involves many years of potentially costly study, which can be discouraging for those seeking rapid career progression. Before jumping into a doctoral program, then, it’s important to define your goals and how a PhD may (or may not) fit into them.

In this article, you’ll learn more about PhDs in data science, the factors to weigh before enrolling in one, and the types of programs available. At the end, you’ll also find some suggested online courses to help you get started today.

PhD in Data Science: Overview 

A Doctor of Philosophy (PhD) is the terminal degree in the field of data science, meaning it is the highest possible degree that can be obtained in the subject. Holding a PhD in data science, consequently, signals your mastery and knowledge of the field to both potential employers and fellow professionals. 


PhD vs. Master’s Degree in Data Science

There are two graduate degrees in data science: a master’s in data science and a PhD in data science. While both degrees can improve your job prospects, they also have key differences that might affect which one is better for you.

A master’s in data science is a graduate degree that sits between a bachelor’s and a PhD and usually takes one to two years to complete. A master’s degree expands on what was learned in undergraduate school through more advanced courses in topics such as machine learning, data analytics, and statistics. Often, a master’s student in data science also pursues original research and completes a capstone project that highlights what they learned in their program.

A PhD in data science is a research degree that typically takes four to five years to complete but can take longer depending on a range of personal factors. In addition to taking more advanced courses, PhD candidates devote a significant amount of time to teaching and to conducting dissertation research intended to advance the field. To conclude their doctoral program, candidates complete a dissertation representing a significant contribution to the field.

Typically, bachelor’s degree holders entering a PhD program can earn a master’s degree as part of their doctoral program. Those who enroll in a master’s program, however, usually have to apply separately to a PhD program, even one in the same department.

Skills and curriculum 

Every PhD program is unique, with its own requirements and focus. Nonetheless, programs share similar features, such as course, credit, and teaching requirements. To give you a better sense of what a doctoral program in data science might look like, here’s an example curriculum from NYU [1]:

  • Complete 72 credit hours while maintaining a cumulative grade point average of 3.0 (out of 4.0) each semester.
  • Take core courses in topics like probability, statistics, machine learning, big data, inference, and research.
  • Complete 39 credit hours of elective courses in topics such as deep learning, natural language processing, and computational cognitive modeling.
  • Complete teaching requirements.
  • Pass a comprehensive exam.
  • Pass the Depth Qualifying Exam (DQE) by May 15 of the fourth semester.
  • Complete all steps for approval of the PhD dissertation.

Is a PhD in Data Science worth it? 

A PhD can open doors to new career opportunities and boost your employment prospects. But it can also take a lot of time and money to complete. Everyone’s personal and professional goals are different, so weigh the following factors when deciding whether to pursue a PhD in data science:

Cost and time

The time and money it takes to complete a PhD are perhaps the most concrete factors to weigh when deciding whether to pursue a doctoral degree. According to research by the Education Data Initiative, a doctorate costs an average of $114,300 and takes roughly four to eight years to complete [2].

The exact amount of time and money you might spend obtaining your doctoral degree will depend on your own circumstances and program. Before applying for a doctoral degree, make sure to review each program’s graduation requirements and costs, so you have a clear understanding of what you’re getting into. 

Data Science PhD salary 

While there are no official statistics on the salary gains data scientists earn by getting a PhD, the median salary for all data scientists is much higher than the national average in the United States. According to the U.S. Bureau of Labor Statistics (BLS), for example, the median salary for data scientists was $100,910 as of May 2021 [3].

Typically, the entry-level degree to get a data science position is a bachelor’s degree, meaning that even just an undergraduate degree could help you land a job that earns a higher than average salary. Nonetheless, a PhD will likely prepare you for more advanced positions that could offer higher pay than less specialized roles. 

Data Science PhD programs 

There are several types of doctoral programs that you might consider if you would like to obtain a PhD in data science. These include: 

PhD in data science online

An online PhD program may appeal to those who want a more flexible program that lets them complete coursework at their own pace. Online programs can also be cheaper than their in-person counterparts, though they often offer fewer opportunities for networking and mentorship. If you’re an independent self-starter looking for a program that fits into your already busy life, then you might consider an online PhD program.

PhD in data science in-person

An in-person PhD program follows a more traditional educational format in which you attend classes on campus with your peers and instructors. In addition to doctoral-level instruction, you will have more opportunities to network and receive personalized instruction than you would likely find in an online program. However, in-person programs tend to be more expensive and less flexible than online ones.

If you prefer real-world instruction, networking opportunities, and a more rigid structure, then you might consider an in-person doctoral program. 

Alternatives 

As an alternative to a PhD program, you might also consider obtaining a master’s degree. While covering some of the same material as a doctoral program, a master’s usually takes much less time and money to complete.

If you’re motivated primarily by the desire to boost your chances of landing a job and gaining financial stability, then a master’s degree program might better help you achieve your goals.

Learn more about data science 

Whatever your educational goals, entering the data science profession requires extensive knowledge and training. To prepare for your next career move, then, you might consider taking a flexible online course through Coursera.

The University of Colorado Boulder’s Data Science Foundations: Data Structures and Algorithms Specialization teaches learners how to design algorithms, create applications, and organize, store, and process data efficiently. The university’s online Master of Science in Data Science, meanwhile, teaches broadly applicable foundational skills alongside specialized competencies tailored to specific career paths in just two years of instruction.

Article sources

[1] NYU Center for Data Science. “PhD in Data Science, Curriculum,” https://cds.nyu.edu/phd-curriculum-info/. Accessed September 27, 2022.

[2] Education Data Initiative. “Average Cost of a Doctorate Degree,” https://educationdata.org/average-cost-of-a-doctorate-degree. Accessed September 27, 2022.

[3] US BLS. “Occupational Outlook Handbook: Data Scientists,” https://www.bls.gov/ooh/math/data-scientists.htm#tab-1. Accessed September 27, 2022.


Coursera staff, Editorial Team

This content has been made available for informational purposes only. Learners are advised to conduct additional research to ensure that courses and other credentials pursued meet their personal, professional, and financial goals.


Machine Learning - CMU

PhD Dissertations


Advances in Statistical Gene Networks Jinjin Tian, 2023

Post-hoc calibration without distributional assumptions Chirag Gupta, 2023

The Role of Noise, Proxies, and Dynamics in Algorithmic Fairness (unavailable) Nil-Jana Akpinar, 2023

Collaborative learning by leveraging siloed data Sebastian Caldas, 2023

Modeling Epidemiological Time Series Aaron Rumack, 2023

Human-Centered Machine Learning: A Statistical and Algorithmic Perspective Leqi Liu, 2023

Uncertainty Quantification under Distribution Shifts Aleksandr Podkopaev, 2023

Probabilistic Reinforcement Learning: Using Data to Define Desired Outcomes, and Inferring How to Get There Benjamin Eysenbach, 2023

Comparing Forecasters and Abstaining Classifiers Yo Joong Choe, 2023

Using Task Driven Methods to Uncover Representations of Human Vision and Semantics Aria Yuan Wang, 2023

Data-driven Decisions - An Anomaly Detection Perspective Shubhranshu Shekhar, 2023

Applied Mathematics of the Future Kin G. Olivares, 2023

Methods and Applications of Explainable Machine Learning Joon Sik Kim, 2023

Neural Reasoning for Question Answering Haitian Sun, 2023

Principled Machine Learning for Societally Consequential Decision Making Amanda Coston, 2023

Long term brain dynamics extend cognitive neuroscience to timescales relevant for health and physiology Maxwell B. Wang

Long term brain dynamics extend cognitive neuroscience to timescales relevant for health and physiology Darby M. Losey, 2023

Calibrated Conditional Density Models and Predictive Inference via Local Diagnostics David Zhao, 2023

Towards an Application-based Pipeline for Explainability Gregory Plumb, 2022

Objective Criteria for Explainable Machine Learning Chih-Kuan Yeh, 2022

Making Scientific Peer Review Scientific Ivan Stelmakh, 2022

Facets of regularization in high-dimensional learning: Cross-validation, risk monotonization, and model complexity Pratik Patil, 2022

Active Robot Perception using Programmable Light Curtains Siddharth Ancha, 2022

Strategies for Black-Box and Multi-Objective Optimization Biswajit Paria, 2022

Unifying State and Policy-Level Explanations for Reinforcement Learning Nicholay Topin, 2022

Sensor Fusion Frameworks for Nowcasting Maria Jahja, 2022

Equilibrium Approaches to Modern Deep Learning Shaojie Bai, 2022

Towards General Natural Language Understanding with Probabilistic Worldbuilding Abulhair Saparov, 2022

Applications of Point Process Modeling to Spiking Neurons (Unavailable) Yu Chen, 2021

Neural variability: structure, sources, control, and data augmentation Akash Umakantha, 2021

Structure and time course of neural population activity during learning Jay Hennig, 2021

Cross-view Learning with Limited Supervision Yao-Hung Hubert Tsai, 2021

Meta Reinforcement Learning through Memory Emilio Parisotto, 2021

Learning Embodied Agents with Scalably-Supervised Reinforcement Learning Lisa Lee, 2021

Learning to Predict and Make Decisions under Distribution Shift Yifan Wu, 2021

Statistical Game Theory Arun Sai Suggala, 2021

Towards Knowledge-capable AI: Agents that See, Speak, Act and Know Kenneth Marino, 2021

Learning and Reasoning with Fast Semidefinite Programming and Mixing Methods Po-Wei Wang, 2021

Bridging Language in Machines with Language in the Brain Mariya Toneva, 2021

Curriculum Learning Otilia Stretcu, 2021

Principles of Learning in Multitask Settings: A Probabilistic Perspective Maruan Al-Shedivat, 2021

Towards Robust and Resilient Machine Learning Adarsh Prasad, 2021

Towards Training AI Agents with All Types of Experiences: A Unified ML Formalism Zhiting Hu, 2021

Building Intelligent Autonomous Navigation Agents Devendra Chaplot, 2021

Learning to See by Moving: Self-supervising 3D Scene Representations for Perception, Control, and Visual Reasoning Hsiao-Yu Fish Tung, 2021

Statistical Astrophysics: From Extrasolar Planets to the Large-scale Structure of the Universe Collin Politsch, 2020

Causal Inference with Complex Data Structures and Non-Standard Effects Kwhangho Kim, 2020

Networks, Point Processes, and Networks of Point Processes Neil Spencer, 2020

Dissecting neural variability using population recordings, network models, and neurofeedback (Unavailable) Ryan Williamson, 2020

Predicting Health and Safety: Essays in Machine Learning for Decision Support in the Public Sector Dylan Fitzpatrick, 2020

Towards a Unified Framework for Learning and Reasoning Han Zhao, 2020

Learning DAGs with Continuous Optimization Xun Zheng, 2020

Machine Learning and Multiagent Preferences Ritesh Noothigattu, 2020

Learning and Decision Making from Diverse Forms of Information Yichong Xu, 2020

Towards Data-Efficient Machine Learning Qizhe Xie, 2020

Change modeling for understanding our world and the counterfactual one(s) William Herlands, 2020

Machine Learning in High-Stakes Settings: Risks and Opportunities Maria De-Arteaga, 2020

Data Decomposition for Constrained Visual Learning Calvin Murdock, 2020

Structured Sparse Regression Methods for Learning from High-Dimensional Genomic Data Micol Marchetti-Bowick, 2020

Towards Efficient Automated Machine Learning Liam Li, 2020

Learning Collections of Functions Emmanouil Antonios Platanios, 2020

Provable, structured, and efficient methods for robustness of deep networks to adversarial examples Eric Wong, 2020

Reconstructing and Mining Signals: Algorithms and Applications Hyun Ah Song, 2020

Probabilistic Single Cell Lineage Tracing Chieh Lin, 2020

Graphical network modeling of phase coupling in brain activity (unavailable) Josue Orellana, 2019

Strategic Exploration in Reinforcement Learning - New Algorithms and Learning Guarantees Christoph Dann, 2019

Learning Generative Models using Transformations Chun-Liang Li, 2019

Estimating Probability Distributions and their Properties Shashank Singh, 2019

Post-Inference Methods for Scalable Probabilistic Modeling and Sequential Decision Making Willie Neiswanger, 2019

Accelerating Text-as-Data Research in Computational Social Science Dallas Card, 2019

Multi-view Relationships for Analytics and Inference Eric Lei, 2019

Information flow in networks based on nonstationary multivariate neural recordings Natalie Klein, 2019

Competitive Analysis for Machine Learning & Data Science Michael Spece, 2019

The When, Where and Why of Human Memory Retrieval Qiong Zhang, 2019

Towards Effective and Efficient Learning at Scale Adams Wei Yu, 2019

Towards Literate Artificial Intelligence Mrinmaya Sachan, 2019

Learning Gene Networks Underlying Clinical Phenotypes Under SNP Perturbations From Genome-Wide Data Calvin McCarter, 2019

Unified Models for Dynamical Systems Carlton Downey, 2019

Anytime Prediction and Learning for the Balance between Computation and Accuracy Hanzhang Hu, 2019

Statistical and Computational Properties of Some "User-Friendly" Methods for High-Dimensional Estimation Alnur Ali, 2019

Nonparametric Methods with Total Variation Type Regularization Veeranjaneyulu Sadhanala, 2019

New Advances in Sparse Learning, Deep Networks, and Adversarial Learning: Theory and Applications Hongyang Zhang, 2019

Gradient Descent for Non-convex Problems in Modern Machine Learning Simon Shaolei Du, 2019

Selective Data Acquisition in Learning and Decision Making Problems Yining Wang, 2019

Anomaly Detection in Graphs and Time Series: Algorithms and Applications Bryan Hooi, 2019

Neural dynamics and interactions in the human ventral visual pathway Yuanning Li, 2018

Tuning Hyperparameters without Grad Students: Scaling up Bandit Optimisation Kirthevasan Kandasamy, 2018

Teaching Machines to Classify from Natural Language Interactions Shashank Srivastava, 2018

Statistical Inference for Geometric Data Jisu Kim, 2018

Representation Learning @ Scale Manzil Zaheer, 2018

Diversity-promoting and Large-scale Machine Learning for Healthcare Pengtao Xie, 2018

Distribution and Histogram (DIsH) Learning Junier Oliva, 2018

Stress Detection for Keystroke Dynamics Shing-Hon Lau, 2018

Sublinear-Time Learning and Inference for High-Dimensional Models Enxu Yan, 2018

Neural population activity in the visual cortex: Statistical methods and application Benjamin Cowley, 2018

Efficient Methods for Prediction and Control in Partially Observable Environments Ahmed Hefny, 2018

Learning with Staleness Wei Dai, 2018

Statistical Approach for Functionally Validating Transcription Factor Bindings Using Population SNP and Gene Expression Data Jing Xiang, 2017

New Paradigms and Optimality Guarantees in Statistical Learning and Estimation Yu-Xiang Wang, 2017

Dynamic Question Ordering: Obtaining Useful Information While Reducing User Burden Kirstin Early, 2017

New Optimization Methods for Modern Machine Learning Sashank J. Reddi, 2017

Active Search with Complex Actions and Rewards Yifei Ma, 2017

Why Machine Learning Works George D. Montañez, 2017

Source-Space Analyses in MEG/EEG and Applications to Explore Spatio-temporal Neural Dynamics in Human Vision Ying Yang, 2017

Computational Tools for Identification and Analysis of Neuronal Population Activity Pengcheng Zhou, 2016

Expressive Collaborative Music Performance via Machine Learning Gus (Guangyu) Xia, 2016

Supervision Beyond Manual Annotations for Learning Visual Representations Carl Doersch, 2016

Exploring Weakly Labeled Data Across the Noise-Bias Spectrum Robert W. H. Fisher, 2016

Optimizing Optimization: Scalable Convex Programming with Proximal Operators Matt Wytock, 2016

Combining Neural Population Recordings: Theory and Application William Bishop, 2015

Discovering Compact and Informative Structures through Data Partitioning Madalina Fiterau-Brostean, 2015

Machine Learning in Space and Time Seth R. Flaxman, 2015

The Time and Location of Natural Reading Processes in the Brain Leila Wehbe, 2015

Shape-Constrained Estimation in High Dimensions Min Xu, 2015

Spectral Probabilistic Modeling and Applications to Natural Language Processing Ankur Parikh, 2015

Computational and Statistical Advances in Testing and Learning Aaditya Kumar Ramdas, 2015

Corpora and Cognition: The Semantic Composition of Adjectives and Nouns in the Human Brain Alona Fyshe, 2015

Learning Statistical Features of Scene Images Wooyoung Lee, 2014

Towards Scalable Analysis of Images and Videos Bin Zhao, 2014

Statistical Text Analysis for Social Science Brendan T. O'Connor, 2014

Modeling Large Social Networks in Context Qirong Ho, 2014

Semi-Cooperative Learning in Smart Grid Agents Prashant P. Reddy, 2013

On Learning from Collective Data Liang Xiong, 2013

Exploiting Non-sequence Data in Dynamic Model Learning Tzu-Kuo Huang, 2013

Mathematical Theories of Interaction with Oracles Liu Yang, 2013

Short-Sighted Probabilistic Planning Felipe W. Trevizan, 2013

Statistical Models and Algorithms for Studying Hand and Finger Kinematics and their Neural Mechanisms Lucia Castellanos, 2013

Approximation Algorithms and New Models for Clustering and Learning Pranjal Awasthi, 2013

Uncovering Structure in High-Dimensions: Networks and Multi-task Learning Problems Mladen Kolar, 2013

Learning with Sparsity: Structures, Optimization and Applications Xi Chen, 2013

GraphLab: A Distributed Abstraction for Large Scale Machine Learning Yucheng Low, 2013

Graph Structured Normal Means Inference James Sharpnack, 2013 (Joint Statistics & ML PhD)

Probabilistic Models for Collecting, Analyzing, and Modeling Expression Data Hai-Son Phuoc Le, 2013

Learning Large-Scale Conditional Random Fields Joseph K. Bradley, 2013

New Statistical Applications for Differential Privacy Rob Hall, 2013 (Joint Statistics & ML PhD)

Parallel and Distributed Systems for Probabilistic Reasoning Joseph Gonzalez, 2012

Spectral Approaches to Learning Predictive Representations Byron Boots, 2012

Attribute Learning using Joint Human and Machine Computation Edith L. M. Law, 2012

Statistical Methods for Studying Genetic Variation in Populations Suyash Shringarpure, 2012

Data Mining Meets HCI: Making Sense of Large Graphs Duen Horng (Polo) Chau, 2012

Learning with Limited Supervision by Input and Output Coding Yi Zhang, 2012

Target Sequence Clustering Benjamin Shih, 2011

Nonparametric Learning in High Dimensions Han Liu, 2010 (Joint Statistics & ML PhD)

Structural Analysis of Large Networks: Observations and Applications Mary McGlohon, 2010

Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy Brian D. Ziebart, 2010

Tractable Algorithms for Proximity Search on Large Graphs Purnamrita Sarkar, 2010

Rare Category Analysis Jingrui He, 2010

Coupled Semi-Supervised Learning Andrew Carlson, 2010

Fast Algorithms for Querying and Mining Large Graphs Hanghang Tong, 2009

Efficient Matrix Models for Relational Learning Ajit Paul Singh, 2009

Exploiting Domain and Task Regularities for Robust Named Entity Recognition Andrew O. Arnold, 2009

Theoretical Foundations of Active Learning Steve Hanneke, 2009

Generalized Learning Factors Analysis: Improving Cognitive Models with Machine Learning Hao Cen, 2009

Detecting Patterns of Anomalies Kaustav Das, 2009

Dynamics of Large Networks Jurij Leskovec, 2008

Computational Methods for Analyzing and Modeling Gene Regulation Dynamics Jason Ernst, 2008

Stacked Graphical Learning Zhenzhen Kou, 2007

Actively Learning Specific Function Properties with Applications to Statistical Inference Brent Bryan, 2007

Approximate Inference, Structure Learning and Feature Estimation in Markov Random Fields Pradeep Ravikumar, 2007

Scalable Graphical Models for Social Networks Anna Goldenberg, 2007

Measure Concentration of Strongly Mixing Processes with Applications Leonid Kontorovich, 2007

Tools for Graph Mining Deepayan Chakrabarti, 2005

Automatic Discovery of Latent Variable Models Ricardo Silva, 2005

PhD in Data Science

Conduct research on cutting-edge problems and explore the emerging field of Data Science alongside preeminent faculty at UChicago. As an emerging discipline, Data Science addresses foundational problems across the entire data life cycle. Tackling issues of inequity, climate change, and sustainability will require cutting-edge research in artificial intelligence and data usage combined with innovative educational programs to train students in the concepts of information systems. Students of Data Science will not only immerse themselves in a rapidly evolving field; they will help redefine it altogether.

Research Excellence:

You will learn from faculty who have developed research programs that span a wide variety of data science and AI topics, from theory to applications, with a focus on making a societal impact.

Research Topics:

  • Artificial Intelligence
  • Data, AI, and Society
  • Data Systems
  • Human-Centered Data Science
  • Machine Learning and Statistics
  • Use-Inspired Data Science

For more information, including a link to the application, see the Committee on Data Science website.

DiscoverDataScience.org

PhD in Data Science – Your Guide to Choosing a Doctorate Degree Program

Reviewed by Jack Levinson

Professional opportunities in data science are growing incredibly fast. That’s great news for students looking to pursue a career as a data scientist. But it also means that there are a lot more options out there to investigate and understand before developing the best educational path for you.

A PhD is the most advanced data science degree you can get, reflecting a depth of knowledge and technical expertise that will put you at the top of your field.

As a result, PhD programs are the most time-intensive degree option, typically requiring students to complete a dissertation based on rigorous research, so a PhD is not for everyone. Indeed, many people who work in big data hold master's degrees rather than PhDs; master's programs tend to involve much the same coursework as PhD programs but without a dissertation component. For the right candidate, however, a PhD program is the ideal way to become a true expert in an area of focus.

If you’ve concluded that a data science PhD is the right path for you, this guide is intended to help you choose the best program to suit your needs. It will walk through some of the key considerations while picking graduate data science programs and some of the nuts and bolts (like course load and tuition costs) that are part of the data science PhD decision-making process.

Data Science PhD vs. Masters: Choosing the right option for you

If you're considering a data science PhD, it's worth knowing that such an advanced degree isn't strictly necessary to get good work opportunities. Many people working in big data hold only a master's degree, which is generally the level of education expected of a competitive candidate for data science positions.

So why pursue a data science PhD?

Simply put, a PhD in data science will leave you qualified to enter the big data industry at a high level from the outset.

You’ll be eligible for advanced positions within companies, holding greater responsibilities, keeping more direct communication with leadership, and having more influence on important data-driven decisions. You’re also likely to receive greater compensation to match your rank.

However, PhDs are not for everyone. Dissertations require a great deal of time and an interest in intensive research. If you are eager to jumpstart a career quickly, a master’s program will give you the preparation you need to hit the ground running. PhDs are appropriate for those who want to commit their time and effort to schooling as a long-term investment in their professional trajectory.

For more information on the differences between data science PhD and master's programs, take a look at our guide on the topic.

Topics include:

  • Can I Get a PhD in Data Science Online?
  • Overview of PhD Coursework
  • Preparing for a Doctorate Program: building a solid track record of professional experience and things to consider when choosing a school
  • What Does It Cost to Get a PhD in Data Science?
  • School Listings

Data Science PhD Programs, Historically

Historically, a data science PhD was one of the main avenues to a good data-related position in academia or industry. However, PhD programs are heavily research-oriented and require a long-term investment of time, money, and energy. The issue some data science PhD holders report, especially in industry settings, is that the state of the art and the industry itself are evolving so rapidly that deep research-oriented expertise is not always what employers seek most.

Instead, many companies are looking for candidates who are up to date with the latest data science techniques and technologies, and are willing to pivot to match emerging trends and practices.

One recent development that makes data science graduate school decisions more complex is the introduction of specialty master's degrees that focus on rigorous but compact professional training. Both students and companies are recognizing the value of an intensive, more industry-focused degree that provides enough training to manage complex projects and is client-oriented rather than research-oriented.

However, not all prospective data science PhD students are looking for jobs in industry. Remarkable research opportunities are opening up across a variety of academic fields that make use of new data collection and analysis tools. Experts who understand how to leverage data systems, including statistics and computer science, to analyze trends and build models will be in high demand.

Can You Get a PhD in Data Science Online?

While it is not common to get a data science Ph.D. online, there are currently two options for those looking to take advantage of the flexibility of an online program.

Indiana University Bloomington and Northcentral University both offer online Ph.D. programs with either a minor or specialization in data science.

Given the trend for schools to continue increasing online offerings, expect to see additional schools adding this option in the near future.

Overview of PhD Coursework

A PhD involves a substantial amount of academic work and generally takes between four and five years (sometimes longer) to complete.

Here are some of the high level factors to consider and evaluate when comparing data science graduate programs.

How many credits are required for a PhD in data science?

On average, it takes 71 credits to graduate with a PhD in data science, almost double the credits of a traditional master's degree program. In addition to coursework, most PhD students also have research and teaching responsibilities, which can be demanding but also provide excellent career preparation.

What’s the core curriculum like?

In a data science doctoral program, you'll be expected to learn many skills and how to apply them across domains and disciplines. Core curricula vary from program to program, but almost all are built on a foundation of statistics.

All PhD candidates must take a qualifying exam. The format varies from university to university; at Yale, for example, it is broken into three parts: a practical exam, a theory exam, and an oral exam. The goal is to make sure doctoral students are developing the appropriate level of expertise.

Dissertation

One of the final steps of a PhD program involves presenting original research findings in a formal document called a dissertation. A dissertation provides background and context as well as findings and analysis, and can contribute to the understanding and evolution of data science. A dissertation idea most often provides the framework for how a PhD candidate's graduate school experience will unfold, so it's important to be thoughtful and deliberate while considering research opportunities.

Since data science is such a rapidly evolving field and because choosing the right PhD program is such an important factor in developing a successful career path, there are some steps that prospective doctoral students can take in advance to find the best-fitting opportunity.

Join professional associations

Even before being fully credentialed, joining professional associations and organizations such as the Data Science Association and the American Association of Big Data Professionals is a good way to get exposure to the field. Many professional societies are welcoming to new members and even encourage student participation with things like discounted membership fees and awards and contest categories for student researchers. One of the biggest advantages to joining is that these professional associations bring together other data scientists for conference events, research-sharing opportunities, networking and continuing education opportunities.

Leverage your social network

Be on the lookout to make professional connections with professors, peers, and members of industry. There are a number of LinkedIn groups dedicated to data science. A well-maintained professional network is always useful to have when looking for advice or letters of recommendation while applying to graduate school and then later while applying for jobs and other career-related opportunities.

Kaggle competitions

Kaggle competitions provide the opportunity to solve real-world data science problems and win prizes. A list of data science problems can be found at Kaggle.com. Winning one of these competitions is a good way to demonstrate professional interest and experience.

Internships

Internships are a great way to get real-world experience in data science while also getting to work for top names in the world of business. For example, IBM offers a data science internship, which could also help you stand out when applying to PhD programs and, later, when seeking employment.

Demonstrating professional experience is not only important when looking for jobs, but it can also help while applying for graduate school. There are a number of ways for prospective students to gain exposure to the field and explore different facets of data science careers.

Get certified

There are a number of data-related certificate programs that are open to people with a variety of academic and professional experience. DeZyre has an excellent guide to different certifications, some of which might help provide good background for graduate school applications.

Conferences

Conferences are a great place to meet people presenting new and exciting research in the data science field and bounce ideas off of newfound connections. Like professional societies and organizations, discounted student rates are available to encourage student participation. In addition, some conferences will waive fees if you are presenting a poster or research at the conference, which is an extra incentive to present.

It can be hard to quantify what makes a good fit when it comes to data science graduate programs. There are easy-to-evaluate factors, such as cost and location, and harder-to-evaluate criteria, such as networking opportunities, accessibility of professors, and how up to date the program's curriculum is.

Nevertheless, there are some key relevant considerations when applying to almost any data science graduate program.

What most schools will require when applying:

  • All undergraduate and graduate transcripts
  • A statement of intent for the program (reason for applying and future plans)
  • Letters of reference
  • Application fee
  • Online application
  • A curriculum vitae (outlining all of your academic and professional accomplishments)

What Does it Cost to Get a PhD in Data Science?

The great news is that many PhD data science programs are supported by fellowships and stipends. Some are completely funded, meaning the school will pay tuition and basic living expenses. Here are several examples of fully funded programs:

  • University of Southern California
  • University of Nevada, Reno
  • Kennesaw State University
  • Worcester Polytechnic Institute
  • University of Maryland

For all other programs, tuition typically ranges from $1,300 to $2,000 per credit hour, depending on the school. Remember that typical PhD programs in data science require between 60 and 75 credit hours, meaning you could spend up to $150,000 over several years.
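The $150,000 upper bound follows from multiplying credit hours by the per-credit rate. The short Python sketch below reproduces that arithmetic using only the figures quoted above; it assumes a flat per-credit rate and ignores fees, living costs, stipends, and tuition waivers, and the function name `tuition_estimate` is purely illustrative.

```python
# Rough tuition estimate for a PhD program that charges per credit hour.
# Assumption: a flat per-credit rate; fees, living costs, and any
# fellowship or stipend offsets are ignored.

def tuition_estimate(credit_hours: int, rate_per_credit: int) -> int:
    """Return total tuition in dollars for the given credit hours."""
    return credit_hours * rate_per_credit

# Bounds taken from the ranges quoted in the text:
# 60-75 credit hours at $1,300-$2,000 per credit hour.
low = tuition_estimate(60, 1_300)    # fewest credits at the lowest rate
high = tuition_estimate(75, 2_000)   # most credits at the highest rate

print(f"Estimated tuition range: ${low:,} to ${high:,}")
```

With the lowest figures (60 credits at $1,300) the total comes to $78,000; with the highest (75 credits at $2,000) it reaches $150,000, matching the upper bound above.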

This is why it is so important to weigh the financial aspects when assessing PhD programs: some schools offer full stipends, so you can attend without having to find supplemental scholarships or tuition assistance.

Can I become a professor of data science with a PhD? Yes! If you are interested in teaching at the college or graduate level, a PhD is the degree needed to establish the full expertise expected of a professor. Some data scientists who hold PhDs start by entering the field of big data and pivot to teaching after gaining a significant amount of work experience. If you're driven to teach others or to pursue advanced research in data science, a PhD is the right degree for you.

Do I need a master's in order to pursue a PhD? No. Many who pursue PhDs in Data Science do not already hold advanced degrees, and many PhD programs include all the coursework of a master's program in the first two years of school. For many students, this is the most time-effective option, allowing you to complete your education in a single pass rather than interrupting your studies after your master's program.

Can I choose to pursue a PhD after already receiving my master’s? Yes. A master’s program can be an opportunity to get the lay of the land and determine the specific career path you’d like to forge in the world of big data. Some schools may allow you to simply extend your academic timeline after receiving your master’s degree, and it is also possible to return to school to receive a PhD if you have been working in the field for some time.

If a PhD isn't necessary, is it a waste of time? While not every student is a good candidate for a PhD, it is a great choice for the right students: those who are keen on doing in-depth research, have the time to devote to many years of school, and potentially have an interest in continuing to work in academia. For more information on this question, take a look at our article Is a Data Science PhD Worth It?

Complete List of Data Science PhD Programs

Below you will find the most comprehensive list of schools offering a doctorate in data science. Each school listing contains a link to the program-specific page, its GRE or master's degree requirements, and a link to a page with detailed course information.

Note that the listing only contains true data science programs. Other sites often lump similar programs together, but we have chosen to list programs such as data analytics and business intelligence in a separate section of the website.

Boise State University  – Boise, Idaho PhD in Computing – Data Science Concentration

The Data Science emphasis focuses on the development of mathematical and statistical algorithms, software, and computing systems to extract knowledge or insights from data.  

In 60 credits, students complete an Introduction to Graduate Studies, 12 credits of core courses, 6 credits of data science elective courses, 10 credits of other elective courses, a Doctoral Comprehensive Examination worth 1 credit, and a 30-credit dissertation.

Electives can be taken in focus areas such as Anthropology, Biometry, Ecology/Evolution and Behavior, Econometrics, Electrical Engineering, Earth Dynamics and Informatics, Geoscience, Geostatistics, Hydrology and Hydrogeology, Materials Science, and Transportation Science.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $7,236 total (Resident), $24,573 total (Non-resident)

Bowling Green State University  – Bowling Green, Ohio Ph.D. in Data Science

Data Science students at Bowling Green intertwine knowledge of computer science with statistics.

Students learn techniques in analyzing structured, unstructured, and dynamic datasets.

Courses train students to understand the principles of analytic methods and to articulate their strengths and limitations.

The program requires 60 credit hours in the studies of Computer Science (6 credit hours), Statistics (6 credit hours), Data Science Exploration and Communication, Ethical Issues, Advanced Data Mining, and Applied Data Science Experience.

Students must also complete 21 credit hours of elective courses, a qualifying exam, a preliminary exam, and a dissertation.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $8,418 (Resident), $14,410 (Non-resident)

Brown University  – Providence, Rhode Island PhD in Computer Science – Concentration in Data Science

Brown University’s database group is a world leader in systems-oriented database research; they seek PhD candidates with strong system-building skills who are interested in researching TupleWare, MLbase, MDCC, Crowd DB, or PIQL.

To gain entrance, applicants should consider first doing a research internship at Brown with this group. Other ways to strengthen an application are to take and do well in massive open online courses, do an internship at a large company, and get involved in a large open-source software project.

Coding well in C++ is preferred.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $62,680 total

Chapman University  – Irvine, California Doctorate in Computational and Data Sciences

Candidates for the doctorate in computational and data science at Chapman University begin by completing 13 core credits in basic methodologies and techniques of computational science.

Students complete 45 credits of electives, which are personalized to match the specific interests and research topics of the student.

Finally, students complete up to 12 credits in dissertation research.

Applicants must have completed courses in differential equations, data structures, and probability and statistics, or take specific foundation courses, before beginning coursework toward the PhD.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $37,538 per year

Clemson University / Medical University of South Carolina (MUSC) – Joint Program – Clemson, South Carolina & Charleston, South Carolina Doctor of Philosophy in Biomedical Data Science and Informatics – Clemson

The PhD in biomedical data science and informatics is a joint program offered by Clemson University and the Medical University of South Carolina (MUSC).

Students choose one of three tracks to pursue: precision medicine, population health, and clinical and translational informatics. Students complete 65-68 credit hours, and take courses in each of 5 areas: biomedical informatics foundations and applications; computing/math/statistics/engineering; population health, health systems, and policy; biomedical/medical domain; and lab rotations, seminars, and doctoral research.

Applicants must have a bachelor’s in health science, computing, mathematics, statistics, engineering, or a related field, and it is recommended to also have competency in a second of these areas.

Program requirements include a year of calculus and college biology, as well as experience in computer programming.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $10,858 total (South Carolina Resident), $22,566 total (Non-resident)

George Mason University  – Fairfax, Virginia Doctor of Philosophy in Computational Sciences and Informatics – Emphasis in Data Science

George Mason’s PhD in computational sciences and informatics requires a minimum of 72 credit hours, though this can be reduced if a student has already completed a master’s. 48 credits are toward graduate coursework, and an additional 24 are for dissertation research.

Students choose an area of emphasis, either computer modeling and simulation or data science, and complete 18 credits of their coursework in this area. Students are expected to complete the program in 4-5 years.

Applicants to this program must have a bachelor’s degree in a natural science, mathematics, engineering, or computer science, and must have knowledge and experience with differential equations and computer programming.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $13,426 total (Virginia Resident), $35,377 total (Non-resident)

Harrisburg University of Science and Technology  – Harrisburg, Pennsylvania Doctor of Philosophy in Data Sciences

Harrisburg University’s PhD in data science is a 4-5 year program, the first 2 of which make up the Harrisburg master’s in analytics.

Beyond this, PhD candidates complete six milestones to obtain the degree, including 18 semester hours in doctoral-level courses such as multivariate data analysis, graph theory, and machine learning.

Following the completion of ANLY 760 Doctoral Research Seminar, students in the program complete their 12 hours of dissertation research bringing the total program hours to 36.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $14,940 total

Icahn School of Medicine at Mount Sinai  – New York, New York Genetics and Data Science, PhD

As part of the Biomedical Science PhD program, the Genetics and Data Science multidisciplinary training offers research opportunities that expand on genetic research and modern genomics. The training also integrates several disciplines of biomedical sciences with machine learning, network modeling, and big data analysis.

Students in the Genetics and Data Science program complete a predetermined course schedule with a total of 64 credits and 3 years of study.

Additional course requirements and electives include laboratory rotations, a thesis proposal exam and thesis defense, Computer Systems, Intro to Algorithms, Machine Learning for Biomedical Data Science, Translational Genomics, and Practical Analysis of a Personal Genome.

Delivery Method: Campus
GRE: Not Required
2022-2023 Tuition: $31,303 total

Indiana University-Purdue University Indianapolis – Indianapolis, Indiana PhD in Data Science; PhD Minor in Applied Data Science

Doctoral candidates pursuing the PhD in data science at Indiana University-Purdue must display competency in research, data analytics, and data management and infrastructure to earn the degree.

The PhD comprises 24 credits of a data science core, 18 credits of methods courses, 18 credits of a specialization, written and oral qualifying exams, and 30 credits of dissertation research. All requirements must be completed within 7 years.

Applicants are generally expected to have a master’s in social science, health, data science, or computer science. 

Currently, a majority of the PhD students at IUPUI are funded by faculty grants, and two are funded by the federal government. None of the students are self-funded.

IUPUI also offers a PhD Minor in Applied Data Science that is 12-18 credits. The minor is open to students enrolled at IUPUI or IU Bloomington in a doctoral program other than Data Science.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $9,228 per year (Indiana Resident), $25,368 per year (Non-resident)

Jackson State University – Jackson, Mississippi PhD Computational and Data-Enabled Science and Engineering

Jackson State University offers a PhD in computational and data-enabled science and engineering with 5 concentration areas: computational biology and bioinformatics, computational science and engineering, computational physical science, computational public health, and computational mathematics and social science.

Students complete 12 credits of common core courses, 12 credits in the specialization, 24 credits of electives, and 24 credits in dissertation research.

Students may complete the doctoral program in as little as 5 years and no more than 8 years.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $8,270 total

Kennesaw State University  – Kennesaw, Georgia PhD in Analytics and Data Science

Students pursuing a PhD in analytics and data science at Kennesaw State University must complete 78 credit hours: 48 hours of courses and 6 hours of electives (spread over 4 years of study), a minimum of 12 credit hours of dissertation research, and a minimum 12-credit-hour internship.

Prior to dissertation research, the comprehensive examination will cover material from the three areas of study: computer science, mathematics, and statistics.

Successful applicants will have a master's degree in a computational field, Calculus I and II, programming experience, and modeling experience; they are also encouraged to hold a base SAS certification.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $5,328 total (Georgia Resident), $19,188 total (Non-resident)

New Jersey Institute of Technology  – Newark, New Jersey PhD in Business Data Science

Students may enter the PhD program in business data science at the New Jersey Institute of Technology with either a relevant bachelor’s or master’s degree. Students with bachelor’s degrees begin with 36 credits of advanced courses, and those with master’s take 18 credits before moving on to credits in dissertation research.

Core courses include business research methods, data mining and analysis, data management system design, statistical computing with SAS and R, and regression analysis.

Students take qualifying examinations at the end of years 1 and 2, and must defend their dissertations successfully by the end of year 6.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $21,932 total (New Jersey Resident), $32,426 total (Non-resident)

New York University  – New York, New York PhD in Data Science

Doctoral candidates in data science at New York University must complete 72 credit hours, pass a comprehensive and qualifying exam, and defend a dissertation within 10 years of entering the program.

Required courses include an introduction to data science, probability and statistics for data science, machine learning and computational statistics, big data, and inference and representation.

Applicants must have an undergraduate or master’s degree in fields such as mathematics, statistics, computer science, engineering, or other scientific disciplines. Experience with calculus, probability, statistics, and computer programming is also required.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $37,332 per year

Northcentral University  – San Diego, California PhD in Data Science-TIM

Northcentral University offers a PhD in technology and innovation management with a specialization in data science.

The program requires 60 credit hours, including 6-7 core courses, 3 in research, a PhD portfolio, and 4 dissertation courses.

The data science specialization requires 6 courses: data mining, knowledge management, quantitative methods for data analytics and business intelligence, data visualization, predicting the future, and big data integration.

Applicants must already hold a master's degree.

Delivery Method: Online
GRE: Required
2022-2023 Tuition: $16,794 total

Stevens Institute of Technology – Hoboken, New Jersey Ph.D. in Data Science

Stevens Institute of Technology has developed a data science Ph.D. program geared to help graduates become innovators in the space.

The rigorous curriculum emphasizes mathematical and statistical modeling, machine learning, computational systems and data management.

The program is directed by Dr. Ted Stohr, a recognized thought leader in the information systems, operations and business process management arenas.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $39,408 per year

University at Buffalo – Buffalo, New York PhD Computational and Data-Enabled Science and Engineering

The curriculum for the University at Buffalo's PhD in computational and data-enabled science and engineering centers on three areas: data science, applied mathematics and numerical methods, and high-performance and data-intensive computing. Students must complete 9 credits of courses in each of these three areas. Altogether, the program consists of 72 credit hours and should be completed in 4-5 years. A master's degree is required for admission; courses taken during the master's may count toward some of the core coursework requirements.

Delivery Method: Campus
GRE: Required
2022-2023 Tuition: $11,310 per year (New York Resident), $23,100 per year (Non-resident)

University of Colorado Denver – Denver, Colorado PhD in Big Data Science and Engineering

The University of Colorado – Denver offers a unique program for those students who have already received admission to the computer science and information systems PhD program.

The Big Data Science and Engineering (BDSE) program is a PhD fellowship program that allows selected students to pursue research in the area of big data science and engineering. This new fellowship program was created to train more computer scientists in data science application fields such as health informatics, geosciences, precision and personalized medicine, business analytics, and smart cities and cybersecurity.

Students in the doctoral program must complete 30 credit hours of computer science classes beyond a master’s level, and 30 credit hours of dissertation research.

The BDSE fellowship requires students to have an advisor both in the core disciplines (either computer science or mathematics and statistics) as well as an advisor in the application discipline (medicine and public health, business, or geosciences).

In addition, the fellowship covers a full stipend, tuition, and fees of up to about $50,000 annually for BDSE fellows. Important eligibility requirements are listed on the program’s page.

Delivery Method: Campus; GRE: Required; 2022-2023 Tuition: $55,260 total

University of Maryland – College Park, Maryland PhD in Information Studies

Data science is a potential research area for doctoral candidates in information studies at the University of Maryland – College Park. This includes big data, data analytics, and data mining.

Applicants for the PhD must have taken the following courses in undergraduate studies: programming languages, data structures, design and analysis of computer algorithms, calculus I and II, and linear algebra.

Students must complete 6 qualifying courses, 2 elective graduate courses, and at least 12 credit hours of dissertation research.

Delivery Method: Campus; GRE: Required; 2022-2023 Tuition: $16,238 total (Maryland Resident), $35,388 total (Non-resident)

University of Massachusetts Boston – Boston, Massachusetts PhD in Business Administration – Information Systems for Data Science Track

The University of Massachusetts – Boston offers a PhD in information systems for data science. Because this is a business degree, students complete coursework in their first two years with a focus on data for business, including courses such as Business in Context: Markets, Technologies, and Societies.

Students must take and pass qualifying exams at the end of year 1, comprehensive exams at the end of year 2, and defend their theses at the end of year 4.

Those with a degree in statistics, economics, math, computer science, management sciences, information systems, and other related fields are especially encouraged, though a quantitative degree is not necessary.

Students accepted by the program are ordinarily offered full tuition credits and a stipend ($25,000 per year) to cover educational expenses and help defray living costs for up to three years of study.

During the first two years of coursework, students are assigned to a faculty member as research assistants. In the third year, they engage in instructional activities. Funding for the fourth year is merit-based and drawn from a limited pool of program funds.

Delivery Method: Campus; GRE: Required; 2022-2023 Tuition: $18,894 total (in-state), $36,879 (out-of-state)

University of Nevada Reno – Reno, Nevada PhD in Statistics and Data Science

The University of Nevada – Reno’s doctoral program in statistics and data science comprises 72 credit hours, to be completed over 4-5 years. All coursework falls within statistics, with titles such as statistical theory, probability theory, linear models, multivariate analysis, statistical learning, statistical computing, and time series analysis.

The completion of a Master’s degree in mathematics or statistics prior to enrollment in the doctoral program is strongly recommended, but not required.

Delivery Method: Campus; GRE: Required; 2022-2023 Tuition: $5,814 total (in-state), $22,356 (out-of-state)

University of Southern California – Los Angeles, California PhD in Data Sciences & Operations

USC Marshall School of Business offers a PhD in data sciences and operations to be completed in 5 years.

Students can choose either a track in operations management or in statistics. Both tracks require 4 courses in fall and spring of the first 2 years, as well as a research paper and courses during the summers. Year 3 is devoted to dissertation preparation and year 4 and/or 5 to dissertation defense.

A bachelor’s degree is necessary for application, but no field or further experience is required.

Students must complete 60 units of coursework. For students admitted with advanced standing (e.g., a master’s degree in an appropriate field), this requirement may be reduced to 40 units.

Delivery Method: Campus; GRE: Required; 2022-2023 Tuition: $63,468 total

University of Tennessee-Knoxville – Knoxville, Tennessee The Data Science and Engineering PhD

The data science and engineering PhD at the University of Tennessee – Knoxville requires 36 hours of coursework and 36 hours of dissertation research. Students entering with an MS degree need only 24 hours of coursework.

The core curriculum includes work in statistics, machine learning, and scripting languages and is enhanced by 6 hours in courses that focus either on policy issues related to data, or technology entrepreneurship.

Students must also choose a knowledge specialization in one of these fields: health and biological sciences, advanced manufacturing, materials science, environmental and climate science, transportation science, national security, urban systems science, and advanced data science.

Applicants must have a bachelor’s or master’s degree in engineering or a scientific field. 

All admitted students are supported by a research fellowship that includes tuition.

Many students will perform research with scientists from Oak Ridge National Laboratory, which is about a 30-minute drive from campus.

Delivery Method: Campus; GRE: Required; 2022-2023 Tuition: $11,468 total (Tennessee Resident), $29,656 total (Non-resident)

University of Vermont – Burlington, Vermont Complex Systems and Data Science (CSDS), PhD

Through the College of Engineering and Mathematical Sciences, the Complex Systems and Data Science (CSDS) PhD program is pan-disciplinary and provides computational and theoretical training. Students may customize the program depending on their chosen area of focus.

Students in this program work in research groups across campus.

Core courses include Data Science, Principles of Complex Systems and Modeling Complex Systems. Elective courses include Machine Learning, Complex Networks, Evolutionary Computation, Human/Computer Interaction, and Data Mining.

The program requires at least 75 credits to graduate, with approval of the student’s graduate studies committee.

Delivery Method: Campus; GRE: Not Required; 2022-2023 Tuition: $12,204 total (Vermont Resident), $30,960 total (Non-resident)

University of Washington Seattle Campus – Seattle, Washington PhD in Big Data and Data Science

The University of Washington’s PhD program in data science has 2 key goals: training of new data scientists and cyberinfrastructure development, i.e., development of open-source tools and services that scientists around the world can use for big data analysis.

Students must take core courses in data management, machine learning, data visualization, and statistics.

Students are also required to complete at least one internship that covers practical work in big data.

Delivery Method: Campus; GRE: Required; 2022-2023 Tuition: $17,004 per year (Washington resident), $30,477 (non-resident)

University of Wisconsin-Madison – Madison, Wisconsin PhD in Biomedical Data Science

The PhD program in Biomedical Data Science offered by the Department of Biostatistics and Medical Informatics at UW-Madison is unique in blending the best of statistics and computer science, biostatistics, and biomedical informatics.

Students complete three year-long course sequences in biostatistics theory and methods, computer science/informatics, and a specialized sequence to fit their interests.

Students also complete three research rotations within their first two years in the program, to both expand their breadth of knowledge and assist in identifying a research advisor.

Delivery Method: Campus; GRE: Required; 2022-2023 Tuition: $10,728 total (in-state), $24,054 total (out-of-state)

Vanderbilt University – Nashville, Tennessee Data Science Track of the BMI PhD Program

The PhD in biomedical informatics at Vanderbilt has the option of a data science track.

Students complete courses in the areas of biomedical informatics (3 courses), computer science (4 courses), statistical methods (4 courses), and biomedical science (2 courses). Students are expected to complete core courses and defend their dissertations within 5 years of beginning the program.

Applicants must have a bachelor’s degree in computer science, engineering, biology, biochemistry, nursing, mathematics, statistics, physics, information management, or some other health-related field.

Delivery Method: Campus; GRE: Required; 2022-2023 Tuition: $53,160 per year

Washington University in St. Louis – St. Louis, Missouri Doctorate in Computational & Data Sciences

Washington University now offers an interdisciplinary Ph.D. in Computational & Data Sciences where students can choose from one of four tracks (Computational Methodologies, Political Science, Psychological & Brain Sciences, or Social Work & Public Health).

Students are fully funded and will receive a stipend for at least five years contingent on making sufficient progress in the program.

Delivery Method: Campus; GRE: Required; 2022-2023 Tuition: $59,420 total

Worcester Polytechnic Institute – Worcester, Massachusetts PhD in Data Science

The PhD in data science at Worcester Polytechnic Institute focuses on 5 areas: integrative data science, business intelligence and case studies, data access and management, data analytics and mining, and mathematical analysis.

Students first complete a master’s in data science, and then complete 60 credit hours beyond the master’s, including 30 credit hours of research.

Delivery Method: Campus; GRE: Required; 2022-2023 Tuition: $28,980 per year

Yale University – New Haven, Connecticut PhD Program – Department of Stats and Data Science

The PhD in statistics and data science at Yale University offers broad training in the areas of statistical theory, probability theory, stochastic processes, asymptotics, information theory, machine learning, data analysis, statistical computing, and graphical methods. Students complete 12 courses in the first year in these topics.

Students are required to teach one course each semester of their third and fourth years.

Most students complete and defend their dissertations in their fifth year.

Applicants should have an educational background in statistics, with an undergraduate major in statistics, mathematics, computer science, or similar field.

Delivery Method: Campus; GRE: Required; 2022-2023 Tuition: $46,900 total

NYU Center for Data Science

Harnessing Data’s Potential for the World

PhD in Data Science

An NRT-sponsored program in Data Science

Advances in computational speed and data availability, and the development of novel data analysis methods, have birthed a new field: data science. This new field requires a new type of researcher and actor: the rigorously trained, cross-disciplinary, and ethically responsible data scientist. Launched in Fall 2017, the pioneering CDS PhD Data Science program seeks to produce such researchers who are fluent in the emerging field of data science, and to develop a native environment for their education and training. The CDS PhD Data Science program has rapidly received widespread recognition and is considered among the top and most selective data science doctoral programs in the world. It has recently been recognized by the NSF through an NRT training grant.

The CDS PhD program model rigorously trains data scientists of the future who (1) develop methodology and harness statistical tools to find answers to questions that transcend the boundaries of traditional academic disciplines; (2) clearly communicate to extract crisp questions from big, heterogeneous, uncertain data; (3) effectively translate fundamental research insights into data science practice in the sciences, medicine, industry, and government; and (4) are aware of the ethical implications of their work.

Our programmatic mission is to nurture this new generation of data scientists, by designing and building a data science environment where methodological innovations are developed and translated successfully to domain applications, both scientific and social. Our vision is that combining fundamental research on the principles of data science with translational projects involving domain experts creates a virtuous cycle: Advances in data science methodology transform the process of discovery in the sciences, and enable effective data-driven governance in the public sector. At the same time, the demands of real-world translational projects will catalyze the creation of new data science methodologies. An essential ingredient of such methodologies is that they embed ethics and responsibility by design.

These objectives will be achieved through a combination of an innovative core curriculum, a novel data assistantship mechanism that provides training in skills transfer through rotations and internships, and communication and entrepreneurship modules. Students will be exposed to a wider range of fields than in more standard PhD programs while working with our interdisciplinary faculty. In particular, we are proud to offer a medical track for students eager to explore data science as applied to healthcare or to develop novel theoretical models stemming from medical questions.

In short, the CDS PhD Data Science program prepares students to become leaders in data science research and for outstanding careers in academia or industry. Successful candidates are guaranteed financial support in the form of tuition and a competitive stipend in the fall and spring semesters for up to five years.* We invite you to learn more through our webpage or by contacting the program by email.

*The Ph.D. program also offers students the opportunity to pursue their study and research with Data Science faculty based at NYU Shanghai. With this opportunity, students generally complete their coursework in New York City before moving full-time to Shanghai for their research. For more information, please visit the NYU Shanghai Ph.D. page.

PhD Program

Requirements for Doctor of Philosophy (Ph.D.) in Data Science

The goal of the doctoral program is to create leaders in the field of Data Science who will lay the foundation and expand the boundaries of knowledge in the field. The doctoral program aims to provide a research-oriented education that teaches students the knowledge, skills, and awareness required to perform data-driven research, and that enables them to use this shared background to carry out research that expands the boundaries of knowledge in Data Science. The program spans foundational aspects, including computational methods, machine learning, mathematical models, and statistical analysis, as well as applications of data science.

Course Requirements

The graduate program has three categories of courses: Foundation (group A), Core (group B), and Elective and Research requirements (group C). These course requirements are intended to ensure that students are exposed to (1) fundamental concepts and tools (Foundation), (2) advanced, up-to-date views of topics central to Data Science (Core), and (3) a deep, current view of their research or application area (Elective). A course may not fulfill more than one requirement.

The doctoral program is structured as a total of 52 units of courses from groups A, B, and C. Of the 52 units, 48 units (12 courses) must be taken for a letter grade, and at least 40 units must come from graduate-level courses.

The remaining 4 (= 52 – 48) units are for professional preparation: 1 unit of faculty research seminar, 2 units of TA/tutor training, and 1 unit of a survival skills course, taken for a passing (satisfactory) grade. As mentioned earlier, at least 10 of the 12 regular courses must be graduate-level courses, and at most two can be upper-level undergraduate courses. Students must complete 36 units (9 courses) within six quarters of starting the degree program.
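The unit arithmetic above can be checked mechanically. The following is a minimal sketch of a hypothetical planning helper, not an official HDSI tool. The `Course` record and `check_plan` function are illustrative names, and the code assumes 4-unit courses, as implied by "48 units (or 12 courses)".

```python
from dataclasses import dataclass


@dataclass
class Course:
    # Hypothetical record for one planned course; field names are illustrative.
    name: str
    units: int
    letter_grade: bool
    graduate_level: bool


def check_plan(courses: list[Course], professional_units: int) -> list[str]:
    """Return the unmet requirements of a course plan (an empty list means none)."""
    graded = [c for c in courses if c.letter_grade]
    graded_units = sum(c.units for c in graded)
    grad_units = sum(c.units for c in graded if c.graduate_level)
    undergrad_courses = sum(1 for c in graded if not c.graduate_level)

    problems = []
    if graded_units < 48:
        problems.append(f"{graded_units} letter-graded units (need 48)")
    if grad_units < 40:
        problems.append(f"{grad_units} graduate-level units (need 40)")
    if undergrad_courses > 2:
        problems.append(f"{undergrad_courses} upper-level undergraduate courses (max 2)")
    if professional_units < 4:  # 1 seminar + 2 TA/tutor training + 1 survival skills
        problems.append(f"{professional_units} professional-preparation units (need 4)")
    if graded_units + professional_units < 52:
        problems.append(f"{graded_units + professional_units} total units (need 52)")
    return problems


# Usage: 10 graduate + 2 undergraduate 4-unit courses, plus 4 professional units
plan = [Course(f"grad-{i}", 4, True, True) for i in range(10)]
plan += [Course(f"ug-{i}", 4, True, False) for i in range(2)]
print(check_plan(plan, professional_units=4))  # → []
```

Swapping one graduate course for a third undergraduate course would trigger two problems at once: graduate-level units fall to 36, and the undergraduate cap is exceeded.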

Research Rotation Program

Research rotations provide the opportunity for first-year PhD students to obtain research experience in data analysis under the guidance of HDSI affiliated faculty members. Through the rotations, students can identify a faculty member under whose supervision their dissertation research will be completed.

A research rotation is a guided research experience lasting one quarter (10 weeks), obtained by registering for DSC 294 with an instructor. All Ph.D. students must complete a minimum of two research rotations during their first year, with at least two different faculty members, and may complete up to three rotations including the summer quarter. A student may rotate twice under the same faculty member as long as they rotate with at least two faculty members in total. The goal is to expose students to new methodological approaches or domain knowledge and help them find an advisor for their Ph.D. research.
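The rotation rules combine a count limit with a distinct-faculty condition, which is easy to misread. The sketch below encodes them as a hypothetical check. It is an illustration under the stated rules only, and `rotations_ok` is an invented name, not part of any HDSI system.

```python
def rotations_ok(rotation_faculty: list[str]) -> bool:
    """Check a first-year rotation plan, given the faculty member for each rotation.

    Rules encoded from the program text: at least 2 and at most 3 rotations,
    with at least 2 distinct faculty members; repeating a faculty member is allowed.
    """
    n_rotations = len(rotation_faculty)
    n_faculty = len(set(rotation_faculty))
    return 2 <= n_rotations <= 3 and n_faculty >= 2


print(rotations_ok(["Prof. A", "Prof. B"]))             # → True
print(rotations_ok(["Prof. A", "Prof. A"]))             # → False: only one faculty member
print(rotations_ok(["Prof. A", "Prof. A", "Prof. B"]))  # → True: a repeat is allowed
```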

Please refer to the HDSI Faculty page for a list of HDSI faculty members and their research interests. Students should reach out to faculty members to learn more about potential rotation opportunities.

An important goal of HDSI is to foster interdisciplinary collaboration, specifically between methods researchers and domain researchers. Domain researchers specialize in a particular scientific field and collect or generate large datasets. These domain experts can partner with HDSI faculty to jointly advise a PhD student. The site matchpoint.com was designed to facilitate such collaborations.

Outcome of each rotation: a written report, or curated datasets with Jupyter notebooks. Students are expected to work 10 hours per week.

Research rotations must be completed by the end of the Fall Quarter of the second year with a signed commitment form from a faculty advisor. Those who fail to identify a research advisor shall be advised to leave the doctoral program with an optional assessment for completion of a terminal MS-DS degree.

Preliminary Assessment Examination

The goal of the preliminary assessment examination is to assess students’ preparation for pursuing a PhD in data science, in terms of core knowledge and readiness for conducting research. The preliminary assessment is an advisory examination.

Students must give an oral presentation before the end of the Fall Quarter of the second year. The student proposes the topic of the presentation (e.g., the outcome of a research rotation or a literature survey). The Graduate Program Committee sets up a two-member committee. Students can suggest committee members, but there is no guarantee that the suggested faculty members will be chosen. The oral presentation is followed by a Q&A session with the committee members. The committee assesses both the oral presentation and the student’s academic performance so far, especially in the required core courses. Students who do not receive a satisfactory evaluation will receive a recommendation from the Graduate Program Committee on ways to remedy the gaps in their preparation (e.g., suggested courses). Alternatively, they may receive a terminal MS in Data Science degree, provided they can meet the degree requirements of the MS program.

Research Qualifying Examination and Advancing to Candidacy

A research qualifying examination (UQE) is conducted by the dissertation committee, which consists of four or more members approved by the Graduate Division per Senate Regulation 715(D). One Senate faculty member must have a primary appointment in a department outside of HDSI. On an exceptional basis, and upon approval from the Graduate Division, faculty with a partial appointment of 25% or less in HDSI may be considered as meeting this requirement.

The goal of the UQE is to assess the candidate’s ability to perform independent critical research, as evidenced by a presentation and a technical report written at the level of a peer-reviewed journal or conference publication. The examination is taken after the student and adviser have identified a dissertation topic and the student has made an initial demonstration of feasible progress. The candidate is expected to describe their accomplishments to date as well as future work. The research qualifying examination must be completed no later than the fourth year, i.e., within 12 quarters of starting the degree program. The UQE serves as the advancement-to-candidacy exam.

Students who take the UQE after the 12-quarter deadline must petition the Graduate Committee. Students who fail the research qualifying examination may file a petition to retake it. If the petition is approved, they will be allowed to retake it one (and only one) more time. Students who fail the UQE may also petition to transition to an MS in Data Science track.

Dissertation Defense Examination and Thesis Requirements

Students must successfully complete a final dissertation defense oral presentation and examination to the Dissertation Committee consisting of four or more members approved by the graduate division as per senate regulation 715(D). The primary Thesis Adviser, who will chair the Dissertation Committee, must be a senate faculty member with an appointment of 0% or more at HDSI. One senate faculty member in the Dissertation Committee must have a primary appointment in a department outside of HDSI. Partially appointed faculty in HDSI (at 25% or less) are acceptable in meeting this outside-department requirement as long as their main (lead) department is not HDSI.

A dissertation in the scope of Data Science is required of every candidate for the PhD degree. HDSI PhD program thesis requirements must meet Regulation 715(D) requirements. The final form of the dissertation document must comply with published guidelines by the Graduate Division.

The dissertation topic will be selected by the student, under the advice and guidance of Thesis Adviser and the Dissertation Committee. The dissertation must contain an original contribution of quality that would be acceptable for publication in the academic literature that either extends the theory or methodology of data science, or uses data science methods to solve a scientific problem in applied disciplines.

The entire dissertation committee will conduct a final oral examination, which will deal primarily with questions arising out of the relationship of the dissertation to the field of Data Science. The final examination will be conducted in two parts. The first part consists of a presentation by the candidate followed by a brief period of questions pertaining to the presentation; this part of the examination is open to the public. The second part of the examination will immediately follow the first part; this is a closed session between the student and the committee and will consist of a period of questioning by the committee members.

Special Requirements: Generalization, Reproducibility and Responsibility

A candidate for the doctoral degree in data science is expected to demonstrate generalization skills as well as reproducibility of research results. Evidence of generalization skills may include, but is not limited to:
- generalization of results across domains, or across applications within a domain;
- generalization of the applicability of the proposed method(s); or
- generalization of thesis conclusions rooted in formal or mathematical proof, or in quantitative reasoning supported by robust statistical measures.

The reproducibility requirement may be satisfied by supplementary material consisting of a code and data repository. The dissertation will also be reviewed for responsible use of data.

Special Requirements: Professional Training and Communications

All graduate students in the doctoral program are required to complete at least one quarter of experience in the classroom as teaching assistants regardless of their eventual career goals. Effective communications and ability to explain deep technical subjects is considered a key measure of a well-rounded doctoral education. Thus, Ph.D. students are also required to take a 1-unit DSC 295 (Academia Survival Skills) course for a Satisfactory grade.

Obtaining an MS in Data Science

PhD students may obtain an MS Degree in Data Science along the way or a terminal MS degree, provided they complete the requirements for the MS degree.

17 Compelling Machine Learning Ph.D. Dissertations

Posted by Daniel Gutierrez, ODSC, August 12, 2021

Working in the field of data science, I’m always seeking ways to keep current in the field, and there are a number of important resources available for this purpose: new book titles, blog articles, conference sessions, Meetups, webinars/podcasts, not to mention the gems floating around in social media. But to dig even deeper, I routinely look at what’s coming out of the world’s research labs. One great way to keep a pulse on what the research community is working on is to monitor the flow of new machine learning Ph.D. dissertations. Admittedly, many such theses are laser-focused and narrow, but from previous experience reading these documents, you can learn an awful lot about new ways to solve difficult problems over a vast range of problem domains.

In this article, I present a number of hand-picked machine learning dissertations that I found compelling in terms of my own areas of interest and aligned with problems that I’m working on. I hope you’ll find a number of them that match your own interests. Each dissertation may be challenging to consume but the process will result in hours of satisfying summer reading. Enjoy!

Please check out my previous data science dissertation round-up article.

1. Fitting Convex Sets to Data: Algorithms and Applications

This machine learning dissertation concerns the geometric problem of finding a convex set that best fits a given data set. The overarching question serves as an abstraction for data-analytical tasks arising in a range of scientific and engineering applications. The thesis focuses on two specific instances:

(i) Learning penalty functions for inverse problems. A key challenge in solving inverse problems is ill-posedness due to a lack of measurements. A prominent family of methods for addressing this issue augments optimization-based approaches with a convex penalty function that induces a desired structure in the solution. These functions are typically chosen using prior knowledge about the data. The thesis studies how to learn convex penalty functions directly from data in settings that lack the domain expertise to choose one. The solution relies on suitably transforming the problem of learning a penalty function into a fitting task.

(ii) Fitting tractably described convex sets, given the optimal values of linear functionals evaluated in different directions.
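Instance (ii) can be stated through the support function. The measurement in a unit direction u is the optimal value of the linear functional ⟨u, x⟩ over the set K. As a sketch of one natural least-squares formulation, not necessarily the thesis's exact one, assume noisy measurements y_i in directions u_i and a tractable family 𝒞 of candidate convex sets:

```latex
h_K(u) = \max_{x \in K} \langle u, x \rangle, \qquad
\hat{K} \in \operatorname*{arg\,min}_{K \in \mathcal{C}} \; \sum_{i=1}^{n} \bigl( y_i - h_K(u_i) \bigr)^2
```

For a polytope K = conv{v_1, …, v_m}, the support function is h_K(u) = max_j ⟨u, v_j⟩. Restricting 𝒞 to polytopes with at most m vertices therefore gives a finite-dimensional problem in the vertices, although that problem is nonconvex.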

2. Structured Tensors and the Geometry of Data

This machine learning dissertation analyzes data to build a quantitative understanding of the world. Linear algebra is the foundation of algorithms for extracting structure from data, some dating back one hundred years. Modern technologies provide an abundance of multi-dimensional data, in which multiple variables or factors can be compared simultaneously. To organize and analyze such data sets, we can use a tensor, the higher-order analogue of a matrix. However, many theoretical and practical challenges arise in extending linear algebra to the setting of tensors. The first part of the thesis studies and develops the algebraic theory of tensors. The second part presents three algorithms for tensor data that use algebraic and geometric structure to give guarantees of optimality.
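The following small NumPy sketch makes "tensor, the higher-order analogue of a matrix" concrete. It is illustrative only and not taken from the thesis. It builds a rank-1 third-order tensor as an outer product of three vectors and checks that every matrix unfolding of that tensor has rank 1. The `unfold` helper is a hypothetical name. Its column ordering follows NumPy's row-major layout, which differs from some textbook conventions but does not change the rank.

```python
import numpy as np

# Three factor vectors; their outer product is a rank-1 third-order tensor.
a = np.array([1.0, 2.0])
b = np.array([1.0, 0.0, -1.0])
c = np.array([3.0, 1.0, 2.0, 0.5])
T = np.einsum("i,j,k->ijk", a, b, c)  # shape (2, 3, 4); T[i, j, k] = a[i] * b[j] * c[k]


def unfold(tensor: np.ndarray, mode: int) -> np.ndarray:
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest into columns."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


# Every unfolding of a rank-1 tensor is a rank-1 matrix.
ranks = [int(np.linalg.matrix_rank(unfold(T, m))) for m in range(3)]
print(T.shape, ranks)  # → (2, 3, 4) [1, 1, 1]
```

For tensors that are not rank 1, the unfolding ranks can differ from mode to mode. This is one of the ways tensor rank behaves less simply than matrix rank.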

3. Statistical approaches for spatial prediction and anomaly detection

This machine learning dissertation primarily describes three projects.

The first project is a method for spatial prediction and parameter estimation for irregularly spaced, non-Gaussian data. The thesis shows that judiciously replacing the likelihood with an empirical likelihood in the Bayesian hierarchical model yields approximate posterior distributions for the mean and covariance parameters. Because of the complex nature of the hierarchical model, standard Markov chain Monte Carlo methods cannot be applied to sample from the posterior distributions, so a generalized sequential Monte Carlo algorithm is used instead. Finally, the method is applied to iron concentrations in California.

The second project focuses on anomaly detection for functional data, specifically functional data whose observed functions may lie over different domains. Each function is approximated as a low-rank sum of spline basis functions, and the coefficients for each basis are compared across functions. The idea is that if two functions are similar, their respective coefficients should not be significantly different. This project concludes by applying the proposed method to detect anomalous behavior of users of a supercomputer at NREL.

The final project extends the second project to two-dimensional data. It aims to detect location and temporal anomalies in ground motion data from a fiber-optic cable using distributed acoustic sensing (DAS).
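The coefficient-comparison idea in the second project can be illustrated with a simplified sketch. It is not the dissertation's test and rests on several assumptions of our own:
- each function's domain has been rescaled to [0, 1] so that one knot grid applies;
- the basis is a hand-built degree-1 (hat) B-spline basis;
- coefficients come from an ordinary least-squares fit;
- a curve is flagged when its coefficient vector is unusually far from the coordinate-wise median.

All function names are hypothetical.

```python
import numpy as np

# Assumed common knot grid; each function's domain is taken to be rescaled to [0, 1].
KNOTS = np.linspace(0.0, 1.0, 6)


def hat_basis(x: np.ndarray, knots: np.ndarray = KNOTS) -> np.ndarray:
    """Degree-1 B-spline ("hat") basis at points x, shape (len(x), len(knots)); uniform knots assumed."""
    h = knots[1] - knots[0]
    return np.clip(1.0 - np.abs(x[:, None] - knots[None, :]) / h, 0.0, None)


def spline_coefficients(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares basis coefficients for one function observed on its own grid x."""
    coef, *_ = np.linalg.lstsq(hat_basis(x), y, rcond=None)
    return coef


def flag_anomalies(curves, threshold: float = 3.0) -> list[int]:
    """Flag curves whose coefficient vector lies far from the coordinate-wise median."""
    C = np.array([spline_coefficients(x, y) for x, y in curves])
    dist = np.linalg.norm(C - np.median(C, axis=0), axis=1)
    scale = np.median(dist) + 1e-12  # robust "typical" distance
    return [i for i, d in enumerate(dist) if d / scale > threshold]


# Usage: ten curves on grids of different sizes; curve 4 carries a level shift.
curves = []
for i in range(10):
    x = np.linspace(0.0, 1.0, 20 + 3 * i)
    y = np.sin(2 * np.pi * x) + 0.01 * np.cos(7 * x + i)  # small deterministic wiggle
    if i == 4:
        y = y + 1.0
    curves.append((x, y))
flagged = flag_anomalies(curves)
```

Because hat functions on a uniform grid sum to one, the level shift of 1.0 moves every coefficient of curve 4 by exactly 1. Its coefficient vector therefore stands far from the others, even though each curve is observed on a different grid.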

4. Sampling for Streaming Data

Advances in data acquisition technology pose challenges in analyzing large volumes of streaming data. Sampling is a natural yet powerful tool for analyzing such data sets because of its competitive estimation accuracy and low computational cost. Unfortunately, sampling methods and their statistical properties for streaming data, especially streaming time series data, are not well studied in the literature. Meanwhile, estimating the dependence structure of multidimensional streaming time-series data in real time is challenging. With large volumes of streaming data, the problem becomes harder when the multidimensional data are collected asynchronously across distributed nodes, which motivates sampling representative data points from streams. This machine learning dissertation proposes a series of leverage score-based sampling methods for streaming time series data. Simulation studies and real data analyses validate the proposed methods. The thesis also develops a theoretical analysis of the asymptotic behavior of the least-squares estimator computed from the subsamples.
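For readers new to leverage scores, here is a batch illustration of the classical idea: leverage-score sampling for least squares. It does not reproduce the dissertation's streaming or time-series variants. Rows are sampled with probability proportional to their leverage (the diagonal of the hat matrix). The subsample is then reweighted so that the fit still targets the full-data least-squares solution. Function names and the synthetic data are our own assumptions.

```python
import numpy as np


def leverage_scores(X: np.ndarray) -> np.ndarray:
    """Leverage h_i = i-th diagonal entry of the hat matrix X (X^T X)^{-1} X^T, via thin QR."""
    Q, _ = np.linalg.qr(X)       # reduced QR: Q has orthonormal columns, shape (n, p)
    return np.sum(Q**2, axis=1)  # squared row norms of Q; they sum to p


def leverage_subsample_ols(X, y, k, rng):
    """Sample k rows with probability proportional to leverage, then solve weighted LS."""
    h = leverage_scores(X)
    probs = h / h.sum()
    idx = rng.choice(len(y), size=k, replace=True, p=probs)
    w = 1.0 / np.sqrt(k * probs[idx])  # importance weights correct for non-uniform sampling
    beta, *_ = np.linalg.lstsq(X[idx] * w[:, None], y[idx] * w, rcond=None)
    return beta


# Usage on synthetic data (a static batch, not a stream)
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = X @ np.array([2.0, -1.0]) + 0.1 * rng.standard_normal(n)
beta_hat = leverage_subsample_ols(X, y, k=200, rng=rng)
```

Here only 200 of 5,000 rows enter the final solve. High-leverage rows, which are the most informative about the regression coefficients, are more likely to be kept.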

5. Statistical Machine Learning Methods for Complex, Heterogeneous Data

This machine learning dissertation develops statistical machine learning methodology for three distinct tasks. Each method blends classical statistical approaches with machine learning methods to provide principled solutions to problems with complex, heterogeneous data sets. The first framework proposes two methods for high-dimensional shape-constrained regression and classification. These methods reshape pre-trained prediction rules to satisfy shape constraints such as monotonicity and convexity. The second method provides a nonparametric approach to the econometric analysis of discrete choice. It offers a scalable algorithm for estimating utility functions with random forests and combines this with random effects to properly model preference heterogeneity. The final method draws inspiration from early work in statistical machine translation to construct embeddings for variable-length objects such as mathematical equations.
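One standard way to make a pre-trained prediction rule monotone in a single feature has two steps. First, evaluate the rule on inputs sorted by that feature. Then project the predictions onto nondecreasing sequences with the pool-adjacent-violators algorithm (PAVA). The sketch below shows only that isotonic projection. It is not claimed to be the dissertation's reshaping procedure, and it does not handle convexity or high-dimensional constraints.

```python
def pava(values, weights=None):
    """Pool-adjacent-violators algorithm: weighted least-squares fit of a
    nondecreasing sequence to `values` (isotonic regression)."""
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted

# Usage: hypothetical pre-trained predictions at inputs sorted by the feature
# that should have a nondecreasing effect.
preds = [0.1, 0.4, 0.3, 0.5, 0.45, 0.9]
monotone = pava(preds)
```

Only adjacent violating pairs are pooled, so predictions that already respect the constraint are left unchanged.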

6. Topics in Multivariate Statistics with Dependent Data

This machine learning dissertation comprises four chapters. The first introduces the topics of the dissertation, and the remaining chapters contain the main results. Chapter 2 gives new results on the consistency of maximum likelihood estimators, with a focus on multivariate mixed models. The theory builds on the idea of using subsets of the full data to establish consistency of estimators based on the full data. It is applied to two multivariate mixed models for which it was previously unknown whether maximum likelihood estimators are consistent. Chapter 3 proposes an algorithm for maximum likelihood estimation of a covariance matrix when the corresponding correlation matrix can be written as the Kronecker product of two lower-dimensional correlation matrices. The proposed method is fully likelihood-based. Some desirable properties of separable correlation compared with separable covariance are also discussed. Chapter 4 is concerned with Bayesian vector autoregressions (VARs). A collapsed Gibbs sampler is proposed for Bayesian VARs with predictors, and the convergence properties of the algorithm are studied.
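The separable-correlation structure studied in Chapter 3 can be built directly with a Kronecker product. This sketch only constructs such a covariance matrix and does not implement the dissertation's maximum likelihood algorithm. The AR(1) correlation factors are an arbitrary illustrative choice.

```python
import numpy as np

def ar1_correlation(n, rho):
    """Illustrative AR(1) correlation matrix with entries rho ** |i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def separable_correlation_cov(sd, R1, R2):
    """Covariance Sigma = D (R1 kron R2) D with D = diag(sd).

    Only the correlation is separable: all p * q standard deviations are free,
    whereas a separable covariance Sigma1 kron Sigma2 restricts the variances
    to products of row and column variances.
    """
    D = np.diag(sd)
    return D @ np.kron(R1, R2) @ D

# Usage: p = 3 rows and q = 4 columns gives a 12 x 12 covariance.
rng = np.random.default_rng(2)
R1 = ar1_correlation(3, 0.5)
R2 = ar1_correlation(4, -0.3)
sd = rng.uniform(0.5, 2.0, size=12)
Sigma = separable_correlation_cov(sd, R1, R2)
```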

7. Model Selection and Estimation for High-dimensional Data Analysis

In the era of big data, uncovering useful information and hidden patterns in data is a common goal across many fields. However, it is challenging to effectively select input variables and estimate their effects. The goal of this machine learning dissertation is to develop reproducible statistical approaches that provide mechanistic explanations of phenomena observed in big data analysis. The research has two parts: variable selection and model estimation. The first part investigates how to measure and interpret the usefulness of an input variable using an approach called “variable importance learning,” and builds tools (methodology and software) that can be widely applied. Two variable importance measures are proposed: a parametric measure, SOIL, and a nonparametric measure, CVIL, based on the ideas of model combining and cross-validation, respectively. The SOIL method is shown theoretically to have the inclusion/exclusion property: when the model weights are properly concentrated around the true model, SOIL importance separates the variables in the true model from the rest. The CVIL method possesses desirable theoretical properties and enhances the interpretability of many opaque but effective machine learning methods. The second part focuses on estimating the effect of a useful input variable when two input variables interact. It investigates the minimax rate of convergence for regression estimation in high-dimensional sparse linear models with two-way interactions. It also constructs an adaptive estimator that achieves the minimax rate regardless of the true heredity condition and the sparsity indices.
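In its basic form, SOIL scores variable j by the total weight of the candidate models that contain it, so scores lie in [0, 1]. The sketch below assumes a small number of predictors so that all subsets can be enumerated, and it uses BIC-based model weights. The weighting scheme in the dissertation may differ, and the function names are hypothetical.

```python
import itertools
import numpy as np

def bic_weights(X, y):
    """Fit OLS (with intercept) on every subset of columns; return subsets and BIC-based weights."""
    n, p = X.shape
    subsets, bics = [], []
    for k in range(p + 1):
        for S in itertools.combinations(range(p), k):
            Z = np.column_stack([np.ones(n)] + [X[:, j] for j in S])
            coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
            rss = np.sum((y - Z @ coef) ** 2)
            bics.append(n * np.log(rss / n) + Z.shape[1] * np.log(n))
            subsets.append(set(S))
    bics = np.array(bics)
    w = np.exp(-(bics - bics.min()) / 2)   # subtract the minimum for numerical stability
    return subsets, w / w.sum()

def soil_importance(X, y):
    """Importance of variable j = total weight of the candidate models that contain j."""
    subsets, w = bic_weights(X, y)
    p = X.shape[1]
    return np.array([sum(wk for S, wk in zip(subsets, w) if j in S) for j in range(p)])

# Usage: only the first two of four predictors matter.
rng = np.random.default_rng(3)
n = 200
X = rng.standard_normal((n, 4))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(n)
importance = soil_importance(X, y)
```

When the weights concentrate on the true model, true variables score near 1 and the rest near 0, which is the inclusion/exclusion behavior described above.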


8. High-Dimensional Structured Regression Using Convex Optimization

While the term “Big Data” can have multiple meanings, this dissertation considers data in which the number of features can be much greater than the number of observations (also known as high-dimensional data). High-dimensional data are abundant in contemporary scientific research due to rapid advances in data-measurement technologies and computing power. The field of high-dimensional data analysis has developed considerably in recent years. This machine learning dissertation proposes three methods that study three different components of a general framework for the high-dimensional structured regression problem. A common theme is that each method casts a particular structured regression as a convex optimization problem. Doing so makes the theoretical properties of each method tractable to study and facilitates efficient computation. Each method is accompanied by a thorough theoretical analysis of its performance and by an R package containing its practical implementation. The proposed methods are shown to compare favorably (both theoretically and practically) with pre-existing methods.
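The lasso is the canonical example of casting structured (here, sparse) regression as a convex problem. The following sketch solves it by proximal gradient descent (ISTA) with a fixed step equal to one over the Lipschitz constant. It illustrates the general theme only and is not one of the dissertation's three methods or R packages.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1 / (2n)) ||y - X b||^2 + lam * ||b||_1 by proximal gradient (ISTA)."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the smooth part
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        b = soft_threshold(b - step * grad, step * lam)
    return b

# Usage: more features than observations, with two nonzero coefficients.
rng = np.random.default_rng(4)
n, p = 50, 200
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:2] = [3.0, -2.0]
y = X @ beta_true + 0.1 * rng.standard_normal(n)
beta_hat = lasso_ista(X, y, lam=0.1, n_iter=5000)
```

Convexity is what makes this simple iteration provably converge to a global minimizer, even when p is much larger than n.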

9. Asymptotics and Interpretability of Decision Trees and Decision Tree Ensembles

Decision trees and decision tree ensembles are widely used nonparametric statistical models. A decision tree is a binary tree that recursively segments the covariate space along the coordinate directions, creating hyperrectangles as basic prediction units and fitting a constant value within each. A decision tree ensemble combines multiple decision trees, either in parallel or in sequence, to increase model flexibility and accuracy and to reduce prediction variance. Although tree models are used extensively in practice, results on their asymptotic behavior are scarce. This machine learning dissertation analyzes tree asymptotics from three perspectives: tree terminal nodes, tree ensembles, and models incorporating tree ensembles. It introduces several new tree-related learning frameworks that provide provable statistical guarantees and interpretations. A study of the Gini index used in the greedy tree-building algorithm derives its limiting distribution. This leads to a test of better splitting that helps measure the uncertain optimality of a decision tree split. The test is combined with decision tree distillation, which uses a decision tree to mimic the behavior of a black box model. Together they generate stable interpretations by guaranteeing a unique distillation tree structure as long as there are sufficiently many random sample points. The dissertation also applies mild modification and regularization to standard tree boosting to create a new boosting framework named Boulevard. Boulevard integrates two new mechanisms: honest trees, which isolate the tree terminal values from the tree structure, and adaptive shrinkage, which scales the boosting history to create an equally weighted ensemble. This theoretical development provides the prerequisite for statistical inference with boosted trees.
Lastly, the thesis investigates the feasibility of incorporating existing semi-parametric models with tree boosting. 
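The greedy split search based on the Gini index, whose limiting distribution the dissertation studies, can be sketched for a single covariate as follows. This shows only the standard impurity-decrease criterion. The dissertation's splitting test and distillation procedure are not reproduced.

```python
import numpy as np

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of an array of class labels."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_gini_split(x, labels):
    """Greedy search over thresholds on one covariate; returns (threshold, impurity decrease)."""
    order = np.argsort(x)
    x, labels = x[order], labels[order]
    n = len(x)
    parent = gini(labels)
    best = (None, 0.0)
    for i in range(1, n):
        if x[i] == x[i - 1]:
            continue  # no threshold can separate tied values
        child = (i * gini(labels[:i]) + (n - i) * gini(labels[i:])) / n
        gain = parent - child
        if gain > best[1]:
            best = ((x[i - 1] + x[i]) / 2, gain)
    return best

# Usage: a perfectly separable toy example.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
labels = np.array([0, 0, 0, 1, 1, 1])
threshold, gain = best_gini_split(x, labels)
print(threshold, gain)   # 3.5 0.5
```

Because the chosen split is the maximum over many noisy candidate gains, its optimality is uncertain in finite samples. That is the uncertainty the dissertation's splitting test addresses.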

10. Bayesian Models for Imputing Missing Data and Editing Erroneous Responses in Surveys

This dissertation develops Bayesian methods for handling unit nonresponse, item nonresponse, and erroneous responses in large-scale surveys and censuses containing categorical data. The focus is on applications of nested household data where individuals are nested within households and certain combinations of the variables are not allowed, such as the U.S. Decennial Census, as well as surveys subject to both unit and item nonresponse, such as the Current Population Survey.

11. Localized Variable Selection with Random Forest  

Due to recent advances in computer technology, the cost of collecting and storing data has dropped drastically, making it feasible to collect large amounts of information for each data point. This trend toward higher feature dimensionality motivates research on variable selection. Random forest (RF) has demonstrated the ability to select important variables and model complex data. However, simulations show that in some cases it fails to detect less influential features in the presence of variables with large effects. This dissertation proposes two algorithms for localized variable selection: clustering-based feature selection (CBFS) and locally adjusted feature importance (LAFI). Both methods aim to find regions where the effects of weaker features can be isolated and measured. CBFS combines RF variable selection with a two-stage clustering method to detect variables whose effects are detectable only in certain regions. LAFI, on the other hand, uses a binary tree approach to split the data into bins based on response variable rankings and applies RF to find the important variables in each bin. Variables selected in more bins receive a larger LAFI. Simulations and real data sets are used to evaluate these variable selection methods.
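The bin-and-count structure of LAFI can be sketched as below. Two simplifications are assumptions, not the dissertation's method. First, bins are formed by splitting the response ranks at the median `depth` times. Second, a correlation screen stands in for the random forest selector in each bin. The `select` argument is a hypothetical hook where an RF-based selector could be plugged in.

```python
import numpy as np

def corr_screen(Xb, yb, threshold=0.3):
    """Stand-in selector: variables whose |Pearson correlation| with y exceeds threshold."""
    Xc = Xb - Xb.mean(axis=0)
    yc = yb - yb.mean()
    r = Xc.T @ yc / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    return np.abs(r) > threshold

def lafi_scores(X, y, depth=2, select=corr_screen):
    """Fraction of response-rank bins in which each variable is selected."""
    n, p = X.shape
    n_bins = 2 ** depth
    ranks = np.argsort(np.argsort(y))
    # Halving the ranks at the median `depth` times gives n_bins bins of consecutive ranks
    # (exactly equal sizes when n is divisible by n_bins).
    bins = ranks * n_bins // n
    counts = np.zeros(p)
    for b in range(n_bins):
        mask = bins == b
        counts += select(X[mask], y[mask])
    return counts / n_bins

# Usage: only the first of three variables drives the response.
rng = np.random.default_rng(5)
X = rng.standard_normal((400, 3))
y = 3.0 * X[:, 0] + 0.5 * rng.standard_normal(400)
lafi = lafi_scores(X, y)
```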

12. Functional Principal Component Analysis and Sparse Functional Regression

The focus of this dissertation is functional data that are sparsely and irregularly observed. Such data require special consideration, as classical functional data methods and theory were developed for densely observed data. As in much of functional data analysis, the functional principal components (FPCs) play a key role in current sparse functional data methods via the Karhunen-Loève expansion. Thus, after a review of relevant background material, this dissertation is divided roughly into two parts: the first focuses on theoretical properties of FPCs, and the second on regression for sparsely observed functional data.
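To give intuition for the role of FPCs, here is a sketch of the Karhunen-Loève decomposition X_i(t) = mu(t) + sum_k xi_ik phi_k(t) for curves observed on a common dense, uniform grid. This is exactly the setting the dissertation moves away from. Sparse, irregular designs instead require smoothing a pooled covariance surface before the eigendecomposition. Names are illustrative.

```python
import numpy as np

def fpca_dense(Y, t, n_components=2):
    """FPCA for curves observed on a common uniform grid t (rows of Y are curves).

    Discretizes the Karhunen-Loeve expansion X_i(t) = mu(t) + sum_k xi_ik phi_k(t);
    eigenfunctions are scaled so that the integral of phi_k(t)^2 dt is 1.
    """
    h = t[1] - t[0]                          # uniform grid spacing (assumed)
    mu = Y.mean(axis=0)
    Yc = Y - mu
    G = Yc.T @ Yc / (Y.shape[0] - 1)         # sample covariance surface G(s, t)
    evals, evecs = np.linalg.eigh(G * h)     # quadrature approximation of the covariance operator
    order = np.argsort(evals)[::-1][:n_components]
    lam = evals[order]
    phi = evecs[:, order] / np.sqrt(h)       # unit L2 norm on the grid
    scores = Yc @ phi * h                    # xi_ik = integral of (X_i - mu) phi_k
    return mu, lam, phi, scores

# Usage: curves built from two known components with score variances 4 and 1.
rng = np.random.default_rng(6)
t = np.linspace(0.0, 1.0, 101)
phi1 = np.sqrt(2) * np.sin(2 * np.pi * t)
phi2 = np.sqrt(2) * np.cos(2 * np.pi * t)
xi = rng.standard_normal((500, 2)) * np.array([2.0, 1.0])
Y = xi[:, :1] * phi1 + xi[:, 1:] * phi2
mu, lam, phi, scores = fpca_dense(Y, t)
```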

13. Essays In Causal Inference: Addressing Bias In Observational And Randomized Studies Through Analysis And Design

In observational studies, identifying assumptions may fail, often quietly and without notice, leading to biased causal estimates. This is less of a concern in randomized trials, where treatment is assigned at random, but bias may still enter through other means. This dissertation has three parts, each developing new methods to address a particular pattern or source of bias in the setting being studied. The first part extends conventional sensitivity analysis methods for observational studies to better address patterns of heterogeneous confounding in matched-pair designs. The second part develops a modified difference-in-differences design for comparative interrupted time-series studies. The method permits partial identification of causal effects when the parallel trends assumption is violated by an interaction between group and history. The method is applied to a study of the repeal of Missouri’s permit-to-purchase handgun law and its effect on firearm homicide rates. The final part presents a study design to identify vaccine efficacy in randomized controlled trials when there is no gold-standard case definition. The approach augments a two-arm randomized trial with natural variation in a genetic trait to produce a factorial experiment.
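The 2x2 difference-in-differences contrast underlying the second part can be written in a few lines. The interval function is a deliberately simple illustration of partial identification. It assumes the group-by-history violation of parallel trends is bounded in absolute value by a user-chosen constant. It is not the dissertation's modified design, and the numbers in the usage lines are made up.

```python
def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Classic 2x2 difference-in-differences from four group means."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

def did_bounds(treat_pre, treat_post, ctrl_pre, ctrl_post, max_violation):
    """Interval for the effect if parallel trends may fail by at most max_violation
    (an illustrative assumption, not the dissertation's identification result)."""
    d = did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post)
    return d - max_violation, d + max_violation

# Usage with hypothetical group means.
point = did_estimate(4.0, 5.5, 3.0, 3.5)              # 1.0
lower, upper = did_bounds(4.0, 5.5, 3.0, 3.5, 0.3)
```

With no assumption at all on the violation, the interval would be unbounded. Some restriction on the group-by-history interaction is what makes partial identification informative.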

14. Bayesian Shrinkage: Computation, Methods, and Theory

Sparsity is a standard structural assumption made when modeling high-dimensional statistical parameters. This assumption essentially entails a lower-dimensional embedding of the high-dimensional parameter, thus enabling sound statistical inference. Beyond this statistical motivation, in many modern applications such as genomics and neuroscience, the parameters of interest are indeed sparse. For almost two decades, spike-and-slab priors have been the Bayesian gold standard for modeling sparsity. However, because of their computational bottlenecks, shrinkage priors have emerged as a powerful alternative. Nearly all priors in this family can be represented as scale mixtures of Gaussian distributions, which makes posterior Markov chain Monte Carlo (MCMC) updates of the related parameters relatively easy to design. Although shrinkage priors were tipped as computationally scalable in high dimensions, they come with their own computational challenges when the number of parameters is in the thousands or more. Standard MCMC algorithms implementing shrinkage priors generally scale cubically in the dimension of the parameter, severely limiting the real-life application of these priors.

The first chapter of this dissertation addresses this computational issue and proposes an alternative exact posterior sampling algorithm whose complexity scales linearly in the ambient dimension. The algorithm developed in the first chapter is specifically designed for regression problems. The second chapter develops a Bayesian method based on shrinkage priors for high-dimensional multiple response regression. Chapter three focuses on a specific member of the shrinkage family, the horseshoe prior, and studies its convergence rates in several high-dimensional models.
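Conditional on the local and global scales, a Gaussian scale-mixture prior gives the regression coefficients a Gaussian full conditional. Sampling it through a p x p factorization is what costs O(p^3). One known exact alternative draws the sample using only an n x n linear system, which is linear in p for fixed n. The sketch assumes `Phi = X / sigma`, `alpha = y / sigma`, and a vector `d` of prior variances. Whether this matches the dissertation's algorithm is not confirmed here, and updates of the scale parameters are omitted.

```python
import numpy as np

def sample_gaussian_posterior(Phi, alpha, d, rng):
    """One exact draw from N(A^{-1} Phi^T alpha, A^{-1}), A = Phi^T Phi + diag(1 / d).

    Only an n x n system is solved, so each draw costs O(n^2 p): linear in p for fixed n.
    """
    n, p = Phi.shape
    u = np.sqrt(d) * rng.standard_normal(p)       # u ~ N(0, D)
    delta = rng.standard_normal(n)                # delta ~ N(0, I_n)
    v = Phi @ u + delta
    M = (Phi * d) @ Phi.T + np.eye(n)             # Phi D Phi^T + I_n
    w = np.linalg.solve(M, alpha - v)
    return u + d * (Phi.T @ w)                    # theta = u + D Phi^T w

# Usage: compare Monte Carlo moments with the exact posterior on a small problem.
rng = np.random.default_rng(7)
n, p = 4, 10
Phi = rng.standard_normal((n, p))
alpha = rng.standard_normal(n)
d = rng.uniform(0.5, 2.0, size=p)
draws = np.array([sample_gaussian_posterior(Phi, alpha, d, rng) for _ in range(20_000)])
A = Phi.T @ Phi + np.diag(1.0 / d)
exact_mean = np.linalg.solve(A, Phi.T @ alpha)
exact_cov = np.linalg.inv(A)
```

The draw is exact, not approximate. By the Woodbury identity, the output has mean `exact_mean` and covariance `exact_cov`, and the p x p matrix `A` is formed here only to check the result.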

15. Topics in Measurement Error Analysis and High-Dimensional Binary Classification

This dissertation proposes novel methods to tackle two problems, misspecified models with measurement error and high-dimensional binary classification, both of which have a crucial impact on applications in public health. The first problem arises in epidemiological practice. Epidemiologists often categorize a continuous risk predictor, since categorization is thought to be more robust and interpretable, even when the true risk model is not categorical. Their goal is thus to fit the categorical model and interpret the categorical parameters. The second project considers high-dimensional classification between two groups with unequal covariance matrices. Rather than estimating the full quadratic discriminant rule, it proposes performing simultaneous variable selection and linear dimension reduction on the original data, followed by quadratic discriminant analysis on the reduced space. To support the proposed methodology, two R packages, CCP and DAP, were developed, along with two vignettes as long-format illustrations of their usage.

16. Model-Based Penalized Regression

This dissertation contains three chapters that consider penalized regression from a model-based perspective, interpreting penalties as assumed prior distributions for unknown regression coefficients. The first chapter shows that treating a lasso penalty as a prior can facilitate the choice of tuning parameters when standard methods for choosing the tuning parameters are not available, and when it is necessary to choose multiple tuning parameters simultaneously. The second chapter considers a possible drawback of treating penalties as models, specifically possible misspecification. The third chapter introduces structured shrinkage priors for dependent regression coefficients which generalize popular independent shrinkage priors. These can be useful in various applied settings where many regression coefficients are not only expected to be nearly or exactly equal to zero, but also structured.
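The link between the lasso penalty and a prior in the first chapter follows from a standard calculation. The derivation below assumes a Gaussian likelihood with known noise variance σ² and independent Laplace priors with scale b on each coefficient:

```latex
p(\beta_j) = \frac{1}{2b}\exp\!\left(-\frac{\lvert\beta_j\rvert}{b}\right),
\qquad y \mid \beta \sim \mathcal{N}(X\beta,\ \sigma^2 I_n)

-\log p(\beta \mid y) = \frac{1}{2\sigma^2}\lVert y - X\beta \rVert_2^2
  + \frac{1}{b}\sum_{j=1}^{p} \lvert\beta_j\rvert + \text{const}

\hat{\beta}_{\mathrm{MAP}} = \arg\min_{\beta}\ \tfrac{1}{2}\lVert y - X\beta \rVert_2^2
  + \lambda \lVert\beta\rVert_1, \qquad \lambda = \frac{\sigma^2}{b}
```

For a given σ², choosing λ is therefore equivalent to choosing the prior scale b. This is what allows tuning parameters to be treated as model parameters. The dissertation's specific tuning procedure is not shown.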

17. Topics on Least Squares Estimation

This dissertation revisits and makes progress on some old but challenging problems concerning least squares estimation, the workhorse of supervised machine learning. Two major problems are addressed: (i) least squares estimation with heavy-tailed errors, and (ii) least squares estimation in non-Donsker classes. Problem (i) is studied both from a worst-case perspective and from a more refined envelope perspective. For (ii), two case studies are performed in the context of (a) estimation involving sets and (b) estimation of multivariate isotonic functions. Understanding these aspects of least squares estimation requires several new tools in empirical process theory. These include a sharp multiplier inequality controlling the size of the multiplier empirical process, and matching upper and lower bounds for empirical processes indexed by non-Donsker classes.

How to Learn More about Machine Learning

At our upcoming event this November 16th-18th in San Francisco, ODSC West 2021 will feature a plethora of talks, workshops, and training sessions on machine learning and machine learning research. You can register now for 50% off all ticket types before the discount drops to 40% in a few weeks. Some highlighted sessions on machine learning include:

  • Towards More Energy-Efficient Neural Networks? Use Your Brain!: Olaf de Leeuw | Data Scientist | Dataworkz
  • Practical MLOps: Automation Journey: Evgenii Vinogradov, PhD | Head of DHW Development | YooMoney
  • Applications of Modern Survival Modeling with Python: Brian Kent, PhD | Data Scientist | Founder The Crosstab Kite
  • Using Change Detection Algorithms for Detecting Anomalous Behavior in Large Systems: Veena Mendiratta, PhD | Adjunct Faculty, Network Reliability and Analytics Researcher | Northwestern University

Sessions on MLOps:

  • Tuning Hyperparameters with Reproducible Experiments: Milecia McGregor | Senior Software Engineer | Iterative
  • MLOps… From Model to Production: Filipa Peleja, PhD | Lead Data Scientist | Levi Strauss & Co
  • Operationalization of Models Developed and Deployed in Heterogeneous Platforms: Sourav Mazumder | Data Scientist, Thought Leader, AI & ML Operationalization Leader | IBM
  • Develop and Deploy a Machine Learning Pipeline in 45 Minutes with Ploomber: Eduardo Blancas | Data Scientist | Fidelity Investments

Sessions on Deep Learning:

  • GANs: Theory and Practice, Image Synthesis With GANs Using TensorFlow: Ajay Baranwal | Center Director | Center for Deep Learning in Electronic Manufacturing, Inc
  • Machine Learning With Graphs: Going Beyond Tabular Data: Dr. Clair J. Sullivan | Data Science Advocate | Neo4j
  • Deep Dive into Reinforcement Learning with PPO using TF-Agents & TensorFlow 2.0: Oliver Zeigermann | Software Developer | embarc Software Consulting GmbH
  • Get Started with Time-Series Forecasting using the Google Cloud AI Platform: Karl Weinmeister | Developer Relations Engineering Manager | Google


Daniel Gutierrez, ODSC

Daniel D. Gutierrez is a practicing data scientist who has been working with data since long before the field came into vogue. As a technology journalist, he enjoys keeping a pulse on this fast-paced industry. Daniel is also an educator, having taught data science, machine learning and R classes at the university level. He has authored four computer industry books on database and data science technology, including his most recent title, “Machine Learning and Data Science: An Introduction to Statistical Learning Methods with R.” Daniel holds a BS in Mathematics and Computer Science from UCLA.




Doctor of Philosophy in Data Science

Developing future pioneers in data science

The School of Data Science at the University of Virginia is committed to educating the next generation of data science leaders. The Ph.D. in Data Science is designed to impart the skills and knowledge necessary to enable research and discovery in data science methods. Because the end goal is to extract knowledge and enable discovery from complex data, the program also boasts robust applied training that is geared toward interdisciplinary collaboration. Doctoral candidates will master the computational and mathematical foundations of data science, and develop competencies in data engineering, software development, data policy and ethics. 

Doctoral students in our program apprentice with faculty and pursue advanced research in an interdisciplinary, collaborative environment that is often focused on scientific discovery via data science methods. By serving as teaching assistants for the School’s undergraduate and graduate programs, they learn to be adroit educators and hone their critical thinking and communication skills.

LEARNING OUTCOMES

Pursuing a Ph.D. in Data Science will prepare you to become an expert in the field and work at the cutting edge of a new discipline. According to LinkedIn’s most recent Emerging Jobs Report, data science is booming and data scientist is one of the top three fastest-growing jobs. A Ph.D. in Data Science from the University of Virginia opens career paths in academia, industry or government. Graduates of our program will:

  • Understand data as a generic concept, and how data encodes and captures information
  • Be fluent in modern data engineering techniques, and work with complex and large data sets
  • Recognize ethical and legal issues relevant to data analytics and their impact on society 
  • Develop innovative computational algorithms and novel statistical methods that transform data into knowledge
  • Collaborate with research teams from a wide array of scientific fields 
  • Effectively communicate methods and results to a variety of audiences and stakeholders
  • Recognize the broad applicability of data science methods and models 

Graduates of the Ph.D. in Data Science will have contributed novel methodological research to the field of data science, demonstrated their work has impactful interdisciplinary applications and defended their methods in an open forum.


PhD in Data Science

First Year Requirements

The standard first-year program requires students to complete nine courses: four required courses (1-4 below); one elective either in mathematical foundations or scalability and computing (pick from either 5 or 6); and finally four other electives that can come from proposed courses in data science or existing graduate courses in Computer Science or Statistics. Some students, after consulting with the committee graduate advisor, might decide to take the nine courses over the first two years.

Core courses (1-4 are required; choose either 5 or 6):

  1. Foundations of Machine Learning and AI Part 1
  2. Responsible Use of Data and Algorithms
  3. Data Interaction
  4. Systems for Data and Computers/Data Design
  5. Foundations of Machine Learning and AI Part 2
  6. Data Engineering and Scalable Computing

Synthesis project

Students take courses during the first two years, after which they focus primarily on their research. A milestone in this transition is the completion of a synthesis project before the end of the second year in the program. These projects can be done in partnership with any of the DSI affiliates and aim to meaningfully connect PhD students to their chosen focus areas.

Thesis Advisor and Dissertation Committee

Students typically select a thesis advisor by the beginning of their second year. By the end of the third year, each PhD student, after consultation with their advisor, shall establish a thesis committee of at least three faculty members, including the advisor, with at least half of the members coming from the Committee on Data Science.

Proposal Presentation and Admission to Candidacy

By the end of the third year, students should have scheduled and completed a proposal presentation to their committee, in order to be advanced to candidacy. The proposal presentation is typically an hourlong meeting that begins with a 30-minute presentation by the student, followed by a question and discussion period with the committee.

Dissertation Defense

The PhD degree will be awarded following a successful defense and the electronic submission of the final version of the dissertation to the University’s Dissertation Office.


Ph.D. in Computational and Data Sciences

Computational Science is the art of creating, developing, and validating models in order to gain a profound understanding of real-life complex problems. Data Science is the art of generating insight, knowledge, and predictions by applying modern methods to large datasets.  

In Chapman University’s Ph.D. in Computational and Data Sciences program, you will collaborate on innovative research as you work closely with nationally and internationally renowned faculty mentors. They will help prepare you to thrive in a variety of professional settings, from academia to private industry and from scientific research labs to government agencies. You will learn to design and implement mathematical models and refine quantitative analysis techniques to solve complex scientific problems. You will develop your dissertation with a focus on advancing the theory and applications of statistics, machine learning and AI in diverse data science-related fields such as medicine and epidemiology, climate and Earth hazards, big data and high-performance computing, drug design, genetics, natural language processing, bioinformatics and biotechnology, economics, and sports analytics.

Employment and Future Opportunities

In our tech-driven world, employers are increasingly recognizing the value of data science professionals.  According to U.S. News and World Report, the Bureau of Labor Statistics projects 35.8% employment growth for data scientists between 2021 and 2031. In this period, an estimated 40,500 jobs should open up.  

Graduates from the program have gone on to work in a variety of industries, such as:  

  • Artificial Intelligence and Machine Learning 
  • Higher Education Institutions 
  • Healthcare 
  • Entertainment Industry 
  • Government Agencies 
  • Large Tech Companies such as Amazon, Microsoft, Google, Yahoo 

Ph.D. Student Handbook

For the latest information on the current curriculum, please visit the Graduate Catalog.

Prerequisites

It is expected that students admitted to the CADS Ph.D. program will have completed substantial preparatory coursework as an undergraduate major or minor from a regionally accredited institution in one of the following disciplines, or the equivalent: Mathematics, Statistics, Computer Science, Data Science, Physics, Electrical Engineering, or Software Engineering.

Preparatory coursework must include the following courses, or the equivalent:

  • Linear Algebra
  • Multivariable Calculus
  • Differential Equations
  • Computer Programming: Data Structures preferred (R, Python, and SQL)
  • Probability and Statistics (Distributions, Confidence Intervals, Hypothesis Testing, Linear Models)

Total Credits - 70

Admission Requirements

An undergraduate degree specifically in computational science is not required for admission. The program will consider applicants from a broad range of undergraduate and master’s level science disciplines (e.g. biology, chemistry, computer science, biochemistry and molecular biology, mathematics, physics). Admission will depend on the relationship between the student’s goals and the program’s objectives as well as the likelihood that the student will benefit from the program.

1. Prerequisite Courses

2. Application Requirements

Admission to the program may be achieved by the completion of the following requirements:

Online application for admission (including $60 non-refundable application fee)

Official transcript from degree granting institution. If prerequisite courses have been taken at schools other than the degree granting institution, those transcripts must also be submitted. Applicants must have earned a minimum grade point average of 3.00.

Graduate Admission Test Scores (School Code: 4047): Graduate Record Examination (GRE) general test scores must be from a test taken within the last five years. Applicants must achieve the following minimum scores:

  • Verbal: 153
  • Quantitative: 146
  • Analytical Writing: 4.0

To request a GRE waiver, complete this form. Once you submit your waiver request, you will be notified by email within 1-2 weeks whether the GRE has been waived for you. Each waiver is reviewed on a case-by-case basis.

Letters of recommendation - two letters of recommendation are required, including one from an academic source which describes your professional and academic abilities.

Statement of Intent - a 750-word essay; applicants are expected to address the science topics they are interested in and how they envision applying computational science in those areas.

Resume - a resume or curriculum vitae is required

International student application requirements

Chapman's language of instruction is English. If you have not received a bachelor's degree (or higher) at an institution where English was the language of instruction, you must demonstrate English proficiency by submitting official scores from an English language exam. You can find additional information here.

Official Transcripts and Diploma

  • Your application requires official transcripts in both the native language, and in English. If your university does not provide translations of your transcript, you will need to have your transcript translated, line-by-line and word-for-word exactly. You will need to submit both the official transcript and the official translation.
  • If your university only provides one official transcript, you will need to submit a notarized copy. You will need to take your official transcript and have certified copies made, and translated into English if needed. These documents should be stamped by the legal notary who made the copy and/or translation. We do not accept uncertified copies directly from students. Please note that official documents will be required upon acceptance.
  • While your diploma will not be required with your application, your enrollment into Chapman University will be dependent upon submission of your official diploma. Should you be admitted, your diploma will need to be submitted in both the native language, and in English. You will need to submit both the official diploma and the official translation. If your university only provides one official diploma, you may send a notarized copy, or bring the original documents into our office at the time classes begin. 

Supplemental Application

  • The International Supplemental Application is the financial certification form that provides comprehensive information about your passport, I-20 requirements, and financial support for your studies. This form is required for F-1 student visa applicants.
  • Should you be admitted into our program, you will be sent information on how to access the Supplemental Application.
  • If you hold a U.S. passport or are a permanent resident, you do not need to submit this document; you will apply as a domestic student.

Tuition Information

Financial assistance is available in the form of federal loans, department scholarships, teaching assistantships, and research assistantships.

More information can be found on the Financial Aid website or by contacting Graduate Financial Aid by email or at (714) 628-2730.

Admission – Please contact the Associate Director of International Initiatives, DSO, Monica Chen, MA, by email regarding your application, to schedule a campus visit, or for other non-program-specific questions.

Application: How to Apply

International Students – View our international student admissions page for additional information regarding applying to Chapman.

Tuition - Contact Student Business Services at (714) 997-6617 for information regarding tuition, fees, billing & payments. Please note that program staff are prohibited from discussing financial information.

Federal Financial Aid - For more information, email Graduate Financial Aid or call (714) 628-2730.

Housing - For graduate student housing options, contact Housing and Residence Life at (714) 997-6603.

Q: What is required for admission to the program?

A: Please review our admissions requirements for more information. You may also contact our graduate admissions team by email or at (714) 997-6711.

Q: Am I required to take the TOEFL (or equivalent)?

A: Applicants who have completed their bachelor’s degree or higher at an institution where English was not the primary language of instruction must submit scores for an English Proficiency exam. Chapman University's institution code for the TOEFL is 4047. 

Q: Who should my letters of recommendation come from? May I submit additional letters?

A: Letters of recommendation should come from former faculty members or those you've worked with in industry who can attest to your academic and professional abilities. Two letters are recommended, but you may submit more if you wish.

Q: Can I send in transcripts to show coursework from non-degree granting institutions? 

A: Yes, all courses you have completed will be taken into account by the admission committee.

Q: Can I submit my application before I have all the necessary documents?

A: Yes, although some sections are required before submitting. Admissions will hold your application and notify us as your documents become available. You will not receive an admissions decision until all documents have been received.

Q: How many students are accepted each year?

A: The Ph.D. program accepts an average of 8 applicants each fall.

Q: Do you accept admissions on a rolling basis?

A: No, students are admitted once a year – for the following fall semester.

Q: What is the cost of the program?

A: For 2022–23, the cost of the Ph.D. program is $126,000 (70 credits at $1,800 per credit, regardless of residency). However, most students receive funding and TA opportunities.

Q: How long does the program take to complete?

A: The normative time to the doctoral degree is 4–6 years, depending on the student’s level of preparation, research topic, and rate of publication.

Q: Am I allowed to attend part-time?

A: Yes, although part-time Ph.D. students are expected to provide their own funding.

Q: Is this program online?

A: No, this program is not online and does not offer any hybrid courses. 

Q: When are classes offered?

A: Most courses are offered in the afternoons and evenings.

Q: Can I transfer courses?

A: Up to 18 credits may be accepted as transfer credit. We accept both standard and online courses that meet all transfer requirements and are from regionally accredited schools.

Q: Is there financial support available?

A: Yes, highly qualified Ph.D. applicants will be offered financial packages upon admission.

Q: How do I find out about available assistantships?

A: Students who would like to be considered for assistantships should send their CV and evaluations from any previous teaching assignments to the Program Coordinator before the application deadline. Please specify your level of knowledge in each of the following undergraduate areas: math, physics, statistics, and/or computer science.

Q: What scholarships are available?

A: Students are encouraged to apply for external scholarships sponsored by government agencies, corporations, and foundations. Some scholarship search options are listed on the Financial Aid - Outside Scholarships page.

Q: What are the housing options?

A: On-campus housing is extremely limited, and graduate students are encouraged to research off-campus living arrangements by visiting our Introduction to Off-Campus Living page. After being accepted to the program, you can connect with the community through Facebook Off Campus Housing and Roommate Corner and Off-Campus Housing Listings. International students should also check with International Student & Scholar Services.

Additional Information for International Students:

Q: Are Chapman's Computational and Data Sciences degrees STEM (Science, Technology, Engineering, Mathematics) programs?

A: Yes, students in our program are eligible to apply for STEM benefits. See International Student & Scholar Services for more information. You can also contact Lisa Luu-Luc, Specialist, International Student & Scholar Services, by email or at (714) 744-2110 with any questions.

Q: What is OPT?

A: Optional Practical Training (OPT) allows you to work for one year after graduation in a job related to your major or field of study. See International Student & Scholar Services for more information, or contact Lisa Luu-Luc by email or at (714) 744-2110 with any questions.

Q: What is CPT?

A: Curricular Practical Training (CPT) allows you to participate in an off-campus paid internship related to your major or field of study. See International Student & Scholar Services for more information, or contact Lisa Luu-Luc by email or at (714) 744-2110 with any questions.


Cyril Rakovski, Ph.D., Co-Program Director, (714) 997-6945

Adrian Vajiac, Ph.D., Co-Program Director, (714) 997-6898

Matthew Martinez, MFA, Graduate Program Coordinator, (714) 997-6993

Monica Chen, MA, Associate Director of International Initiatives, DSO, (714) 289-3590

Graduate Financial Aid, (714) 628-2730


Application Deadlines

Early Admission Deadline: December 1, 2023
Regular Deadline: January 15, 2024

Applications submitted after the deadline will be reviewed on a space-available basis.

Boston University Academics


PhD in Computing & Data Sciences

For more information and to get in touch, please visit the Faculty of Computing & Data Sciences website.

The PhD program in Computing & Data Sciences (CDS) at Boston University prepares its graduates to make significant contributions to the art, science, and engineering of the computational and data-driven processes that are woven into all aspects of society, the economy, and public discourse. Graduates are prepared to solve problems and synthesize knowledge related to the methodical, generalizable, and scalable extraction of insights from data, and to design new information systems and products that put those insights to actionable use, advancing scholarly as well as practical pursuits in a wide range of application domains.

Applicants to the PhD program in CDS are expected to have earned a bachelor’s or master’s degree in one of the methodological or applied disciplines relating to the computational and data-driven areas of scholarship in CDS. They are expected to possess basic mathematical and computational competencies, and demonstrable propensity for cross-disciplinary work. To accommodate a diversity of student backgrounds and preparations, a holistic admission review is utilized. As such, GRE tests and scores are not required, but could be optionally provided and considered as part of the applicant’s portfolio, which may also include evidence of prior, relevant preparation, including creative works, software code repositories, etc. Special attention will be paid to applicants from underrepresented minorities in computing and data science disciplines.

Completion of the PhD degree in CDS requires coursework covering breadth and depth topics spanning the foundational, applied, and sociotechnical dimensions of computing and data science; completion of research rotations that expose students to ongoing projects; completion of a cohort-based training on ethical and responsible computing; and successful proposal and defense of a doctoral thesis.

For their thesis work, and in preparation for careers in academia, industry, and government, CDS PhD students are expected to pursue theoretical, applied, or empirical studies leading to solution of new problems and synthesis of new knowledge in a topic area determined in consultation with their mentors and collaborators, which may include external researchers and practitioners in industrial and academic research laboratories.

Upon completion of the program, students will be prepared to pursue careers in which they lead independent cutting-edge research and development agendas, whether in academia (by teaching, mentoring, and supervising teams of students engaged in scholarly pursuits) or in industry (by collaborating, directing, and effectively managing diverse teams of practitioners working at the forefront of industrial R&D).

Learning Outcomes

The following learning outcomes explain what you will be able to do at the end of your time as a CDS PhD candidate, as a result of earning your degree.

  • Exhibit a strong grasp of the principles governing the design and implementation of the methodological approaches for computational and data-driven inquiry.
  • Identify the literature and demonstrate mastery of the compendium of works relevant to a well-defined area of research inquiry in computing and data sciences.
  • Show capacity to engage meaningfully in and materially contribute to multidisciplinary research and development endeavors.
  • Evidence a strong sense of social and professional responsibility for decisions related to the development and deployment of computational and data-driven technologies.
  • Assess and argue the merits, limitations, and possibilities of new research work in a specialized area at the level commensurate with standards of scholarly venues in that area.
  • Formulate and pursue a research agenda leading to solution of new problems and to synthesis of new knowledge shared through peer-reviewed publications.

Course Requirements

Sixteen semester courses (64 credits) are required for post-BA/BS students and 12 semester courses (48 credits) are required for post-MA/MS students. Students with prior graduate work (including master’s degrees) may be able to transfer up to two courses (8 credits) as long as these credits were not used to fulfill matriculation requirements, upon the recommendation of the student’s academic advisor, and subject to approval by the Associate Provost for CDS.

Of the 16 courses, up to 3 undergraduate courses (12 credits) may be counted as background courses, selected in consultation with the student’s academic advisor and subject to approval by the Associate Provost for CDS. Other than these remedial courses, all other courses must be graduate-level courses or directed studies offered by CDS or by other BU departments in order to satisfy the following degree requirements.

The methodology core requirement ensures that students possess foundational knowledge and competencies in a subset of the following eight methodological areas of CDS:

  • Mathematical Foundations of Data Science
  • Statistical Modeling and Inference
  • Efficient and Scalable Algorithms
  • Predictive Analytics and Machine Learning
  • Combinatorial Optimization and Algorithms
  • Computational Complexity
  • Programming and Software Design
  • Large-scale Data Management

A list of courses that can be used to satisfy these competencies will be maintained on the website for CDS. Students who start their PhD program in CDS are expected to satisfy at least six of these competencies. Students who complete the course requirement for the PhD program in a cognate discipline are expected to satisfy at least four of these competencies.

The subject core requirement ensures that students establish depth in one area of inquiry that is aligned with either the methodological or applied dimensions of CDS. Subject areas are defined by groups of CDS faculty members working in related disciplinary and/or interdisciplinary areas of research who expect their prospective students to have enough depth in the subset of topics to enable them to tackle doctoral-level research in these topics. The set of subject areas as well as a list of preapproved graduate-level courses offered in CDS or elsewhere at BU that can be used to satisfy each subject area will be maintained on the website for CDS.

During the first two years in the program, all PhD candidates in CDS must complete three cohort-based requirements; namely, a two-semester training course (4 credits) covering various aspects of the responsible and ethical conduct of computational and data-driven research, a two-semester doctoral seminar (4 credits) that introduces them to the research portfolios of CDS faculty members as well as to the skills and capacities needed for success as scholars, and at least two research or lab rotations (8 credits) that expose them to real-world computational and data-driven applications that must be tackled through effective multidisciplinary teamwork.

A cumulative GPA not less than 3.3 must be maintained for all non-Pass/Fail courses taken to satisfy the methodology core requirement and the subject core requirement of the degree, excluding any background courses and excluding any transferred credits. Students who receive grades of B– or lower in any three courses taken at BU will be withdrawn from the program.
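The two academic-standing rules above (a 3.3 core GPA and withdrawal after three grades of B– or lower) can be made concrete with a small checker. This is a minimal sketch, not BU's official degree-audit logic: it assumes BU's standard 4.0 letter-grade scale (the bulletin text here does not list grade points) and credit-weighted averaging, and the `Course` record, function names, and course codes are hypothetical.

```python
from dataclasses import dataclass

# Assumption: BU's standard 4.0 letter-grade scale; confirm against the official bulletin.
GRADE_POINTS = {
    "A": 4.0, "A-": 3.7, "B+": 3.3, "B": 3.0, "B-": 2.7,
    "C+": 2.3, "C": 2.0, "C-": 1.7, "D": 1.0, "F": 0.0,
}


@dataclass
class Course:
    code: str                  # hypothetical course identifier
    grade: str                 # letter grade, or "P" for a passed Pass/Fail course
    credits: int = 4           # CDS PhD courses are counted as 4 credits each
    core: bool = True          # counts toward the methodology or subject core
    background: bool = False   # remedial undergraduate background course
    transferred: bool = False  # credit transferred from another institution
    pass_fail: bool = False


def core_gpa(courses):
    """Credit-weighted GPA over letter-graded core courses (the 3.3 rule)."""
    counted = [c for c in courses
               if c.core and not (c.pass_fail or c.background or c.transferred)]
    total = sum(c.credits for c in counted)
    if total == 0:
        return None
    return sum(GRADE_POINTS[c.grade] * c.credits for c in counted) / total


def low_grade_count(courses):
    """Letter-graded courses taken at BU (not transferred) with a B- or lower."""
    return sum(1 for c in courses
               if not (c.transferred or c.pass_fail)
               and GRADE_POINTS[c.grade] <= GRADE_POINTS["B-"])


def standing(courses):
    if low_grade_count(courses) >= 3:
        return "withdrawn"
    gpa = core_gpa(courses)
    if gpa is not None and gpa < 3.3:
        return "below 3.3 core GPA"
    return "good standing"


plan = [Course("CDS-A", "A"), Course("CDS-B", "B+"), Course("CDS-C", "B"),
        Course("CDS-SEM", "P", pass_fail=True)]
print(round(core_gpa(plan), 2), standing(plan))  # 3.43 good standing
```

Checking the withdrawal count before the GPA mirrors the text: three low grades end enrollment regardless of the average, and the count covers any letter-graded BU course, not only core courses.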

Language Requirement

There is no foreign language requirement for the PhD degree in CDS.

Qualifying Examinations

No later than the end of the sixth semester (third year), all PhD candidates in CDS must pass a public oral examination administered by a committee of three faculty members, chaired by the student’s research (and presumptive thesis) advisor or coadvisors. The oral area exam is meant to establish the student’s mastery of a well-defined area of scholarship and preparedness to pursue original research in that area. The oral area examination may require completion of a survey paper or a pilot project ahead of the examination. The scope of the examination, as well as any additional requirements, should be developed in consultation with, and with the approval of, the research advisor(s) at least one semester prior to the exam.

Dissertation and Final Oral Examination

Candidates shall demonstrate their abilities for independent study in a dissertation representing original research or creative scholarship. A prospectus for the dissertation must be successfully defended no later than the end of the eighth semester (fourth year) of study.

Candidates must undergo a final oral examination no later than the end of the 10th semester (fifth year) of study in which they defend their dissertation as a valuable contribution to knowledge in their field and demonstrate a mastery of their field of specialization in relation to their dissertation.

Both the prospectus and final dissertation must be administered by a dissertation committee of at least three readers (including the dissertation advisor or coadvisors) and chaired by a CDS faculty member who is not one of the readers.


Note that this information may change at any time.

Accreditation

Boston University is accredited by the New England Commission of Higher Education (NECHE).


PhD in Physics, Statistics, and Data Science


Many PhD students in the MIT Physics Department incorporate probability, statistics, computation, and data analysis into their research. These techniques are becoming increasingly important for both experimental and theoretical Physics research, with ever-growing datasets, more sophisticated physics simulations, and the development of cutting-edge machine learning tools. The Interdisciplinary Doctoral Program in Statistics (IDPS) is designed to provide students with the highest level of competency in 21st-century statistics, enabling doctoral students across MIT to better integrate computation and data analysis into their PhD thesis research.

Admission to this program is restricted to students currently enrolled in the Physics doctoral program or another participating MIT doctoral program. In addition to satisfying all of the requirements of the Physics PhD, students take one subject each in probability, statistics, computation and statistics, and data analysis, as well as the Doctoral Seminar in Statistics, and they write a dissertation in Physics utilizing statistical methods. Graduates of the program will receive their doctoral degree in the field of “Physics, Statistics, and Data Science.”

Doctoral students in Physics may submit an Interdisciplinary PhD in Statistics Form between the end of their second semester and their penultimate semester in the Physics program. The application must include an endorsement from the student’s advisor, an up-to-date CV, a current transcript, and a 1-2 page statement of interest in Statistics and Data Science.

The statement of interest can be based on the student’s thesis proposal for the Physics Department, but it must demonstrate that statistical methods will be used in a substantial way in the proposed research. In their statement, applicants are encouraged to explain how specific statistical techniques would be applied in their research. Applicants should further highlight ways that their proposed research might advance the use of statistics and data science, both in their physics subfield and potentially in other disciplines. If the work is part of a larger collaborative effort, the applicant should focus on their personal contributions.

For access to the selection form or for further information, please contact the IDSS Academic Office.

Required Courses

Courses in this list that satisfy the Physics PhD degree requirements can count for both programs. Other similar or more advanced courses can count towards the “Computation & Statistics” and “Data Analysis” requirements, with permission from the program co-chairs. The IDS.190 requirement may be satisfied instead by IDS.955 Practical Experience in Data, Systems, and Society, if that experience exposes the student to a diverse set of topics in statistics and data science. Making this substitution requires permission from the program co-chairs prior to doing the practical experience.

  • IDS.190 – Doctoral Seminar in Statistics and Data Science (may be substituted by IDS.955 – Practical Experience in Data, Systems and Society)

Probability (one of):
  • 6.7700[J] – Fundamentals of Probability
  • 18.675 – Theory of Probability

Statistics (one of):
  • 18.655 – Mathematical Statistics
  • 18.6501 – Fundamentals of Statistics
  • IDS.160[J] – Mathematical Statistics: A Non-Asymptotic Approach

Computation & Statistics (one of):
  • 6.C01/6.C51 – Modeling with Machine Learning: From Algorithms to Applications
  • 6.7810 – Algorithms for Inference
  • 6.8610 (6.864) – Advanced Natural Language Processing
  • 6.7900 (6.867) – Machine Learning
  • 6.8710 (6.874) – Computational Systems Biology: Deep Learning in the Life Sciences
  • 9.520[J] – Statistical Learning Theory and Applications
  • 16.940 – Numerical Methods for Stochastic Modeling and Inference
  • 18.337 – Numerical Computing and Interactive Software

Data Analysis (one of):
  • 6.8300 (6.869) – Advances in Computer Vision
  • 8.334 – Statistical Mechanics II
  • 8.371[J] – Quantum Information Science
  • 8.591[J] – Systems Biology
  • 8.592[J] – Statistical Physics in Biology
  • 8.942 – Cosmology
  • 9.583 – Functional MRI: Data Acquisition and Analysis
  • 16.456[J] – Biomedical Signal and Image Processing
  • 18.367 – Waves and Imaging
  • IDS.131[J] – Statistics, Computation, and Applications

Grade Policy

C, D, F, and O grades are unacceptable. Students should not earn more B grades than A grades, as reflected by a PhysSDS GPA of at least 4.5 (on MIT’s 5.0 scale, equal numbers of A and B grades in subjects of equal units average exactly 4.5). Students may be required to retake subjects graded B or lower, although one B grade will generally be tolerated.

Unless approved by the PhysSDS co-chairs, a minimum grade of B+ is required in all 12-unit courses, except IDS.190 (3 units), which requires a P grade.
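The link between "no more B grades than A grades" and the 4.5 threshold is easiest to see numerically. The sketch below assumes MIT’s 5.0 scale (A = 5, B = 4, C = 3, D = 2, F = 0) and unit-weighted averaging. The function names are hypothetical, and the B+ minimum for 12-unit courses is deliberately not modeled.

```python
# Assumption: MIT's 5.0 GPA scale, averaged with each subject weighted by its units.
POINTS = {"A": 5, "B": 4, "C": 3, "D": 2, "F": 0}
UNACCEPTABLE = {"C", "D", "F", "O"}


def physsds_gpa(subjects):
    """subjects: list of (letter_grade, units) pairs.

    Pass/fail subjects such as IDS.190 (grade "P") carry no grade points and
    are skipped, as are any grades not on the assumed scale.
    """
    graded = [(grade, units) for grade, units in subjects if grade in POINTS]
    total_units = sum(units for _, units in graded)
    if total_units == 0:
        return None
    return sum(POINTS[grade] * units for grade, units in graded) / total_units


def meets_grade_policy(subjects):
    """True if there are no C/D/F/O grades and the PhysSDS GPA is at least 4.5."""
    if any(grade in UNACCEPTABLE for grade, _ in subjects):
        return False
    gpa = physsds_gpa(subjects)
    return gpa is not None and gpa >= 4.5


# Equal numbers of A and B grades in 12-unit subjects average exactly 4.5.
balanced = [("A", 12), ("B", 12), ("P", 3)]
# One more B than A pulls the average below the threshold.
b_heavy = [("A", 12), ("B", 12), ("B", 12)]
print(physsds_gpa(balanced), meets_grade_policy(balanced))          # 4.5 True
print(round(physsds_gpa(b_heavy), 2), meets_grade_policy(b_heavy))  # 4.33 False
```

Because A and B are one point apart on this scale, the 4.5 cut-off sits exactly at the midpoint. With equal-unit subjects it is therefore equivalent to having at least as many A grades as B grades.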

Though not required, it is strongly encouraged that a member of the MIT Statistics and Data Science Center (SDSC) serve on a student’s doctoral committee. This could be an SDSC member from the Physics department or from another field relevant to the proposed thesis research.

Thesis Proposal

All students must submit a thesis proposal using the standard Physics format. Dissertation research must involve the utilization of statistical methods in a substantial way.

PhysSDS Committee

  • Jesse Thaler (co-chair)
  • Mike Williams (co-chair)
  • Isaac Chuang
  • Janet Conrad
  • William Detmold
  • Philip Harris
  • Jacqueline Hewitt
  • Kiyoshi Masui
  • Leonid Mirny
  • Christoph Paus
  • Phiala Shanahan
  • Marin Soljačić
  • Washington Taylor
  • Max Tegmark

Can I satisfy the requirements with courses taken at Harvard?

Harvard CompSci 181 will count as the equivalent of MIT’s 6.867 (now 6.7900, Machine Learning). For the status of other courses, please contact the program co-chairs.

Can a course count both for the Physics degree requirements and the PhysSDS requirements?

Yes, this is possible, as long as the courses are already on the approved list of requirements. For example, 8.592 can count as a breadth requirement for a NUPAX student as well as a Data Analysis requirement for the PhysSDS degree.

If I have previous experience in Probability and/or Statistics, can I test out of these requirements?

These courses are required by all of the IDPS degrees. They are meant to ensure that all students obtaining an IDPS degree share the same solid grounding in these fundamentals, and to help build a community of IDPS students across the various disciplines. Only in exceptional cases might it be possible to substitute more advanced courses in these areas.

Can I substitute a similar or more advanced course for the PhysSDS requirements?

Yes, this is possible for the “computation and statistics” and “data analysis” requirements, with permission of program co-chairs. Substitutions for the “probability” and “statistics” requirements will only be granted in exceptional cases.

For Spring 2021, the following course has been approved as a substitution for the “computation and statistics” requirement: 18.408 (Theoretical Foundations for Deep Learning).

The following course has been approved as a substitution for the “data analysis” requirement: 6.481 (Introduction to Statistical Data Analysis).

Can I apply for the PhysSDS degree in my last semester at MIT?

No, you must apply no later than your penultimate semester.

What does it mean to use statistical methods in a “substantial way” in one’s thesis?

The ideal case is that one’s thesis advances statistics research independent of the Physics applications. Advancing the use of statistical methods in one’s subfield of Physics would also qualify. Applying well-established statistical methods in one’s thesis could qualify, if the application is central to the Physics result. In all cases, we expect the student to demonstrate mastery of statistics and data science.

PhD in Data Science


One of the first programs of its kind in the nation, WPI’s interdisciplinary PhD in Data Science recognizes that traditional data processing applications can no longer handle today’s large and complex datasets. New models are needed to handle big data, and graduates with the expertise to turn those observations into meaningful recommendations are in high demand.


You’ll be working alongside faculty and industry partners to analyze, capture, search, share, store, transfer, query, and visualize huge amounts of data to solve real-world challenges. Some broad-stroke examples:

  • using predictive analytics to identify cyber threats
  • employing big data analytics to improve healthcare outcomes
  • empowering “smart” cities to make data-driven policy changes critical for societal well-being

Applying to the Data Science PhD Program

Students applying to the data science PhD program will find WPI’s data science degree options listed with engineering, science, and mathematics on the application form.


WPI’s PhD in data science is interdisciplinary, drawing from Computer Science, Mathematical Sciences, and the Business School. Together, courses and dissertation research revolve around five key areas:

  • Integrative Data Science
  • Business Intelligence and Case Studies
  • Data Access and Management
  • Data Analytics and Mining
  • Mathematical Analytics

Chemical Engineering Professor David DiBiasio and colleagues at WPI, with sponsorship from the KERN Foundation, are leading efforts to introduce entrepreneurial training into WPI’s STEM curricula and project-based learning.

A Ph.D. student must obtain core competency by taking 7 courses from the list of Data Science core areas below, earning an A in 4 of the 7 courses and at least a B in the remaining 3, within 2 years after starting the Ph.D. 60 program.

Integrative Data Science (required)
  • DS 501. Introduction to Data Science (3 credits)

Mathematical Analytics, 3 credits (select at least one)
  • DS 502. Statistical Methods for Data Science (3 credits)
  • MA 542. Regression Analysis
  • MA 554. Applied Multivariate Analysis

Data Access and Management, 3 credits (select at least one)
  • CS 542. Database Management Systems (3 credits)
  • MIS 571. Database Applications Development
  • DS 503. Big Data Management (3 credits)
  • CS 561. Advanced Topics in Database Systems

Data Analytics and Mining, 3 credits (select at least one)
  • CS 548. Knowledge Discovery and Data Mining (3 credits)
  • DS 504. Big Data Analytics (3 credits)
  • CS 539. Machine Learning

Business Intelligence and Case Studies, 3 credits (select at least one)
  • MIS 584. Business Intelligence
  • MKT 568. Data Mining Business Applications
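The core-competency rule combines an area-coverage condition (DS 501 plus at least one course per area) with a grade condition (at least four A grades, nothing below B). A short checker makes both explicit. This sketch assumes that the student designates exactly seven core courses, that courses beyond the one-per-area minimum may come from any listed area, and that grades are plain letters. `core_competency_met` and the sample plan are hypothetical.

```python
# Core areas and course codes as listed in the WPI core-area list.
AREAS = {
    "Integrative Data Science": {"DS 501"},
    "Mathematical Analytics": {"DS 502", "MA 542", "MA 554"},
    "Data Access and Management": {"CS 542", "MIS 571", "DS 503", "CS 561"},
    "Data Analytics and Mining": {"CS 548", "DS 504", "CS 539"},
    "Business Intelligence and Case Studies": {"MIS 584", "MKT 568"},
}
ALL_CORE = set().union(*AREAS.values())


def core_competency_met(courses, years_elapsed):
    """courses: the 7 designated core courses as {course_code: letter_grade}.

    Assumptions: plain letter grades ("A", "B", "C", ...), and any course beyond
    the one-per-area minimum may come from any listed area.
    """
    codes = set(courses)
    if len(codes) != 7 or not codes <= ALL_CORE:
        return False
    if years_elapsed > 2:  # must be completed within 2 years of starting
        return False
    # At least one course from every area; this also enforces the required DS 501.
    if not all(codes & area for area in AREAS.values()):
        return False
    grades = list(courses.values())
    # An A in at least 4 courses and no grade below B in the rest.
    return grades.count("A") >= 4 and all(g in ("A", "B") for g in grades)


plan = {"DS 501": "A", "DS 502": "A", "CS 542": "A", "DS 504": "A",
        "MIS 584": "B", "DS 503": "B", "CS 539": "B"}
print(core_competency_met(plan, years_elapsed=2))  # True
```

For example, swapping MIS 584 for MA 542 in the sample plan leaves the Business Intelligence area uncovered, so the checker returns False even though the grades still qualify.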

Diane Strong, professor, and Bengisu Tulu, associate professor, are leading teams developing smartphone apps that help people better manage health conditions ranging from diabetes to stress eating. The work draws on the expertise of a team of specialists, including technology experts and clinicians.

Andrew Trapp, associate professor of operations and industrial engineering at the Foisie Business School, develops analytical tools to estimate capacities for holding sites, judges, and other resources needed to humanely process migrant asylum cases at the international borders.

Software Tools & Labs

The Data Science Innovation Lab is a dedicated workspace for project work by students in the Data Science program. Robust servers and computer clusters for experimenting with large-scale datasets are available in labs throughout WPI, including many interdisciplinary facilities.

State-of-the-art software programs: Cassandra, DB2, Hadoop, IBM Cognos, IBM ILOG CPLEX, IBM SPSS Modeler, InfoSphere BigInsights, InfoSphere Streams, Mahout, Maple, MATLAB, MySQL, Oracle Server, Palisade DecisionTools Suite, R, RapidMiner, SAS, Spotfire, SQL Server, Tableau, and Weka.

Close faculty interaction, cutting-edge equipment, and personal attention let you structure your program so it suits your individual career goals. You’ll leave with a degree that will help you succeed on your distinctive path.

Data science research gives you opportunities to work on grand challenge problems with societal importance, including topics such as cybersecurity, healthcare, and sustainability.

Our data science graduate program offers expertise in computer science, statistics, and business topics while giving you essential opportunities to work with industry partners.

WPI’s innovative and multidisciplinary graduate program prepares students to become talented and effective leaders in this rapidly evolving field.

WPI faculty and candidates in the PhD in data science are exploring every aspect of this burgeoning field. Together, they’re fueling breakthroughs that have direct, real-world impact in health, genetic analysis, sustainability, educational software, financial trading, and more.

A faculty advisor will help you design a Plan of Study for your dissertation as well as coursework in the core areas of data analytics and big data computing, statistical foundations and mathematical analytics, and business intelligence and innovation.

Faculty Profiles

Elke Rundensteiner

As founding Head of the interdisciplinary Data Science program here at WPI, I take great pleasure in doing all in my power to support the Data Science community in all its facets, from research collaborations and new educational initiatives to our innovative industry-sponsored and mentored Graduate Qualifying Projects.

Xiangnan Kong

Professor Kong’s research interests focus on data mining and machine learning, with emphasis on addressing data science problems in biomedical and social applications. Data today involves an increasing number of data types that need to be handled differently from conventional data records, and an increasing number of data sources that need to be fused together. Dr. Kong is particularly interested in designing algorithms to tame data variety issues in various research fields, such as biomedical research, social computing, neuroscience, and business intelligence.

Yanhua Li

Yanhua Li is an Associate Professor in the Computer Science Department and Data Science Program at Worcester Polytechnic Institute (WPI). His research interests focus on artificial intelligence (AI) and data science, with applications to smart cities, including spatial-temporal data analytics, urban planning, and optimization.

Randy Paffenroth

My research focuses on compressed sensing, machine learning, signal processing, and the interaction between mathematics, computer science and software engineering. My interests range from theoretical results to algorithms for tackling practical applied problems, and I enjoy problems most when mathematical results lead to efficient software implementations for big data. I am looking forward to working with students at all levels and backgrounds who share an interest in mathematics, software, or data.

Andrew Trapp

I am Associate Professor of Operations and Industrial Engineering at Worcester Polytechnic Institute (WPI), with courtesy professorships in Mathematical Sciences and Data Science. I hold a Ph.D. in Industrial Engineering from the University of Pittsburgh. My objective is to use science and technology to address real human needs by improving systems that serve vulnerable populations, such as refugees and asylum seekers, survivors of human trafficking, and children in the foster care system.


Getting Involved

We’re data scientists: we use the tools of the trade and big data analytics to innovate. Follow department happenings and industry trends via our social media channels on Facebook and LinkedIn.

Need to Earn a Master’s First? Explore Our Pioneering MS in Data Science

Not quite ready to apply for our data science PhD program because you first need to earn a master’s? Our pioneering master’s in data science dives into how to synthesize huge amounts of data and articulate findings as innovative solutions. WPI is one of a handful of universities that offer an MS in data science. Are you a working professional who prefers to study online? Our online master’s in data science makes it possible for you to advance your expertise conveniently from wherever you are. Our core courses cover the same data science essentials taught on campus, such as analysis techniques, database management, and more. Maybe you’re excited about elevating your career in big data but have questions about data science PhD salaries and popular job titles. Check out our career information for data science.

Advance Your Data Science Skills with a Graduate Certificate

Individuals who know how to interpret and harness large amounts of data are in high demand, and our graduate certificates are a perfect way to customize your data science skills and aspirations. Our on-campus graduate certificate in data science lets students select six courses covering mathematical analytics, data management, business intelligence, and more. Have a busy schedule and prefer to study online? Our online data science certificate brings world-class instruction right to you, with flexible course offerings that let you enhance your data analytics skills.

Ready to Start Your Data Science Path?

If data intrigues you and you love the idea of finding patterns and revealing the information in massive amounts of data, a future in data science likely appeals to you. If you have your sights set on a PhD in data science, you can get started on the right academic path with a bachelor’s in data science. WPI’s bachelor’s program offers hands-on projects to increase your understanding of the field while giving you real skills you can use. If you’re majoring in another field, such as business or computer science, a minor in data science will give you a solid understanding of data science concepts. With a minor in data science, you’ll gain skills and learn how to apply them to your chosen discipline.

WPI is proud to be the recipient of not one, but two National Science Foundation Research Traineeship programs. The programs provide exceptionally talented graduate students with specialized training and funding assistance to pursue careers at the forefront of technology and innovation. The programs are for graduate students in research-based master's and doctoral degree programs in STEM.

The BioPoint Program for Graduate Students has been designed to complement traditional training in bioscience, digital and engineering fields. Students accepted into one of the home BioPoint programs will have the flexibility to select research advisors and take electives in other departments to broaden their skills. The BioPoint curriculum is designed to be individual, interactive, project-focused and diverse, and includes innovative courses, seminars, journal clubs and industry-based projects.

Doctoral Program

Program summary.

Students are required to

  • master the material in the prerequisite courses;
  • pass the first-year core program;
  • attempt all three parts of the qualifying examinations and show acceptable performance in at least two of them (end of 1st year);
  • satisfy the depth and breadth requirements (2nd/3rd/4th year);
  • successfully complete the thesis proposal meeting (winter quarter of the 3rd year);
  • present a draft of their dissertation and pass the university oral examination (4th/5th year).

The PhD requires a minimum of 135 units. Students are required to take a minimum of nine units of advanced topics courses (for depth) offered by the department (not including literature, research, consulting or Year 1 coursework), and a minimum of nine units outside of the Statistics Department (for breadth). Courses for the depth and breadth requirements must equal a combined minimum of 24 units. In addition, students must enroll in STATS 390 Statistical Consulting, taking it at least twice.

All students who have passed the qualifying exams but have not yet passed the Thesis Proposal Meeting must take STATS 319 at least once each year. For example, a student who takes the qualifying exams in the summer after Year 1 and holds the dissertation proposal meeting in Year 3 would take 319 in Years 2 and 3. Students in their second year are strongly encouraged to take STATS 399 with at least one faculty member. All details of program requirements can be found in our PhD handbook (available to Stanford affiliates only, using Stanford authentication. Requests for access from non-affiliates will not be approved).

Statistics Department PhD Handbook

All students are expected to abide by the Honor Code and the Fundamental Standard.

Doctoral and Research Advisors

During the first two years of the program, students' academic progress is monitored by the department's Graduate Director. Each student should meet at least once a quarter with the Graduate Director to discuss their academic plans and their progress towards choosing a thesis advisor (before the final study list deadline of spring of the second year). From the third year onward, students are advised by their selected advisor.

Qualifying Examinations

Qualifying examinations are part of most PhD programs in the United States. At Stanford these exams are intended to test the student's level of knowledge when the first-year program, common to all students, has been completed. There are separate examinations in the three core subjects of statistical theory and methods, applied statistics, and probability theory, which are typically taken during the summer at the end of the student's first year. Students are expected to attempt all three examinations and show acceptable performance in at least two of them. Letter grades are not given. Qualifying exams may be taken only once. After passing the qualifying exams, students must file for Ph.D. Candidacy, a university milestone, by the end of spring quarter of their second year.

While nearly all students pass the qualifying examinations, those who do not can arrange to have their financial support continued for up to three quarters while alternative plans are made. Usually students are able to complete the requirements for the M.S. degree in Statistics in two years or less, whether or not they have passed the PhD qualifying exams.

Thesis Proposal Meeting and Dissertation Reading Committee 

The thesis proposal meeting is intended to demonstrate a student's depth in some areas of statistics, and to examine the general plan for their research. In the meeting the student gives a 60-minute presentation involving ideas developed to date and plans for completing a PhD dissertation, and for another 60 minutes answers questions posed by the committee, which consists of their advisor and two other members. The meeting must be successfully completed by the end of winter quarter of the third year. If a student does not pass, the exam must be repeated. Repeated failure can lead to a loss of financial support.

The Dissertation Reading Committee consists of the student’s advisor plus two faculty readers, all of whom are responsible for reading the full dissertation. Of these three, at least two must be members of the Statistics Department (faculty with a full or joint appointment in Statistics but excluding for this purpose those with only a courtesy or adjunct appointment). Normally, all committee members are members of the Stanford University Academic Council or are emeritus Academic Council members; the principal dissertation advisor must be an Academic Council member. 

The Doctoral Dissertation Reading Committee form should be completed and signed at the Dissertation Proposal Meeting. The form must be submitted before approval of TGR status or before scheduling a University Oral Examination.

For further information on the Dissertation Reading Committee, please see the Graduate Academic Policies and Procedures (GAP) Handbook section 4.8.

University Oral Examinations

The oral examination consists of a public presentation of approximately 60 minutes on the thesis topic, followed by a 60-minute question-and-answer period attended only by members of the examining committee. The questions relate to the student's presentation and also explore the student's familiarity with broader statistical topics related to the thesis research. The oral examination is normally completed during the last few months of the student's PhD period. The examining committee typically consists of four faculty members from the Statistics Department and a fifth faculty member from outside the department serving as the committee chair. Four out of five passing votes are required and no grades are given. Nearly all students can expect to pass this examination, although it is common for specific recommendations to be made regarding completion of the thesis.

The Dissertation Reading Committee must also read and approve the thesis.

For further information on university oral examinations and committees, please see the Graduate Academic Policies and Procedures (GAP) Handbook section 4.7.

Dissertation

The dissertation is the capstone of the PhD degree. It is expected to be an original piece of work of publishable quality. The research advisor and two additional faculty members constitute the student's dissertation reading committee.


Analytics Insight

10 Best Research and Thesis Topic Ideas for Data Science in 2022


These research and thesis topics for data science can help both students and scholars build knowledge and skills.

  • Handling practical video analytics in a distributed cloud: With increased dependency on the internet, sharing videos has become a mode of data and information exchange. The Internet of Things (IoT), telecom infrastructure, and operators play a huge role in generating insights from video analytics. From this perspective, several questions remain open, such as how efficient existing analytics systems are and what changes would follow if real-time analytics were integrated.
  • Smart healthcare systems using big data analytics: Big data analytics plays a significant role in making healthcare more efficient, accessible, and cost-effective. Big data analytics enhances the operational efficiency of smart healthcare providers by providing real-time analytics. It enhances the capabilities of intelligent systems by using short-span data-driven insights, but distinct challenges in this field have yet to be addressed.
  • Identifying fake news using real-time analytics: The circulation of fake news has become a pressing issue in the modern era. Data gathered from social media networks may seem legitimate but sometimes is not. The sources that provide the data are unauthenticated most of the time, which makes this a crucial issue to address.
  • Secure federated learning with real-world applications: Federated learning is a technique that trains an algorithm across multiple decentralized edge devices and servers. It can be used to build models locally, but whether it can be deployed securely and at scale across multiple platforms is still unclear.
  • Big data analytics and its impact on marketing strategy: The advent of data science and big data analytics has entirely redefined the marketing industry, offering enterprises valuable insights into their existing and future customers. However, issues such as surplus data, integrating complex data into customer journeys, and complete data privacy remain largely unexplored and need immediate attention.
  • Impact of big data on business decision-making: Current studies suggest that big data has transformed the way managers and business leaders make critical decisions concerning the growth and development of the business. It allows them to access objective data and analyse market environments, enabling companies to adapt rapidly and make decisions faster. Working on this topic will help students understand present market and business conditions and analyse new solutions.
  • Implementing big data to understand consumer behaviour: In understanding consumer behaviour, big data is used to analyse the data points depicting a consumer’s journey after buying a product. Data gives a clearer picture of specific scenarios. This topic will help students understand the problems businesses face in utilizing these insights and develop new strategies that generate more ROI.
  • Applications of big data to predict future demand and forecasting: Predictive analytics in data science has emerged as an integral part of decision-making and demand forecasting. Working on this topic will enable students to determine the significance of high-quality historical data analysis and the factors that drive higher consumer demand.
  • The importance of data exploration over data analysis: Exploration enables a deeper understanding of the dataset, making it easier to navigate and use the data later. Analysts must understand the differences between data exploration and analysis and apply each according to specific organizational needs.
  • Data science and software engineering: Software engineering and development are a major part of data science. Skilled data professionals should learn and explore the technical and software skills needed for critical AI and big data tasks.
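To make the federated learning topic more concrete, here is a minimal sketch of federated averaging (FedAvg), one widely used way to combine models trained on separate devices. It rests on several assumptions: a toy one-parameter linear model y ≈ w·x, small hypothetical client datasets, and plain gradient descent. The names `local_update` and `federated_averaging` are illustrative and are not part of any library. The sketch also deliberately leaves out the security layer (such as secure aggregation or differential privacy) that the research topic is actually concerned with.

```python
from typing import List, Tuple

# Hypothetical toy setup: each client holds its own (x, y) pairs.
Dataset = List[Tuple[float, float]]


def local_update(w: float, data: Dataset, lr: float = 0.01, epochs: int = 5) -> float:
    """Run a few epochs of gradient descent on one client's data only."""
    for _ in range(epochs):
        # Gradient of the mean squared error (1/n) * sum((w*x - y)^2) w.r.t. w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_averaging(clients: List[Dataset], rounds: int = 50, w0: float = 0.0) -> float:
    """FedAvg sketch: clients train locally, then the server averages their
    weights, weighting each client by its number of examples.
    Only model weights are shared; raw data never leaves a client."""
    w = w0
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        local_weights = [local_update(w, d) for d in clients]
        w = sum(len(d) * wk for d, wk in zip(clients, local_weights)) / total
    return w


if __name__ == "__main__":
    # Hypothetical data: every client observes y = 3x, so the best weight is 3.
    clients = [
        [(1.0, 3.0), (2.0, 6.0)],
        [(3.0, 9.0)],
        [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
    ]
    print(round(federated_averaging(clients), 3))  # prints 3.0
```

The server only sees each client's updated weight and never the underlying (x, y) pairs. That separation is the property that federated learning relies on. It is also why securing the exchanged updates at scale remains an open research question.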


Disclaimer: Any financial and crypto market information given on Analytics Insight are sponsored articles, written for informational purpose only and is not an investment advice. The readers are further advised that Crypto products and NFTs are unregulated and can be highly risky. There may be no regulatory recourse for any loss from such transactions. Conduct your own research by contacting financial experts before making any investment decisions. The decision to read hereinafter is purely a matter of choice and shall be construed as an express undertaking/guarantee in favour of Analytics Insight of being absolved from any/ all potential legal action, or enforceable claims. We do not represent nor own any cryptocurrency, any complaints, abuse or concerns with regards to the information provided shall be immediately informed here .


Recent Dissertation Topics


Dan Kowal - "Bayesian Methods for Functional and Time Series Data"

Dissertation Advisors: David Matteson and David Ruppert

Initial job placement: assistant professor, Department of Statistics, Rice University

Keegan Kang - "Data Dependent Random Projections"

Dissertation Advisor: Giles Hooker

David Sinclair - "Model Selection Results for High Dimensional Graphical Models on Binary and Count Data with Applications to FMRI and Genomics"

Liu, Yanning – "Statistical Issues in the Design and Analysis of Clinical Trials"

Dissertation Advisor: Bruce Turnbull

Nicholson, William Bertil – "Tools for Modeling Sparse Vector Autoregressions"

Dissertation Advisor: David Matteson

Tupper, Laura Lindley – "Topics in Classification and Clustering of High-Dimensional Data"

Chetelat, Didier – "High-Dimensional Inference by Unbiased Risk Estimation"

Dissertation Advisor: Martin Wells

Initial Job Placement: Assistant Professor, Université de Montréal, Montreal, Canada

Gaynanova, Irina – "Estimation Of Sparse Low-Dimensional Linear Projections"

Dissertation Advisor: James Booth

Initial Job Placement: Assistant Professor, Texas A&M, College Station, TX

Mentch, Lucas – "Ensemble Trees and CLTs: Statistical Inference in Machine Learning"

Initial Job Placement: Assistant Professor, University of Pittsburgh, Pittsburgh, PA

Risk, Ben – "Topics in Independent Component Analysis, Likelihood Component Analysis, and Spatiotemporal Mixed Modeling"

Dissertation Advisors: David Matteson and David Ruppert

Initial Job Placement: Postdoctoral Fellow, University of North Carolina, Chapel Hill, NC

Zhao, Yue – "Contributions to the Statistical Inference for the Semiparametric Elliptical Copula Model"

Dissertation Advisor: Marten Wegkamp

Initial Job Placement: Postdoctoral Fellow, McGill University, Montreal, Canada

Chen, Maximillian Gene – "Dimension Reduction and Inferential Procedures for Images"

Dissertation Advisor: Martin Wells 

Earls, Cecelia – "Bayesian Hierarchical Gaussian Process Models for Functional Data Analysis"

Dissertation Advisor: Giles Hooker

Initial Job Placement: Lecturer, Cornell University, Ithaca, NY

Li, James Yi-Wei – "Tensor (Multidimensional Array) Decomposition, Regression, and Software for Statistics and Machine Learning"

Initial Job Placement: Research Scientist, Yahoo Labs

Schneider, Matthew John – "Three Papers on Time Series Forecasting and Data Privacy"

Dissertation Advisor: John Abowd

Initial Job Placement: Assistant Professor, Northwestern University, Evanston, IL

Thorbergsson, Leifur – "Experimental design for partially observed Markov decision processes"

Initial Job Placement: Data Scientist, Memorial Sloan Kettering Cancer Center, New York, NY

Wan, Muting – "Model-Based Classification with Applications to High-Dimensional Data in Bioinformatics"

Initial Job Placement: Senior Associate, 1010 Data, New York, NY

Johnson, Lynn Marie – "Topics in Linear Models: Methods for Clustered, Censored Data and Two-Stage Sampling Designs"

Dissertation Advisor: Robert Strawderman

Initial Job Placement: Statistical Consultant, Cornell, Statistical Consulting Unit, Ithaca, NY

Tecuapetla Gomez, Inder Rafael –  "Asymptotic Inference for Locally Stationary Processes"

Dissertation Advisor: Michael Nussbaum

Initial Job Placement: Postdoctoral Fellow, Georg-August-Universität Göttingen, Göttingen, Germany.

Bar, Haim – "Parallel Testing, and Variable Selection -- a Mixture-Model Approach with Applications in Biostatistics" 

Dissertation Advisor: James Booth

Initial Job Placement: Postdoc, Department of Medicine, Weill Medical Center, New York, NY

Cunningham, Caitlin –  "Markov Methods for Identifying ChIP-seq Peaks" 

Initial Job Placement: Assistant Professor, Le Moyne College, Syracuse, NY

Ji, Pengsheng – "Selected Topics in Nonparametric Testing and Variable Selection for High Dimensional Data" 

Dissertation Advisor: Michael Nussbaum 

Initial Job Placement: Assistant Professor, University of Georgia, Athens, GA

Morris, Darcy Steeg – "Methods for Multivariate Longitudinal Count and Duration Models with Applications in Economics" 

Dissertation Advisor: Francesca Molinari 

Initial Job Placement: Research Mathematical Statistician, Center for Statistical Research and Methodology, U.S. Census Bureau, Washington DC

Narayanan, Rajendran – "Shrinkage Estimation for Penalised Regression, Loss Estimation and Topics on Largest Eigenvalue Distributions" 

Initial Job Placement: Visiting Scientist, Indian Statistical Institute, Kolkata, India

Xiao, Luo – "Topics in Bivariate Spline Smoothing" 

Dissertation Advisor: David Ruppert 

Initial Job Placement: Postdoc, Johns Hopkins University, Baltimore, MD

Zeber, David – "Extremal Properties of Markov Chains and the Conditional Extreme Value Model" 

Dissertation Advisor: Sidney Resnick 

Initial Job Placement: Data Analyst, Mozilla, San Francisco, CA

Clement, David – "Estimating equation methods for longitudinal and survival data" 

Dissertation Advisor: Robert Strawderman 

Initial Job Placement: Quantitative Analyst, Smartodds, London UK

Eilertson, Kirsten – "Estimation and inference of random effect models with applications to population genetics and proteomics" 

Dissertation Advisor: Carlos Bustamante 

Initial Job Placement: Biostatistician, The J. David Gladstone Institutes, San Francisco CA

Grabchak, Michael – "Tempered stable distributions: properties and extensions" 

Dissertation Advisor: Gennady Samorodnitsky 

Initial Job Placement: Assistant Professor, UNC Charlotte, Charlotte NC

Li, Yingxing – "Aspects of penalized splines" 

Initial Job Placement: Assistant Professor, The Wang Yanan Institute for Studies in Economics, Xiamen University

Lopez Oliveros, Luis – "Modeling end-user behavior in data networks" 

Dissertation Advisor: Sidney Resnick  

Initial Job Placement: Consultant, Murex North America, New York NY

Ma, Xin – "Statistical Methods for Genome Variant Calling and Population Genetic Inference from Next-Generation Sequencing Data" 

Initial Job Placement: Postdoc, Stanford University, Stanford CA

Kormaksson, Matthias – "Dynamic path analysis and model based clustering of microarray data" 

Dissertation Advisor: James Booth 

Initial Job Placement: Postdoc, Department of Public Health, Weill Cornell Medical College, New York NY

Schifano, Elizabeth – "Topics in penalized estimation" 

Initial Job Placement: Postdoc, Department of Biostatistics, Harvard University, Boston MA

Hanlon, Bret – "High-dimensional data analysis" 

Dissertation Advisor: Anand Vidyashankar 

Shaby, Benjamin – "Tools for hard Bayesian computations"

Initial Job Placement: Postdoc, SAMSI, Durham NC

Zipunnikov, Vadim – "Topics on generalized linear mixed models" 

Initial Job Placement: Postdoc, Department of Biostatistics, Johns Hopkins University, Baltimore MD

Barger, Kathryn Jo-Anne – "Objective Bayesian estimation for the number of classes in a population using Jeffreys and reference priors"

Dissertation Advisor: John Bunge 

Initial Job Placement: Pfizer Incorporated

Chan, Serena Suewei – "Robust and efficient inference for linear mixed models using skew-normal distributions" 

Initial Job Placement: Statistician, Takeda Pharmaceuticals, Deerfield IL

Lin, Haizhi – "Distressed debt prices and recovery rate estimation" 

Dissertation Advisor: Martin Wells  

Initial Job Placement: Associate, Fixed Income Department, Credit Suisse Securities (USA), New York, NY

Harvard University Theses, Dissertations, and Prize Papers

The Harvard University Archives’ collection of theses, dissertations, and prize papers documents the wide range of academic research undertaken by Harvard students over the course of the University’s history.

Beyond their value as pieces of original research, these collections document the history of American higher education, chronicling both the growth of Harvard as a major research institution as well as the development of numerous academic fields. They are also an important source of biographical information, offering insight into the academic careers of the authors.

Printed list of works awarded the Bowdoin prize in 1889-1890.

Spanning from the ‘theses and quaestiones’ of the 17th and 18th centuries to the current yearly output of student research, they include both the first Harvard Ph.D. dissertation (by William Byerly, Ph.D. 1873) and the dissertation of the first woman to earn a doctorate from Harvard (Lorna Myrtle Hodgkinson, Ed.D. 1922).

Other highlights include:

  • The collection of Mathematical theses, 1782-1839
  • The 1895 Ph.D. dissertation of W.E.B. Du Bois, The suppression of the African slave trade in the United States, 1638-1871
  • Ph.D. dissertations of astronomer Cecilia Payne-Gaposchkin (Ph.D. 1925) and physicist John Hasbrouck Van Vleck (Ph.D. 1922)
  • Undergraduate honors theses of novelist John Updike (A.B. 1954), filmmaker Terrence Malick (A.B. 1966), and U.S. poet laureate Tracy Smith (A.B. 1994)
  • Undergraduate prize papers and dissertations of philosophers Ralph Waldo Emerson (A.B. 1821), George Santayana (Ph.D. 1889), and W.V. Quine (Ph.D. 1932)
  • Undergraduate honors theses of U.S. President John F. Kennedy (A.B. 1940) and Chief Justice John Roberts (A.B. 1976)

What does a prize-winning thesis look like?

If you're a Harvard undergraduate writing your own thesis, it can be helpful to review recent prize-winning theses. The Harvard University Archives has made available for digital lending all of the Thomas Hoopes Prize winners from the 2019-2021 academic years.

Accessing These Materials

How to access materials at the Harvard University Archives

How to find and request dissertations, in person or virtually

How to find and request undergraduate honors theses

How to find and request Thomas Temple Hoopes Prize papers

How to find and request Bowdoin Prize papers

  • Phone number 617-495-2461

Related Collections

  • Harvard faculty personal and professional archives
  • Harvard student life collections: arts, sports, politics and social life
  • Access materials at the Harvard University Archives


Master of Data Science

Become a data scientist at your own pace. No STEM background required.


Data is now produced, captured, and published more than ever, transforming science, business, health care, industry, and more. It is vital to have professionals who know how to analyze this data.

Advance or start your career in data science with the University of Pittsburgh’s fully online Master of Data Science (MDS) program, in partnership with Coursera, and become a part of this transformation through data.  

In this program, you will:

  • Develop a deep understanding of core computational, mathematical, and statistical concepts, and build skills in responsible data management and data curation (ethically cleaning, interpreting, and using big data across a variety of contexts).
  • Learn to program for data analysis in Python and R, using Jupyter notebooks and RStudio. Design and query relational databases using MySQL via MySQL Workbench, and design and query graph databases with Neo4j.
  • Access and work with large, real-world data sources from campus, community and corporate partners.
  • Gain practical experience through data exploration and predictive modeling, and develop an ethical toolkit for informed and defensible decision-making.

The 30-credit program is built around the ethical use of data: it teaches students to examine the social impact of data science and to chart their own professional path to becoming responsible data scientists. Students learn to choose the right methods, validate findings, and make ethically informed decisions.

Created for working professionals and adult learners, the program is designed to remove common barriers to entry:

  • Courses are asynchronous, so adult learners have the freedom and flexibility to pace themselves and customize their learning schedules.
  • No prior computational or programming experience is required, as you will learn the necessary foundational skills in the program.
  • Admission is determined by your performance in a 3-credit course and verification of your bachelor’s degree.

Once accepted, you will join a cutting-edge online program offered by one of the pioneering computer science schools in the United States, at an R-1 university.

Admissions

Pitt’s MDS program was conceived to remove unnecessary barriers to entry. No résumé, transcripts, essays, or letters of recommendation are required; you only need verification of an earned bachelor’s degree from an accredited U.S. university or its equivalent.

To gain admission to the MDS program, you’ll complete a 3-credit pathway course on the Coursera platform. By attaining a B average or better in that course, you’ll be accepted into the degree program upon bachelor’s degree verification.

Curriculum

Our program focuses on building responsible theoretical and practical skills to help you enter or advance your career as a data scientist in our data-fueled economy. Offered fully online, the 30-credit program (10 courses of 3 credits each, including a capstone) typically takes 20 to 36 months to complete.

Earning your master’s in data science will enable you to:

  • Acquire computational, mathematical, and statistical knowledge, as well as responsible data management and data curation skills (ethically cleaning, interpreting, and using big data across a variety of contexts).
  • Program in Python and R, using Jupyter notebooks and RStudio.
  • Design and query relational databases (e.g., MySQL via MySQL Workbench) and graph databases (e.g., Neo4j).
  • Gain practical experience through data exploration and predictive modeling.
  • Develop hands-on experience with large, real-world data sources from campus, community and corporate partners.

This program is delivered by the same faculty who teach on campus. Course instructors hold live office hours where you can ask questions about the course material or raise any other concerns. Additional opportunities for dialogue and connection come through discussion boards and group sessions with peers.

Tuition & Financing

This program is affordably priced to fit the budgets of working professionals. Total tuition for the 30-credit MDS degree is around $15,000. At roughly $500 per credit hour, the cost is a fraction of that of most on-campus programs and most online MDS programs. This tuition rate applies to all students, regardless of their state or country of residence.

Additionally, you can pay only for the courses you are enrolled in rather than committing to the entire degree.

Career Outcomes

The U.S. Bureau of Labor Statistics projects the number of data scientist jobs to grow by 35% between 2022 and 2032. U.S. News & World Report ranked data scientist #6 in Best Technology Jobs, #11 in Best STEM Jobs, and #22 in 100 Best Jobs. By completing the MDS, you can join this rising field.

Graduates of our program will be able to manage big data, apply machine learning techniques, design database models, present data visualizations, and communicate data insights. Potential careers include Data Scientist, Data Analyst, Data Coordinator, Data Engineer, and analyst roles in sectors such as business, healthcare, supply chain, market research, sales, recruiting, finance, and data governance and compliance.

Students who complete this program will also have access to Pitt career services, which include a job opportunity board, résumé and cover letter support, interview preparation, salary negotiation training, and access to the Pitt Alumni Association.

The MDS in the News

Casual, No Pressure: 10 Best Ways for Adult Learners to Upskill or Reskill

17 Most Expensive College Towns in the US

Why Pitt and Coursera are Launching an Online Master’s Degree Program in Data Science

An Affordable Master's Program at Pitt

University of Pittsburgh's online master's in data science program targets those without STEM backgrounds

University of Pittsburgh Launches $15K Master of Data Science on Coursera — No STEM Background or Application Required

University Of Pittsburgh To Offer A $15,000 Master Of Data Science On Coursera

Pitt is Making Data Science Accessible Worldwide with a New Online Master’s Degree

COMMENTS

  1. Computational and Data Sciences (PhD) Dissertations

    Computational and Data Sciences (PhD) Dissertations Below is a selection of dissertations from the Doctor of Philosophy in Computational and Data Sciences program in Schmid College that have been included in Chapman University Digital Commons.

  2. Doctor of Data Science and Analytics Dissertations

    The Ph.D. in Data Science and Analytics is an advanced degree with a dual focus of application and research - where students will engage in real world business problems, which will inform and guide their research interests. We launched the first formal PhD program in Data Science in 2015.

  3. Getting a PhD in Data Science: What You Need to Know

    A Doctor of Philosophy (PhD) is the terminal degree in the field of data science, meaning it is the highest possible degree that can be obtained in the subject. Holding a PhD in data science, consequently, signals your mastery and knowledge of the field to both potential employers and fellow professionals.

  4. 10 Compelling Machine Learning Ph.D. Dissertations for 2020

    This dissertation describes a general communication-efficient algorithm for distributed statistical learning on this type of big data. The algorithm distributes the samples uniformly to multiple machines, and uses a common reference data to improve the performance of local estimates.

  5. PhD Dissertations

    PhD Dissertations: Probabilistic Reinforcement Learning: Using Data to Define Desired Outcomes, and Inferring How to Get There, Benjamin Eysenbach, 2023; Data-driven Decisions: An Anomaly Detection Perspective, Shubhranshu Shekhar, 2023; Methods and Applications of Explainable Machine Learning, Joon Sik Kim, 2023; Applied Mathematics of the Future, Kin G. Olivares, 2023

  6. PhD in Data Science

    PhD in Data Science. Run by the Committee on Data Science, the PhD curriculum combines training in mathematical foundations of data science, responsible data use and communication, and advanced computational methods. Conduct research on cutting edge problems and explore the emerging field of Data Science alongside preeminent faculty at UChicago.

  7. PhD in Data Science

    A PhD is the most advanced data science degree you can get, reflecting a depth of knowledge and technical expertise that will put you at the top of your field. This means that PhD programs are the most time-intensive degree option out there, typically requiring that students complete dissertations involving rigorous research.

  8. PhD in Data Science

    The CDS PhD program model rigorously trains data scientists of the future who (1) develop methodology and harness statistical tools to find answers to questions that transcend the boundaries of traditional academic disciplines; (2) clearly communicate to extract crisp questions from big, heterogeneous, uncertain data; (3) effectively translate f...

  9. PhD Program

    A dissertation in the scope of Data Science is required of every candidate for the PhD degree. HDSI PhD program thesis requirements must meet Regulation 715(D) requirements. The final form of the dissertation document must comply with published guidelines by the Graduate Division. The dissertation topic will be selected by the student, under ...

  10. 17 Compelling Machine Learning Ph.D. Dissertations

    The first part of the thesis studies and develops the algebraic theory of tensors. The second part of the thesis presents three algorithms for tensor data. The algorithms use algebraic and geometric structure to give guarantees of optimality. 3. Statistical approaches for spatial prediction and anomaly detection

  11. Five Tips For Writing A Great Data Science Thesis

    Five Tips For Writing A Great Data Science Thesis: Write for your reader, not for yourself. Wouter van Heeswijk, PhD, published in Towards Data Science, Jun 20, 2022. A good thesis always focuses on the reader. Learn which principles are necessary.

  12. How to write a great data science thesis

    Towards Data Science, Jul 27, 2021. There are probably more than a thousand manuals on how to write a great thesis. They stress the importance of structure, substance, and style.

  13. Doctor of Philosophy in Data Science

    A Ph.D. in Data Science from the University of Virginia opens career paths in academia, industry or government. Graduates of our program will: Understand data as a generic concept, and how data encodes and captures information. Be fluent in modern data engineering techniques, and work with complex and large data sets.

  14. PhD in Data Science

    PhD in Data Science The PhD curriculum combines the aspiration to train all students in mathematical foundations of data science, responsible data use and communication, and advanced computational methods, with an appreciation of the diverse research interests of the data science faculty. First Year Requirements

  15. Ph.D. in Computational and Data Sciences

    Ph.D. in Computational and Data Sciences. Computational Science is the art of creating, developing, and validating models in order to gain a profound understanding of real-life complex problems. Data Science is the art of generating insight, knowledge, and predictions by applying modern methods to large datasets.

  16. PhD in Computing & Data Sciences

    The PhD program in Computing & Data Sciences (CDS) at Boston University prepares its graduates to make significant contributions to the art, science, and engineering of computational and data-driven processes that are woven into all aspects of society, economy, and public discourse, leading to solution of problems and synthesis of knowledge related to the methodical, generalizable, and ...

  17. PhD in Physics, Statistics, and Data Science » MIT Physics

    The Interdisciplinary Doctoral Program in Statistics (IDPS) is designed to provide students with the highest level of competency in 21st century statistics, enabling doctoral students across MIT to better integrate computation and data analysis into their PhD thesis research. Admission to this program is restricted to students currently ...

  18. PhD in Data Science

    Graduate Admissions. WPI's PhD in data science is interdisciplinary, drawing from Computer Science, Mathematical Sciences, and the Business School. Together, courses and dissertation research revolve around five key areas: Integrative Data Science. Business Intelligence and Case Studies. Data Access and Management. Data Analytics and Mining.

  19. PhD

    The Doctor of Philosophy program in the Field of Statistics is intended to prepare students for a career in research and teaching at the University level or in equivalent positions in industry or government. A PhD degree requires writing and defending a dissertation. Students graduate this program with a broad set of skills, from the ability to ...

  20. Doctoral Program

    The Dissertation Reading Committee must also read and approve the thesis. For further information on university oral examinations and committees, please see the Graduate Academic Policies and Procedures (GAP) Handbook section 4.7. Dissertation. The dissertation is the capstone of the PhD degree.

  21. 10 Best Research and Thesis Topic Ideas for Data Science in 2022

    In this article, we have listed 10 such research and thesis topic ideas to take up as data science projects in 2022. Handling practical video analytics in a distributed cloud: With increased dependency on the internet, sharing videos has become a mode of data and information exchange.

  22. Recent Dissertation Topics

    This list of recent dissertation topics shows the range of research areas that our students are working on.

  23. Harvard University Theses, Dissertations, and Prize Papers

    Ph.D. dissertations of astronomer Cecilia Payne-Gaposchkin (Ph.D. 1925) and physicist John Hasbrouck Van Vleck (Ph.D. 1922) Undergraduate honors theses of novelist John Updike (A.B. 1954), filmmaker Terrence Malick (A.B. 1966), and U.S. poet laureate Tracy Smith (A.B. 1994)

  24. Philosophy and Data Science

    Epistemology is the study of what we can know and how we can know it; it is the study of knowledge itself. This ties in nicely with data science, since we are trying to gain knowledge from data. Here is what we will cover: Inductive vs. deductive reasoning; Skepticism; Pragmatism; Inductive vs. deductive ...

  25. Master of Data Science

    Our program focuses on building responsible theoretical and practical skills to help you enter or advance your career as a data scientist in our data-fueled economy. Offered fully online, the 30-credit program (10 courses, 3 credits each, including a capstone) will usually take 20-36 months to complete. Earning your master's in data science ...