Zheng Yuan | Ph.D.

[Curriculum Vitae]

About me

I am a fifth-year Ph.D. student in Statistics at Northwestern University. Before joining Northwestern, I received my Master's degree in Statistical Science from Duke University, where I was advised by Prof. Peter Hoff and Prof. Merlise A. Clyde.

I am broadly interested in statistical inference and statistical learning, particularly in developing methods that leverage structure in the data or feature information to improve prediction and inference. I am also interested in how statistical inference and learning can inform decision making in capital markets; for example, advanced statistical techniques can be applied to complex trading problems and, in turn, change how markets operate. In addition, I am passionate about how statistics is communicated, both to the general public and in the classroom.

Education

  • Ph.D. in Statistics, Northwestern University, Sep. 2020 - present

  • M.S. in Statistical Science, Duke University, Aug. 2018 - May 2020

  • B.S. in Statistics, Nankai University, Sep. 2014 - Jun. 2018

Select Machine Learning and Data Science Projects

Deep Learning

Credit Card Fraud Detection Predictive Models
  • Highly imbalanced dataset: 492 frauds out of 284,807 transactions (about 0.17%)

  • Perform data cleaning and exploratory data analysis

  • Use performance metrics suited to imbalanced data, together with resampling methods, to handle the class imbalance

  • Apply Random Forest, Naive Bayes, AdaBoost, XGBoost, LightGBM and ANN classifiers

  • Calculate and compare the ROC-AUC score of each classifier
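Because fraud makes up only about 0.17% of transactions, a classifier that labels every transaction as legitimate would already be about 99.83% accurate, which is why a ranking metric such as ROC-AUC is more informative here. The page does not say which resampling method was used; the minimal sketch below assumes random undersampling of the majority class as one common option, and computes ROC-AUC with the rank-sum (Mann-Whitney) identity. The function names are hypothetical and are not taken from the project's code.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")


def undersample_majority(
    rows: Sequence[Row], labels: Sequence[int], seed: int = 0
) -> Tuple[List[Row], List[int]]:
    """Randomly drop majority-class rows (label 0) until both classes have equal size.

    Assumes fraud is the minority class and is labeled 1. Apply this to the
    training split only; the evaluation split should keep its natural imbalance.
    """
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    kept_neg = rng.sample(neg_idx, k=min(len(pos_idx), len(neg_idx)))
    keep = pos_idx + kept_neg
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via the rank-sum (Mann-Whitney U) identity.

    Equals the probability that a randomly chosen positive gets a higher score
    than a randomly chosen negative, with ties counted as one half.
    Labels must be 0/1. Sorting makes it O(n log n), so it scales to ~285k rows.
    """
    n = len(scores)
    if n != len(labels):
        raise ValueError("labels and scores must have the same length")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at sorted position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank shared by the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
    X_bal, y_bal = undersample_majority(list(range(10)), [1, 1] + [0] * 8)
    print(len(X_bal), sum(y_bal))  # 4 2
```

Undersampling touches only the training split, so each classifier's ROC-AUC is still measured on data with the original class ratio; oversampling or synthetic-sample methods would slot into the same place in the pipeline.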

House Price Prediction Using TensorFlow Decision Forests
  • Dataset contains 1,460 entries described by 79 feature variables covering different aspects of residential homes in Ames, Iowa

  • Perform data cleaning and exploratory data analysis

  • Train Random Forest, Gradient Boosted Trees and Distributed Gradient Boosted Trees

  • Calculate and compare the cross-validated root mean squared error (RMSE) of each algorithm
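Cross-validated RMSE can be sketched without any particular modeling library. The minimal Python version below is an illustration, not the project's TensorFlow code: it assumes each model is wrapped as a `fit` function that returns a `predict` function, `fit_mean_baseline` is a hypothetical stand-in that always predicts the training-fold mean, and the fold count and seed are arbitrary defaults.

```python
import math
import random
from typing import Callable, List, Sequence

# A fitter takes training features and targets and returns a predict function.
Predict = Callable[[Sequence[float]], float]
Fitter = Callable[[List[Sequence[float]], List[float]], Predict]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cv_rmse(fit: Fitter, X: Sequence[Sequence[float]], y: Sequence[float],
            k: int = 5, seed: int = 0) -> float:
    """Mean of the per-fold held-out RMSEs from k-fold cross-validation."""
    fold_scores = []
    for test_idx in kfold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        # Fit on the training folds only, then score on the held-out fold.
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(X[i]) for i in test_idx]
        fold_scores.append(rmse([y[i] for i in test_idx], preds))
    return sum(fold_scores) / len(fold_scores)


def fit_mean_baseline(X_train: List[Sequence[float]], y_train: List[float]) -> Predict:
    """Hypothetical stand-in model: always predicts the training-fold mean."""
    mean = sum(y_train) / len(y_train)
    return lambda x: mean
```

This version averages the per-fold RMSEs; pooling all held-out squared errors before taking the square root is an equally common convention and can give slightly different numbers. Using the same folds and seed for every algorithm keeps the comparison fair.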

Bayesian

How many yards will an NFL player gain after receiving a handoff?
  • Dataset contains game-, play-, and player-level data, including the positions and speeds of players

  • Build Random Forest and ANN models to predict how many yards a team will gain on a given rushing play

  • Evaluate models using the Continuous Ranked Probability Score (CRPS)

  • Use Bayesian optimization to tune the hyperparameters, with the mean cross-validation score as the optimization objective
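CRPS scores a whole predictive distribution rather than a single point estimate. For a predictive CDF F and an observed gain y it is the integral of (F(x) - 1{x >= y})^2 over x; when yardage is predicted on an integer grid, the integral becomes a sum over grid points. The minimal sketch below averages over the grid instead of summing, assumes a hypothetical grid of -99 to 99 yards, and assumes the model outputs one non-decreasing CDF per play. It is an illustration, not the project's scoring code.

```python
from typing import List, Sequence


def crps_step_cdf(cdf: Sequence[float], grid: Sequence[int], observed: int) -> float:
    """Discrete CRPS for one play: mean squared gap between the predicted CDF
    F(x) and the observed step function H(x - y), averaged over the grid."""
    if len(cdf) != len(grid) or not grid:
        raise ValueError("cdf and grid must be non-empty and the same length")
    total = 0.0
    for f, x in zip(cdf, grid):
        step = 1.0 if x >= observed else 0.0  # Heaviside step at the observed gain
        total += (f - step) ** 2
    return total / len(grid)


def mean_crps(cdfs: Sequence[Sequence[float]], grid: Sequence[int],
              observed: Sequence[int]) -> float:
    """Average CRPS over plays; lower is better."""
    if len(cdfs) != len(observed) or not observed:
        raise ValueError("need one CDF per observed play")
    return sum(crps_step_cdf(c, grid, y) for c, y in zip(cdfs, observed)) / len(observed)


def point_mass_cdf(grid: Sequence[int], value: int) -> List[float]:
    """CDF of a forecast that puts all probability on a single yardage value."""
    return [1.0 if x >= value else 0.0 for x in grid]


if __name__ == "__main__":
    grid = list(range(-99, 100))  # assumed yardage grid: -99..99 inclusive (199 points)
    print(crps_step_cdf(point_mass_cdf(grid, 7), grid, 7))  # 0.0 (exact forecast)
    # Forecast 5 yards, observed 7: the CDFs differ only at x = 5 and x = 6,
    # so the score is 2 / 199.
    print(crps_step_cdf(point_mass_cdf(grid, 5), grid, 7))
```

Because lower CRPS is better, a mean cross-validated CRPS like `mean_crps` would be the quantity a Bayesian optimizer minimizes when tuning hyperparameters.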

Contact

ZhengYuan2025@u.northwestern.edu
2006 Sheridan Rd,
Evanston, IL 60208