Zheng Yuan | Ph.D.
About me
I am a fifth-year Ph.D. student in Statistics at Northwestern University. Before joining Northwestern, I received my Master's degree in Statistical Science from Duke University, where I was advised by Prof. Peter Hoff and Prof. Merlise A. Clyde.
I am broadly interested in statistical inference and statistical learning, and specifically in developing methods that leverage structure in the data or feature information to improve prediction and inference. I am also interested in how statistical inference and statistical learning can inform decision making in capital markets; for example, advanced statistical techniques can be applied to complex trading problems and thereby change the markets. I am also passionate about how statistics is communicated, both to the general public and in the classroom.
Education
Ph.D. in Statistics, Northwestern University, Sep. 2020 - present
M.S. in Statistical Science, Duke University, Aug. 2018 - May 2020
B.S. in Statistics, Nankai University, Sep. 2014 - Jun. 2018
Selected Machine Learning and Data Science Projects
Deep Learning
Credit Card Fraud Detection Predictive Models
Highly unbalanced dataset with 492 frauds out of 284,807 transactions
Perform data cleaning and exploratory data analysis
Use suitable performance metrics and resampling methods to handle the class imbalance
Apply Random Forest, Naive Bayes, AdaBoost, XGBoost, LightGBM and ANN classifiers
Compute and compare the ROC-AUC score of each classifier
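To illustrate why ROC-AUC is preferred over accuracy on data this unbalanced, here is a minimal sketch, not the project's actual code. The function name roc_auc and the toy labels and scores are hypothetical; only the counts 492 and 284,807 come from the dataset description above. The AUC is computed from ranks (the Mann-Whitney formulation), with ties given their average rank.

```python
def roc_auc(labels, scores):
    """Rank-based ROC-AUC: the probability that a randomly chosen positive
    receives a higher score than a randomly chosen negative (ties count 1/2)."""
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Accuracy of a classifier that labels every transaction as non-fraud,
# using the class counts stated above: high accuracy, zero frauds caught.
baseline_accuracy = 1 - 492 / 284807

# Hypothetical toy example: two non-frauds (0) and two frauds (1).
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(f"always-negative accuracy: {baseline_accuracy:.4f}, toy AUC: {toy_auc:.2f}")
```

A constant score for every transaction yields an AUC of 0.5 under this definition, no matter how skewed the classes are, which is why the metric is informative here.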
House Prices Prediction Using TensorFlow-Based Decision Forests
Dataset contains 1,460 entries with 79 feature variables covering different aspects of residential homes in Ames, Iowa
Perform data cleaning and exploratory data analysis
Train Random Forest, Gradient Boosted Trees and Distributed Gradient Boosted Trees
Compute and compare the cross-validated Root Mean Squared Error (RMSE) of each algorithm
Bayesian
How many yards will an NFL player gain after receiving a handoff?
Dataset contains game, play, and player-level data, including the position and speed of players
Build Random Forest and ANN models to predict how many yards a team will gain on a given rushing play
Models are evaluated with the Continuous Ranked Probability Score (CRPS)
Use Bayesian optimization to tune the hyperparameters, with the mean cross-validation score as the optimization objective
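For readers unfamiliar with CRPS, the following is a minimal sketch of a discrete version of the score for a single play, not the project's actual evaluation code. It assumes the forecast is a cumulative distribution over an integer yardage grid; the grid from -99 to 99 and the names crps_discrete and point_mass_cdf are assumptions made for illustration. Lower scores are better, and a forecast that puts all its mass on the observed outcome scores 0.

```python
def crps_discrete(cdf, observed, grid):
    """Mean squared gap between a predicted CDF and the step function
    that jumps from 0 to 1 at the observed outcome, averaged over the grid."""
    if len(cdf) != len(grid):
        raise ValueError("cdf and grid must have the same length")
    total = 0.0
    for p, g in zip(cdf, grid):
        step = 1.0 if g >= observed else 0.0  # empirical CDF of the outcome
        total += (p - step) ** 2
    return total / len(grid)


def point_mass_cdf(value, grid):
    """CDF of a forecast that puts all probability on a single yardage value."""
    return [1.0 if g >= value else 0.0 for g in grid]


# Assumed yardage grid (hypothetical choice for this sketch).
grid = list(range(-99, 100))

perfect = crps_discrete(point_mass_cdf(3, grid), observed=3, grid=grid)
off_by_two = crps_discrete(point_mass_cdf(5, grid), observed=3, grid=grid)
print(perfect, off_by_two)
```

In this sketch, the score for a play grows with the distance between the forecast mass and the observed yardage, so it rewards well-calibrated distributions and not only accurate point predictions.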
Contact
ZhengYuan2025@u.northwestern.edu
2006 Sheridan Rd,
Evanston, IL 60208