Mia S. Dagati
Data Scientist & Research Assistant
Michigan State University · Expected Aug 2026
First Place — MSU Undergraduate Research & Arts Forum, Animal Science & Agriculture (2024)
I'm a dual-degree student at Michigan State University completing a B.S. in Statistics and a B.S. in Data Science (CMSE), expected August 2026. I build reproducible, policy-ready data systems at the intersection of applied statistics, data engineering, and environmental sustainability.
Over four years of research I've contributed to a $1.2M IoT-based irrigation optimization study in Michigan apple orchards, built a national-scale ETL and classification pipeline mapping embodied material stocks across utility-scale solar arrays, and published a peer-reviewed article in AgriEngineering. I've presented at ASABE, ISSST 2026, and multiple MSU forums, and have an IEEE PVSC 2026 submission pending. I have also served as an undergraduate learning assistant in MSU's statistics department.
I work primarily in R and Python, with experience in geospatial data integration, ETL pipelines, time-series analysis, and machine learning. I hold an NSF-funded CyberAmbassador certification and am active in ASABE, the MSU AI Club, Women in Computing, and the MSU Powerlifting Team.
I grew up in Ontario, Canada, until I was 11, when my family moved to Macomb County, Michigan, about 30 minutes north of Detroit, where I completed middle and high school. I'm the first in my family to go away to university. My mom commuted to a local college and my dad never attended college, so leaving home was a big deal for all of us. I chose Michigan State because I wanted to stay close enough to come home on weekends, and because MSU's research opportunities were genuinely hard to pass up. Both turned out to be the right call. My dad was born in Italy, as were both sets of my grandparents, so visiting has meant a lot to me personally. Most of what I love outside of research traces back to the same thing: being outside and caring about the world I'm a part of.
Getting outside is the thing I look forward to most. There is something about the beauty and vastness of nature that never gets old for me. Acadia has stayed with me: the fall foliage, the ocean, the quiet of it. The Grand Canyon and Zion are hard to top for sheer scale, but some of my favorite days have been on trails closer to home. I did a 20-mile backpacking trip on the North Country Trail near Manistee, Michigan, that I still think about often. Ireland surprised me completely; standing on the Cliffs of Moher felt like being at the edge of the world.
I've been to Italy (twice), France, Spain, Germany, the Czech Republic, Ireland, and a handful of Caribbean islands. My dad and both sets of my grandparents were born in Italy, so those trips have been personal in a way that is hard to put into words. I collect old books wherever I go; my oldest is a French set from 1763, more than 260 years old and still intact.
I have been lifting for six years and took up powerlifting in my sophomore year of college, competing with the MSU Powerlifting Team. The sport has helped me develop the discipline I carry into every other part of my life.
Work Experience
– Present
- Designed and implemented a multi-source data integration pipeline in R, joining geospatial array metadata, manufacturer datasheets, and spatiotemporal records into a unified, reproducible analytical dataset using tidyverse, sf, and tigris workflows.
- Developed and applied a rule-based classification framework to infer First Solar CdTe module series from installation year and geometric metadata, incorporating confidence scoring through cross-validation against Landsat-derived estimates.
- Engineered a tiered module count estimation approach combining panel-level geometry, array-level geometry, and manufacturer specification sheet fallbacks to produce array-level material inventory estimates at national scale.
- Prepared and delivered research findings to faculty, collaborators, and external stakeholders through written reports, presentations, and conference submissions including IEEE PVSC 2026 and ISSST 2026.
- Compiled a first-of-its-kind per-module material intensity reference table for First Solar CdTe Series 2–7, aggregating CdTe mass, front/back glass thickness, aluminum frame, steel back rail, EVA encapsulant, and copper leadwire values from manufacturer spec sheets, life cycle inventory (LCI) reports, and First Solar user manuals. LCI per-m² values were converted to per-module using series-specific module areas and cross-validated against known datasheet specifications.
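The tiered module count estimation described above can be sketched as a simple fallback chain that uses the most direct evidence available for each array. This is a minimal illustration, not the pipeline's actual code: the `ArrayRecord` fields, the `area_fill_fraction` parameter, and the use of rated capacity divided by module power as the spec-sheet fallback are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ArrayRecord:
    """Hypothetical per-array record; any field may be missing (None)."""
    panel_count: Optional[int] = None       # tier 1: modules resolved from panel-level geometry
    array_area_m2: Optional[float] = None   # tier 2: footprint from array-level geometry
    capacity_w: Optional[float] = None      # tier 3: rated capacity for the spec-sheet fallback


def estimate_module_count(
    rec: ArrayRecord,
    module_area_m2: float,
    module_power_w: float,
    area_fill_fraction: float = 1.0,
) -> Tuple[int, str]:
    """Return (module count, tier used), trying the most direct evidence first."""
    if rec.panel_count is not None:
        return rec.panel_count, "panel_geometry"
    if rec.array_area_m2 is not None:
        # Assumed: only a fraction of the array footprint is covered by modules.
        return round(rec.array_area_m2 * area_fill_fraction / module_area_m2), "array_geometry"
    if rec.capacity_w is not None:
        # Assumed spec-sheet fallback: rated capacity / nameplate power per module.
        return round(rec.capacity_w / module_power_w), "spec_sheet"
    raise ValueError("no geometry or capacity available for this array")
```

Returning the tier label alongside the count keeps track of how each estimate was produced, which is useful when attaching confidence to downstream material totals.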
– Dec 2025
- Assisted in leading formal classroom instruction twice weekly for a 60-student course, delivering student-centered lessons on R programming, reproducible coding workflows, and data analysis techniques in a high-expectation learning environment.
- Held weekly office hours providing individual and small-group tutoring, offering personalized academic feedback and hands-on R coding support to help students work through assignments and build technical confidence.
- Graded student homework submissions with a focus on code quality, reproducibility, and workflow structure, providing detailed written feedback to support iterative improvement and long-term learning.
- Guided students through foundational machine learning implementation and statistical analysis using R and R Markdown, emphasizing clean, documented, and reproducible analytical pipelines.
– Aug 2024
- Continued METER TEROS 12 IoT sensor data collection and time-series analysis across two MSU agricultural research farms over the summer, maintaining the sensor network and processing incoming soil moisture, temperature, and EC data.
- Further developed the farmer-facing Ubidots dashboard, surfacing real-time irrigation recommendations based on ET calculations and soil moisture thresholds in a simple, accessible format that required no technical interpretation from the farmer.
- Managed day-to-day field operations and personnel scheduling across both research farm locations throughout the summer research period.
- Presented research progress and findings to engineering faculty and project stakeholders, communicating technical results clearly to both technical and non-technical audiences.
– Jan 2025
- Contributed to a two-year, $1.2M green agriculture research study evaluating irrigation methods and scheduling to minimize water consumption, prevent soil degradation, and optimize crop yield in Michigan apple orchards, with work directly aligned with sustainable land and resource management.
- Designed and executed deployments of METER TEROS 12 IoT sensors across multiple MSU agricultural research farms, integrating time-series weather data to calculate crop evapotranspiration values and building a farmer-facing Ubidots dashboard delivering real-time soil moisture status and plain-language irrigation recommendations.
- Conducted a controlled laboratory study estimating nitrate dynamics in sandy soil using METER TEROS 12 electrical conductivity sensors, establishing a statistically significant positive correlation between nitrate concentration and EC and demonstrating measurable downward nitrate transport through soil profiles using a custom lysimeter setup; findings were presented at the 2024 ASABE Annual International Meeting in Anaheim, CA.
- Managed day-to-day operations across two MSU agricultural research farms, creating schedules and assigning tasks to keep project personnel and field operations running efficiently across the full duration of the study.
- Authored technical reports and grant proposals; supervised and mentored undergraduate student researchers across multiple project phases, providing training in data collection, statistical analysis, and scientific communication.
Education
Relevant Coursework: Probability & Statistics I–II, Bayesian Statistical Methods, Statistics for Biologists, Introduction to Data Science, Computational Modeling & Data Analysis I–II, Fundamentals of Data Science Methods, Matrix Algebra, Differential Equations, Multivariable Calculus
AP Coursework: Calculus BC, Statistics, Biology, Environmental Science, Microeconomics, U.S. History, World History, English Literature
Publications
Presentations
Technical Skills
Programming
- R (fluent)
- Python
- SQL
- MATLAB
- CAD
Data & Pipeline Tools
- pandas, NumPy
- tidyverse, ggplot2
- Git / GitHub
- Jupyter, R Markdown
Modeling & ML
- OLS / Logistic Regression
- LASSO, Ridge, GBM
- XGBoost, Random Forest
- SVM, GAMs
Data Engineering
- ETL Pipeline Development
- Feature Engineering
- Time-Series Processing
- Geospatial Data Integration
Soft Skills
Communication
- Technical writing
- Stakeholder presentations
- Cross-functional communication
- Scientific communication
Leadership & Mentorship
- Undergraduate mentorship
- Research team supervision
- Instructional facilitation
- Personnel management
Project Management
- Multi-site field operations
- Grant & proposal writing
- Organizational planning
- Long-horizon study design
Research Practice
- Reproducible workflows
- Data collection protocols
- Peer review & publication
- Conference presentation
Certifications
A professional skills certification in Communication, Teamwork, and Leadership for STEM professionals, designed to help scientists and engineers work more effectively across disciplines and with non-technical audiences. Hosted at Michigan State University and funded by the National Science Foundation.
A structured training program equipping graduate students and researchers with evidence-based mentoring practices for supervising undergraduate researchers. Developed by the National Research Mentor Network, CIMER, and the Tau Beta Pi Association.
An industry-recognized certification validating foundational SQL skills including querying, filtering, aggregation, and joins — administered through HackerRank's standardized assessment platform.
Research & Projects
Solar Material Inventory Mapping
Research
Developed a data integration and inference framework classifying utility-scale CdTe solar arrays by First Solar module series across the contiguous United States. Built a national-scale ETL pipeline joining GM-SEUS geospatial metadata, manufacturer datasheets, and spatiotemporal records to produce validated, policy-ready material inventory outputs with direct applications to solar recycling infrastructure and circular economy planning.
IoT-Enabled Irrigation Optimization
Research
Contributed to a two-year, $1.2M study evaluating irrigation methods and scheduling to minimize water consumption, prevent soil degradation, and optimize crop yield in Michigan apple orchards. Designed and executed IoT sensor deployments across MSU research farms using METER TEROS 12 sensors, collecting soil moisture, temperature, and electrical conductivity readings every 5 minutes. The sensors were wired to a custom breadboard circuit powered by solar panels; readings were logged locally on an Arduino Uno and streamed to Ubidots over local Wi-Fi.
Integrated time-series weather data to calculate crop evapotranspiration (ET) values, which were used alongside soil moisture readings to determine whether irrigation was needed and how long to run it. This fed into a simple farmer-facing Ubidots widget that displayed a direct recommendation — "Yes, water for X minutes" or "No irrigation needed" — removing the need for any technical interpretation on the farmer's end. The dashboard also displayed real-time soil moisture status (on-target, above, or below field capacity) and EC values for fertilizer management.
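The decision logic behind the widget could look roughly like the sketch below. It is an illustrative reconstruction, not the deployed Ubidots logic: the tolerance band around field capacity, the root-zone depth, the application rate, and the rule "irrigate only when below field capacity, replacing the soil water deficit plus expected ETc" are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class IrrigationInputs:
    vwc: float                        # current volumetric water content (m³/m³) from the soil sensor
    field_capacity: float             # site-specific VWC at field capacity (assumed known)
    etc_mm: float                     # crop evapotranspiration expected before the next check (mm)
    root_zone_mm: float               # effective root-zone depth (mm), assumed
    application_rate_mm_per_h: float  # irrigation system output (mm/h), assumed
    tolerance: float = 0.02           # band around field capacity treated as "on target", assumed


def moisture_status(x: IrrigationInputs) -> str:
    """Classify soil moisture relative to field capacity."""
    if x.vwc > x.field_capacity + x.tolerance:
        return "above"
    if x.vwc < x.field_capacity - x.tolerance:
        return "below"
    return "on-target"


def recommendation(x: IrrigationInputs) -> str:
    """Plain-language output in the style of the dashboard widget."""
    if moisture_status(x) != "below":
        return "No irrigation needed"
    # Refill the root zone to field capacity and cover expected crop water use.
    deficit_mm = (x.field_capacity - x.vwc) * x.root_zone_mm
    need_mm = deficit_mm + x.etc_mm
    minutes = round(need_mm / x.application_rate_mm_per_h * 60)
    return f"Yes, water for {minutes} minutes"
```

Keeping the status check separate from the run-time calculation mirrors the two outputs the dashboard shows: the moisture status and the yes/no recommendation.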
First Solar CdTe Per-Module Material Intensity Reference Table
Research
Compiled a first-of-its-kind per-module material intensity reference table for First Solar CdTe Series 2 through 7. Industry sources typically report material values per m², which makes direct per-module comparison across series difficult. This table aggregates module size, total weight, glass thickness (front and back), CdTe mass, aluminum frame, steel back rail, EVA encapsulant, copper leadwire, and other materials across the five series covered (S2, S3, S4, S6, and S7) using manufacturer spec sheets, life cycle inventory (LCI) reports scraped from DOE databases, and First Solar user manuals.
LCI per-m² values were converted to per-module using series-specific module areas from official datasheets, with derived glass thickness values cross-validated against known S4 specs (LCI-derived 3.24–3.38 mm vs. datasheet 3.2 mm). CdTe mass was calculated from film thickness, CdTe density (5,850 kg/m³), and module area using S2/S3 at 3.0 µm and S4/S6/S7 at 2.5 µm per First Solar internal communication cited in OSTI:2308831. Copper leadwire was derived from conductor cross-section, wire length, wire count, and copper density (8,960 kg/m³) from datasheet wiring specifications.
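The three conversions in the paragraph above reduce to simple products. A minimal sketch follows; the densities and film thicknesses come from the text, while the module area and wire dimensions in the usage lines are hypothetical placeholders rather than datasheet values.

```python
CDTE_DENSITY_KG_M3 = 5850.0    # CdTe density used in the text
COPPER_DENSITY_KG_M3 = 8960.0  # copper density used in the text


def per_m2_to_per_module(value_per_m2: float, module_area_m2: float) -> float:
    """Scale an LCI value reported per m² of module to a single module."""
    return value_per_m2 * module_area_m2


def cdte_mass_kg(film_thickness_um: float, module_area_m2: float) -> float:
    """CdTe mass = film thickness × CdTe density × module area."""
    thickness_m = film_thickness_um * 1e-6
    return thickness_m * CDTE_DENSITY_KG_M3 * module_area_m2


def copper_leadwire_mass_kg(cross_section_mm2: float, length_m: float, wire_count: int) -> float:
    """Copper mass = conductor cross-section × wire length × wire count × copper density."""
    cross_section_m2 = cross_section_mm2 * 1e-6
    return cross_section_m2 * length_m * wire_count * COPPER_DENSITY_KG_M3


if __name__ == "__main__":
    area = 0.72  # hypothetical module area in m², for illustration only
    print(f"CdTe at 2.5 µm: {cdte_mass_kg(2.5, area) * 1000:.2f} g")
    print(f"CdTe at 3.0 µm: {cdte_mass_kg(3.0, area) * 1000:.2f} g")
    # Hypothetical wiring: 4 mm² conductors, 0.5 m long, 2 wires per module.
    print(f"Cu leadwire: {copper_leadwire_mass_kg(4.0, 0.5, 2) * 1000:.1f} g")
```

Working in SI units internally (µm and mm² converted to m and m²) keeps each function a direct transcription of the formula stated in the text.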
Nitrate Dynamics Estimation in Sandy Soil Using EC Sensors
Research
Designed and executed a controlled laboratory study to estimate nitrate movement and concentration in sandy soil using TEROS 12 electrical conductivity sensors. Built a custom lysimeter system using modified 5-gallon buckets with dual-depth sensor placements to track nitrate transport through soil profiles under controlled flush conditions. Established a statistically significant positive linear relationship between nitrate concentration and EC (slope of 0.0032 mS/cm per 1 mg/L NO₃) and demonstrated that observed EC fluctuations were driven by nitrate movement rather than changes in soil moisture. Together, these results point to a low-cost, farmer-accessible method for monitoring soil nitrate leaching in agricultural settings.
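The calibration slope can also be used in reverse, turning an observed EC change into an approximate nitrate change. The sketch below is an assumption about how one might apply it, not the study's analysis code; the inversion is only meaningful under the lab conditions described (sandy soil, EC changes not driven by moisture).

```python
from typing import Sequence

# Slope reported in the study: 0.0032 mS/cm of EC per 1 mg/L of nitrate.
REPORTED_SLOPE = 0.0032


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def nitrate_change_mg_l(delta_ec_ms_cm: float, slope: float = REPORTED_SLOPE) -> float:
    """Estimated change in nitrate (mg/L) implied by a change in EC (mS/cm).

    Assumes the linear calibration holds and that moisture is roughly constant.
    """
    return delta_ec_ms_cm / slope
```

With the reported slope, an EC rise of 0.16 mS/cm would correspond to roughly 50 mg/L of additional nitrate; `ols_slope` shows how such a slope is estimated from paired concentration and EC readings.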
IoT Household Appliance Energy Pipeline
Academic
Built a reproducible ETL pipeline in R ingesting IoT time-series sensor data, engineering lag and temporal features, and outputting clean datasets ready for downstream analytics. Applied rolling-origin cross-validation to prevent data leakage across time windows and benchmarked KNN, OLS, LASSO, Ridge, GBM, SVM, and Random Forest models with HAC-robust inference and VIF-based feature selection.
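Rolling-origin evaluation can be sketched with an expanding training window whose test block always lies strictly after it in time. This is a generic illustration in Python rather than the project's R code; the window sizes, the step, and the choice of an expanding (rather than sliding) window are assumptions.

```python
from typing import Iterator, List, Tuple


def rolling_origin_splits(
    n_obs: int, initial_train: int, horizon: int, step: int = 1
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs over a time-ordered series.

    The training window expands from the start of the series and each test
    window begins immediately after it, so no model is ever fit on
    observations later than the ones it is evaluated on.
    """
    origin = initial_train
    while origin + horizon <= n_obs:
        train = list(range(origin))
        test = list(range(origin, origin + horizon))
        yield train, test
        origin += step
```

Unlike shuffled k-fold splits, every fold here respects time order, which is what prevents lagged features from leaking future information into training.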
Titanic Survival Prediction
Academic
Built a binary classification pipeline in R Markdown implementing logistic regression, KNN, decision trees, Random Forest, GBM, LASSO/Ridge, XGBoost, and GAMs. Performed stratified median imputation, one-hot encoding, and principled feature exclusion based on bias-variance trade-off reasoning; evaluated models via ROC/AUC and cross-validation.
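Stratified median imputation fills each missing value with the median of similar rows rather than a single global median. The sketch below uses plain Python instead of the project's R Markdown code, and the strata in the usage test (passenger class and sex for age) are an assumed choice for illustration.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Sequence


def stratified_median_impute(
    rows: List[Dict], target: str, strata: Sequence[str]
) -> List[Dict]:
    """Fill missing `target` values with the median of rows in the same stratum.

    Falls back to the overall median when a stratum has no observed values.
    """
    groups = defaultdict(list)
    for r in rows:
        if r[target] is not None:
            groups[tuple(r[k] for k in strata)].append(r[target])
    overall = median(v for vals in groups.values() for v in vals)

    filled = []
    for r in rows:
        r = dict(r)  # copy so the caller's rows are not modified
        if r[target] is None:
            key = tuple(r[k] for k in strata)
            r[target] = median(groups[key]) if groups.get(key) else overall
        filled.append(r)
    return filled
```

In a real pipeline the stratum medians would be computed on the training split only and then applied to the test split, to avoid leaking test information into the imputation.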
Neural Decoding of Face Identity from Macaque Brain Activity
Academic
Investigated whether face identity could be decoded from neural spike activity recorded in the anterior medial (AM) face patch of macaque monkeys using the Freiwald-Tsao dataset. Preprocessed 2,685 trials × 400 ms of spike rasters into population-level spike count feature matrices, then trained and compared multinomial logistic regression and random forest classifiers via stratified 5-fold cross-validation. Logistic regression achieved 81% CV accuracy and 77.5% test accuracy, demonstrating that face identity is highly linearly decodable from AM population activity. Confusion matrices revealed which identities were most frequently misclassified, providing insight into the representational geometry of face space in the AM region.
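Stratified 5-fold cross-validation keeps each identity's share of trials roughly equal across folds. A minimal fold-assignment sketch in plain Python follows; the seed and the round-robin dealing are illustrative choices, and in practice a library splitter such as scikit-learn's `StratifiedKFold` would typically be used instead.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence


def stratified_kfold(labels: Sequence[Hashable], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Assign trial indices to k folds so each class is spread evenly across folds."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0  # carried across classes so fold sizes stay balanced overall
    for y in sorted(by_class, key=str):
        idx = by_class[y]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        offset += len(idx)
    return folds
```

Each fold then serves once as the held-out set while the other k - 1 folds are used for training.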
Urban Delivery Route Optimization — Lansing, MI
Academic
Modeled the city of Lansing, MI as a real street graph using osmnx and networkx to solve an urban last-mile delivery problem. Implemented a greedy nearest-neighbor algorithm traversing delivery nodes starting from a post office origin, incorporating real road travel times and edge speeds for realistic routing. Results demonstrated meaningful reductions in total travel time versus random traversal order, validating the greedy approach for real-time logistics applications where optimal solutions are computationally infeasible.
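The greedy nearest-neighbor traversal can be sketched independently of the street graph. This sketch assumes a precomputed table of travel times between the origin and each delivery node (on the real graph these would be shortest-path travel times); the function name and the node names and times used in the test are hypothetical.

```python
from typing import Dict, List, Tuple


def greedy_route(
    origin: str, stops: List[str], travel_time: Dict[str, Dict[str, float]]
) -> Tuple[List[str], float]:
    """Visit every stop by always moving to the closest unvisited one.

    Returns the visiting order (starting at the origin) and the total travel time.
    """
    route = [origin]
    remaining = set(stops)
    current = origin
    total = 0.0
    while remaining:
        # sorted() makes tie-breaking deterministic.
        nxt = min(sorted(remaining), key=lambda s: travel_time[current][s])
        total += travel_time[current][nxt]
        route.append(nxt)
        remaining.remove(nxt)
        current = nxt
    return route, total
```

Each step is a linear scan over the unvisited stops, so the whole route takes O(n²) time, which is why this heuristic stays practical where an exact route search does not.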
Contact
Let's connect.
Open to research collaborations, data science opportunities, and academic discussions.