Data Scientist - Senior Associate at KPMG US
*If you are curious about the hidden metrics and sensitive information on this page, please contact me directly via email or LinkedIn.
Senior Data Scientist with 5+ years of industry experience working on fast-moving, high-impact projects in product pricing, fraud detection, customer demand, supply chain, and healthcare analysis. Proven record of translating complex models into executive-level strategies, influencing large-scale revenue and client pursuits at KPMG. Proficient in Python, SQL, R, and modern dashboard tools such as PowerBI and Streamlit.
Timeline: Sep 2022 - Present
Client(s): KPMG Go-To-Market Teams
Implemented a hierarchical multi-agent AI orchestration system using LangChain, Python, SQL, and Streamlit, enabling dynamic interaction between users and KPMG-owned industry & company financial data and documents
Built a supervisor-agent architecture in which a central interface “supervisor” agent routes queries to domain-specific supervisor agents, each managing its own specialized task agents (e.g., data retrieval, document summarization, and company screening)
Deployed the AI framework via a Streamlit frontend enabling KPMG sales engineers & analysts to explore proprietary datasets and accelerate client and project proposal development with context-aware AI agent workflows
Applied Latent Dirichlet Allocation (LDA) in Python to perform topic modeling across large sets of transcript headlines, uncovering key topic clusters that informed the design of filter options and improved the user experience within the AI-enabled analytics dashboard
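The two-level routing described for this project can be illustrated with a minimal sketch. It is not the production system: it uses plain Python rather than LangChain, and simple keyword matching stands in for the model-driven routing. All class names, keywords, and agent replies below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# A task agent is modeled as a plain function from query text to a reply.
TaskAgent = Callable[[str], str]


@dataclass
class DomainSupervisor:
    """Hypothetical domain-level supervisor that owns a set of task agents."""
    name: str
    task_agents: Dict[str, TaskAgent] = field(default_factory=dict)

    def handle(self, query: str) -> str:
        # Assumption: keyword matching stands in for LLM-based task selection.
        lowered = query.lower()
        for keyword, agent in self.task_agents.items():
            if keyword in lowered:
                return agent(query)
        return f"[{self.name}] no task agent matched"


@dataclass
class InterfaceSupervisor:
    """Hypothetical top-level supervisor that routes to domain supervisors."""
    domains: Dict[str, DomainSupervisor] = field(default_factory=dict)

    def route(self, query: str) -> str:
        lowered = query.lower()
        for keyword, supervisor in self.domains.items():
            if keyword in lowered:
                return supervisor.handle(query)
        return "[interface] no domain supervisor matched"


def build_demo_router() -> InterfaceSupervisor:
    # Illustrative wiring only; the real agents, domains, and data are not shown.
    financials = DomainSupervisor(
        name="financials",
        task_agents={
            "screen": lambda q: "screening companies",
            "retrieve": lambda q: "retrieving financial data",
        },
    )
    documents = DomainSupervisor(
        name="documents",
        task_agents={"summarize": lambda q: "summarizing document"},
    )
    return InterfaceSupervisor(domains={"company": financials, "document": documents})
```

For example, `build_demo_router().route("Screen company peers in retail")` is passed first to the financials supervisor and then to its screening task agent.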
Client(s): KPMG Healthcare & Life Sciences Go-To-Market Teams
Serve as co-lead data scientist and database manager for KPMG’s healthcare and life sciences teams, owning end-to-end data workflows using SQL and Python to support client strategy and analytics.
Implemented multiple scalable, modular SQL and Python scripts on Snowflake to manipulate electronic health claims data and address specific research questions; the results have been adopted across 30+ client projects, delivering actionable insights to healthcare professionals and influencing multi-million-dollar revenue outcomes
Client(s): Small Budget Airline & KPMG Audit Team
Led the development of a dynamic time series regression and quasi-experimental modeling pipeline to estimate the causal impact of route strategies on ticket sales for a budget airline, integrating client data with external U.S. flight data from the Bureau of Transportation Statistics (BTS)
Delivered findings via an interactive PowerBI dashboard, enabling business and audit analysts to run A/B-style scenario simulations and explore counterfactual outcomes for ticket sales optimization
Client(s): KPMG Supply Chain & Procurement Go-To-Market Teams
Developed a novel supply chain “stress” metric in collaboration with domain experts, using latent variable analysis, time series forecasting, and feature engineering in R and Python to quantify systemic disruptions across U.S. supply chains
Managed a cross-functional team within an Agile (JIRA-based) framework to deliver a scalable, auto-refreshing analytics pipeline
The metric was published at the ASCM Conference and subsequently reused across multiple KPMG client pursuits, demonstrating its strategic value for answering critical operational and financial questions
Timeline: Sep 2020 - Aug 2022
Client: Mid-sized Regional Insurance Company
Improved KPMG’s homeowners insurance underwriting pipeline by building and validating generalized additive models (GAMs) and GLM-based regressions (Poisson, Gamma, Tweedie) to estimate loss-cost and pure premiums across various perils, integrating external environmental features to improve predictive accuracy
Designed these models to mirror causal frameworks by simulating counterfactual scenarios (e.g., renovations to a home, age of roofing, and distance to the nearest fire station)
Automated data extraction and feature engineering workflows using R and internal APIs, enabling efficient, repeatable updates to the pricing models and delivering insights via a stakeholder-facing PowerBI dashboard
Client(s): Global Pharmaceutical Company & KPMG Procurement & Supply Chain team
Developed a machine learning framework leveraging Extended Isolation Forests to assign anomaly scores, allowing teams to prioritize investigations based on adjustable risk thresholds. Integrated model outputs into a PowerBI dashboard, providing real-time visibility into transaction risk and supporting data-driven client decision-making.
Designed and implemented a Python-based data anonymization pipeline, using random name generators and join-preserving keys to securely handle B2B transaction data from five disparate sources.
Engineered and merged these anonymized datasets into a unified modeling set with pandas, enabling exploratory analysis and anomaly detection across pharmaceutical vendor–retailer interactions.
Collaborated with KPMG stakeholders and supply chain experts to define a high-impact feature space for detecting fraud and malpractice.
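The join-preserving keys in the anonymization pipeline above can be sketched as follows. This is a minimal illustration under assumptions, not the actual pipeline: it swaps the random name generator for a keyed hash (HMAC-SHA256), so the same entity maps to the same token in every source and the sources can still be joined. The secret key, the `VENDOR` prefix, and the column names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List

# Hypothetical secret; in practice it would be stored outside the code.
SECRET_KEY = b"replace-with-a-secret-stored-outside-the-code"


def pseudonymize(value: str, prefix: str = "VENDOR") -> str:
    """Map an identifier to a stable token using a keyed hash."""
    # Normalize first so spacing and casing differences still join.
    normalized = value.strip().lower()
    digest = hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:12]}"


def anonymize_rows(rows: List[Dict[str, str]], column: str) -> List[Dict[str, str]]:
    """Return copies of the rows with one identifying column replaced."""
    return [{**row, column: pseudonymize(row[column])} for row in rows]
```

Because the token depends only on the normalized value and the secret, `anonymize_rows` can be applied to each source independently and the outputs merged afterwards on the anonymized column.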
Client: KPMG Financial Services Go-To-Market Teams
Project: ESG Score Calculator
Project: Reusable & Refreshing Data Curations
Project: Spatial Interpolation for Climate Risk
Project: Latent Variable Analysis on U.S. Neighborhoods
Timeline: Jun 2019 - Jul 2019
Implemented and applied an Isolation Forest-based anomaly detection layer within Nielsen’s media data integration pipeline using Python to identify irregularities – such as bots – and improve the accuracy of audience data samples; this work directly contributed to earning a return offer
Maintained robust version control using Git & Bitbucket, enabling reproducible, modular commits that teammates could pull and merge into the main data science pipeline for seamless integration and collaboration
Presented and demoed end-to-end data science workflow to both of Nielsen’s Data Science pillars using a custom-built RShiny app, showcasing analytical insights and technical implementation at the conclusion of the internship