Data Scientist
About Me
I graduated from McGill University in December 2022 with a degree in Mathematics and Business Analytics. I am now building on those skills at École Polytechnique and HEC Paris with a master’s degree in data science. I aspire to use my knowledge and expertise to provide data-driven solutions for better business decisions. In addition to my passion for data science, I enjoy playing tennis 🎾 and composing music 🎵. I am always eager to learn new skills.
Education
- Master of Science in Data Science & AI for Business, École Polytechnique / HEC Paris (2023-2025)
  - GPA: 3.87/4.0
  - Program ranked 3rd globally in the QS Master in Business Analytics rankings (2024)
- Bachelor of Commerce, McGill University (2019-2022)
  - Major in Mathematics, Concentration in Business Analytics
  - McGill One-Year Entrance Scholarship (2019)
Professional Experience

- Queried and joined 20+ GB of datasets in BigQuery, matching product names with prices (one-to-many relationships) and converting 50M+ prices to USD to standardize the data.
- Optimized product distribution across categories by implementing data-driven sampling techniques, identified and removed pricing anomalies, and efficiently managed processed datasets in AWS S3.
- Implemented a state-of-the-art Multimodal Transformer model to integrate textual, numerical, and categorical features for enhanced product classification accuracy.
- Developed and deployed a Databricks pipeline integrated with MLflow for model tracking, running six combining strategies in parallel and improving classification metrics by 2% over the existing model without price features.


BCG-X Data Challenge - Customer Churn Prediction & AI Strategy Development (February 2025)
- Partnered with 3 students from the master’s in strategic management to identify and prioritize 10 AI-driven data use cases, using an Impact-Feasibility Matrix to select churn prediction as the most valuable and implementable.
- Developed a customer churn prediction model using Logistic Regression, achieving 92.2% accuracy with 96% precision for identifying at-risk customers.
- Engineered 5 key customer insights by analyzing purchase frequency, total sales, engagement channels, and customer lifetime, leading to targeted retention strategies.
- Presented findings to the BCG-X consultants, demonstrating a potential 50% increase in customer retention and a projected €150M revenue uplift through improved sales and marketing efficiency.


L’Oréal Data Challenge - Marketing Mix Modeling (February 2025)
- Conducted marketing mix modeling using Google Meridian, analyzing $10M+ in A&P investments and isolating their impact on offline & online sales between 2022 and 2023, achieving an R² of 0.81.
- Engineered ad stock effects, diminishing returns, and lagged variables to improve predictive accuracy, selecting the top 20 significant drivers from 60+ variables via forward selection.
- Analyzed ROI across 20 marketing channels by comparing incremental revenue vs annual investment, identifying trends and assessing short-term and long-term effects.
- Collaborated with a team of 6 to analyze saturation points and optimize response curves, providing data-driven budget allocation recommendations to L’Oréal data scientists & marketing officers, supporting strategic marketing evaluations.
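
For readers unfamiliar with the terms, adstock (carry-over) and diminishing returns (saturation) can be sketched as below. This is a minimal illustration only: it does not use Google Meridian's API, the function names `geometric_adstock` and `hill_saturation` are hypothetical, and the spend figures and parameters are made-up values, not project data.

```python
def geometric_adstock(spend, decay):
    """Carry over a fraction `decay` of the previous period's effect."""
    out = []
    carry = 0.0
    for x in spend:
        carry = x + decay * carry  # today's spend plus decayed past effect
        out.append(carry)
    return out


def hill_saturation(x, half_saturation, slope=1.0):
    """Map adstocked spend to a 0-1 response with diminishing returns."""
    return [v**slope / (v**slope + half_saturation**slope) for v in x]


# Illustrative weekly spend for one channel (assumed values).
weekly_spend = [100.0, 0.0, 0.0, 50.0]
adstocked = geometric_adstock(weekly_spend, decay=0.5)
response = hill_saturation(adstocked, half_saturation=100.0)
```

In a sketch like this, the transformed `response` series would stand in for raw spend as a regressor, so that a fitted coefficient reflects both lagged and saturating effects.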


Eleven Strategy Data Challenge - AI-Driven Client Selection for Luxury Events (January 2025)
- Designed a commercial proposal for an AI-driven client selection model within one week, leveraging 3 datasets to optimize luxury event marketing strategies.
- Leveraged a 50K+ transaction dataset to measure the causal impact of event invitations on revenue by comparing pre-event and post-event client spending.
- Achieved 92% accuracy in predicting event attendance probability with a Logistic Regression model, enhancing invitation precision.
- Developed an interactive Streamlit app that enables marketing teams to generate a list of high-potential clients to invite, improving event attendance efficiency.


- Integrated 6 diverse datasets (sales, costs, traffic) at store and mall levels, creating a unified dataset with over 15,000 rows to support comprehensive analysis.
- Developed 6 KPIs focused on revenue, cost efficiency, and traffic flow, enabling precise evaluation of store and mall performance to uncover inefficiencies.
- Leveraged hierarchical clustering to categorize malls into 4 distinct segments (luxury, fashion, convenience, family-friendly) and tailored strategic recommendations accordingly.
- Collaborated with a team of 6 to present insights to URW managers, earning recognition as one of the top 2 teams out of 10 for delivering highly actionable recommendations.


Schneider Electric - Plastic Cost Prediction (December 2024)
- Designed data preprocessing pipelines, merging over 10 datasets and ensuring consistency through timestamp alignment, format unification, and missing value imputation, obtaining quarterly data from 2017 to 2024.
- Implemented and evaluated models to forecast Polycarbonate and Green Polycarbonate prices for the first 3 quarters of 2025, with a deep learning model achieving the best performance (Mean Absolute Error of 0.115).
- Delivered strategic business insights by correlating cost forecasts with Schneider Electric’s pricing and competitor data, enabling informed procurement and pricing decisions.
- Collaborated with a cross-functional team of five to build a data-driven solution aligned with Schneider Electric’s sustainability and profitability objectives, recognized as the best-performing team among 6 competing groups.


Capgemini Invent - Air Quality in Paris Time Series Forecasting (October 2024)
- Performed data preprocessing for time series forecasting, integrating 3 external sources (weather, holidays, COVID-19 periods) to improve model relevance and handling missing values and outliers with Qolmat’s analysis tools to preserve data integrity and continuity.
- Trained and evaluated multiple forecasting models using Darts to predict hourly concentrations of five key pollutants over a three-week period using historical data from 2020 to 2024.
- Implemented a hybrid ensemble model combining LightGBM and CatBoost, achieving our best Mean Absolute Error of 5.74 and reaching a top-3 ranking on the Kaggle leaderboard.

Capgemini Invent - Customer Feedback Analysis for TotalEnergies (January-March 2024)
- Developed an automated solution to scrape over 200 pages of customer reviews from Trustpilot using Selenium to analyze the Voice of Customer.
- Leveraged BERTopic to analyze over 10,000 customer interactions, uncovering key topics and pain points across the entire customer journey for TotalEnergies and its main competitors.
- Collaborated with a team of six students to present findings to a panel of data science consultants, communicating data-driven insights on customer sentiment.

Skills
- Languages, Tools & Cloud: Python, SQL, Git, GCP (BigQuery), AWS (S3)
- Data Visualization: Matplotlib, Seaborn, Plotly, Tableau
- Machine Learning: NumPy, pandas, SciPy, statsmodels, scikit-learn, PySpark, TensorFlow, PyTorch, Keras, spaCy
- AI Models: Supervised Learning, Unsupervised Learning, Natural Language Processing, Deep Learning, Time Series, Causal Inference, Marketing Mix Modeling, Reinforcement Learning
- MLOps: MLflow, Docker, FastAPI, Prophet
Interests
- Tennis (15+ years)
- Clarinet (15+ years)
- Music production (SoundCloud)
- Traveling