Data Analysis and Visualisation
Groceries Sales Analysis
This project uses R to analyze grocery sales data. It explores hourly and weekly sales trends, identifies the top- and bottom-performing products, and visualizes deviations from average sales. The results are presented in a Quarto document and a Jupyter Notebook.
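As a rough illustration of the "deviation from average" idea, here is a minimal sketch. The project itself is written in R; this sketch uses Python's standard library, and the `(hour, amount)` record layout and the sample values are assumptions made up for the example, not the real dataset's schema.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (hour_of_day, sale_amount) records; not taken from the real data.
sales = [(9, 12.5), (9, 7.5), (12, 30.0), (12, 20.0), (18, 45.0), (18, 35.0)]

def hourly_deviation(records):
    """Return {hour: that hour's total sales minus the mean hourly total}."""
    totals = defaultdict(float)
    for hour, amount in records:
        totals[hour] += amount
    average = mean(totals.values())
    return {hour: total - average for hour, total in sorted(totals.items())}

deviations = hourly_deviation(sales)
for hour, delta in deviations.items():
    # Positive values are hours that sell above the average hour.
    print(f"{hour:02d}:00  {delta:+.2f}")
```

The same grouping works for weekly trends by keying on the weekday instead of the hour.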
Model Building (Machine Learning and Statistical Models)
YouTube Analysis and Recommendation System
I analyzed a dataset of the top 1000 YouTube streamers, focusing on variables such as number of likes, comments, country of origin, and content category. Using this data, I built a recommendation system that suggests relevant content to users.
Diabetes Classification
I analyzed a diabetes dataset containing only women's records to predict the likelihood of diabetes. The dataset includes variables such as the number of pregnancies, blood pressure, glucose level, and insulin. I trained three classification models and tuned their hyperparameters to improve performance.
Book Recommendation System
I developed a book recommendation system using a dataset obtained from UCI's repository. The dataset includes information about the title of the book, the author, and the genre it falls under. By leveraging this data, I aimed to provide personalized book recommendations to users based on their preferences.
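One simple way to recommend books from title, author, and genre alone is content-based similarity. The sketch below is an assumption about how such a system could work, not the project's actual method: it compares books by Jaccard similarity of their author and genre, and the small catalogue is hypothetical.

```python
# Hypothetical catalogue: title -> (author, genre). Not the project's dataset.
books = {
    "Dune": ("Frank Herbert", "science fiction"),
    "Children of Dune": ("Frank Herbert", "science fiction"),
    "Neuromancer": ("William Gibson", "science fiction"),
    "Emma": ("Jane Austen", "romance"),
}

def similarity(a, b):
    """Jaccard similarity between the {author, genre} sets of two books."""
    features_a, features_b = set(books[a]), set(books[b])
    return len(features_a & features_b) / len(features_a | features_b)

def recommend(title, k=2):
    """Return the k titles most similar to `title`, best match first."""
    others = [t for t in books if t != title]
    return sorted(others, key=lambda t: similarity(title, t), reverse=True)[:k]

print(recommend("Dune"))
```

A reader who liked "Dune" is offered the book sharing both author and genre first, then the one sharing only the genre.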
SMS Classification
I developed an SMS classifier utilizing five classification models: Gaussian Naive Bayes, Multinomial Naive Bayes, Decision Tree Classifier, Random Forest Classifier, and Support Vector Classifier. The models were evaluated using a classification report and confusion matrix, and their performance was ranked.
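The evaluation and ranking step can be sketched as follows. This is a dependency-free illustration rather than the project's code: the labels, the predictions, and the names `model_a` and `model_b` are made up, and ranking by F1 on the spam class is an assumed criterion.

```python
def confusion(y_true, y_pred, positive="spam"):
    """Return (tp, fp, fn, tn) counts, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn

def f1_score(y_true, y_pred, positive="spam"):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

# Toy labels and predictions, invented for illustration only.
y_true = ["spam", "ham", "spam", "ham", "ham", "spam"]
predictions = {
    "model_a": ["spam", "ham", "spam", "ham", "spam", "spam"],
    "model_b": ["spam", "ham", "ham", "ham", "ham", "ham"],
}

# Rank the hypothetical models from best to worst F1 score.
ranking = sorted(predictions, key=lambda m: f1_score(y_true, predictions[m]), reverse=True)
print(ranking)
```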
Sentiment Analysis on British Airways Reviews
During my virtual internship at British Airways, I conducted sentiment analysis on a dataset of customer reviews. Using the RoBERTa and VADER models, I classified reviews as negative, positive, or neutral and determined the most frequently discussed topics.
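For the VADER part, a review is usually labeled from its compound score, a single value between -1 and 1. The sketch below assumes the commonly recommended cut-offs of 0.05 and -0.05; whether the project used these exact thresholds is not stated, and the scores shown are hypothetical rather than real review results.

```python
from collections import Counter

def label_from_compound(compound, threshold=0.05):
    """Map a VADER-style compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Hypothetical compound scores; in practice they would come from VADER itself.
scores = [0.72, -0.41, 0.0, 0.03, -0.88]
counts = Counter(label_from_compound(s) for s in scores)
print(counts)
```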
Machine Learning Algorithm for Iris Dataset
I utilized the popular Iris dataset to build both a classifier and a clustering model. The project involved performing hyperparameter tuning and presenting the results in a visual format.
Credit Card Application Classification
I developed a credit card application approval system using UCI's dataset. Using logistic regression for binary classification, I tuned the model's hyperparameters with grid search.
GridSearchCV with Random Forest and Gradient Boosting
This project solves a classification problem by tuning Gradient Boosting and Random Forest classifiers with GridSearchCV.
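To show what a cross-validated grid search does, here is a minimal standard-library sketch. It is not scikit-learn's implementation: a toy 1-D nearest-neighbour classifier stands in for Random Forest and Gradient Boosting, and the data, the grid, and the function names are all assumptions for illustration.

```python
from itertools import product
from statistics import mean

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D feature)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

def cross_val_accuracy(data, k, folds):
    """Mean accuracy over `folds` splits; fold i holds out every folds-th point."""
    scores = []
    for i in range(folds):
        test = data[i::folds]
        train = [p for j, p in enumerate(data) if j % folds != i]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(correct / len(test))
    return mean(scores)

def grid_search(data, grid, folds=3):
    """Try every combination in `grid`; return (best_params, best_score)."""
    best = None
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = cross_val_accuracy(data, folds=folds, **params)
        if best is None or score > best[1]:
            best = (params, score)
    return best

# Made-up (feature, label) pairs with a little label noise.
data = [(0.5, "low"), (5.1, "high"), (1.0, "low"), (5.6, "high"),
        (1.4, "low"), (6.0, "high"), (1.9, "low"), (6.3, "high"),
        (2.2, "high"), (6.8, "high"), (2.6, "low"), (7.0, "low")]
grid = {"k": [1, 3, 5]}  # odd k avoids tied votes with two classes

best_params, best_score = grid_search(data, grid)
print(best_params, round(best_score, 3))
```

GridSearchCV follows the same pattern: enumerate the parameter grid, score each combination by cross-validation, and keep the best one.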
Wage Prediction Web Application
This web application predicts a programmer's expected income from their years of experience, highest academic qualification, and country of residence. It uses machine learning models trained on relevant data to produce an estimate for each individual profile. The tool is designed to help programmers make informed career decisions and understand the financial landscape of their profession.
Time Series Forecasting with XGBoost
In this project, I used XGBoost to forecast energy consumption from a subset of the data. I analyzed feature importances and refined the model iteratively, keeping the most significant features, which improved prediction accuracy. Forecasts of this kind can inform energy management decisions in real-world applications.
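The refinement step can be sketched as: derive features from each timestamp, rank them by importance, and keep only the strongest. In the sketch below the calendar features are an assumption about what a time series model might use, and the importance values are made up; in practice they would be read from the trained XGBoost model.

```python
from datetime import datetime

def calendar_features(ts):
    """Derive simple calendar features from a timestamp."""
    return {
        "hour": ts.hour,
        "dayofweek": ts.weekday(),  # Monday is 0
        "month": ts.month,
        "dayofyear": ts.timetuple().tm_yday,
    }

def select_top_features(importances, keep=2):
    """Return the `keep` feature names with the highest importance."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:keep]]

# Hypothetical importances, invented for illustration only.
importances = {"hour": 0.45, "dayofweek": 0.15, "month": 0.30, "dayofyear": 0.10}

selected = select_top_features(importances)
row = calendar_features(datetime(2024, 1, 1, 8))
reduced = {name: row[name] for name in selected}  # features kept for retraining
print(selected, reduced)
```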