Data Analysis and Visualisation
Data Visualisation of Immigration into Selected EU Countries
This project uses pandas DataFrames and Matplotlib. It focuses on line plots, bar plots, area plots, and histograms to extract meaningful insights from the immigration dataset.
Geospatial Analysis of the 2015 UK Election Result
This project analyzes a geospatial dataset made up of vector (.shp) and raster (.tif) files. It focuses on customizing geographical data, overlaying features for detailed analysis, and exporting plots to improve the presentation of findings.
Netflix Data Analysis
A friend and I were discussing Netflix movies, and he mentioned that modern movies seem shorter than classics. Intrigued, I decided to investigate movie lengths through Exploratory Data Analysis (EDA) of Netflix data. I aimed to determine whether there is truth to the notion of "movies getting shorter" and, if so, to identify potential reasons behind the trend.
Model Building (Machine Learning and Statistical Models)
YouTube Analysis and Recommendation System
I analyzed a dataset containing information about the top 1000 YouTube streamers, focusing on variables such as number of likes, comments, country of origin, and content category. Utilizing this data, I developed a recommendation system to suggest relevant content to users.
Diabetes Classification
I analyzed a diabetes dataset containing only women's records to predict the likelihood of diabetes occurrence. The dataset encompasses variables like the number of pregnancies, blood pressure, glucose level, and insulin. I utilized three classification models and conducted hyperparameter tuning to optimize model performance.
Book Recommendation System
I developed a book recommendation system using a dataset obtained from UCI's repository. The dataset includes information about the title of the book, the author, and the genre it falls under. By leveraging this data, I aimed to provide personalized book recommendations to users based on their preferences.
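The description above does not say which recommendation technique was used. As one possible illustration, the sketch below does simple content-based matching: each book is reduced to its author and genre, and books are ranked by Jaccard similarity to a title the user liked. The `Book` class, the function names, and the catalogue entries are all hypothetical placeholders, not the project's actual code or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    title: str
    author: str
    genre: str


def features(book: Book) -> set[str]:
    """Represent a book as a set of author and genre tokens."""
    return {f"author:{book.author.lower()}", f"genre:{book.genre.lower()}"}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: size of the overlap divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(liked_title: str, catalogue: list[Book], top_n: int = 3) -> list[str]:
    """Return up to top_n titles most similar to the liked book."""
    liked = next((b for b in catalogue if b.title == liked_title), None)
    if liked is None:
        raise ValueError(f"Unknown title: {liked_title!r}")
    scored = [
        (jaccard(features(liked), features(other)), other.title)
        for other in catalogue
        if other.title != liked_title
    ]
    # Highest similarity first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Books that share nothing with the liked title are not recommended.
    return [title for score, title in scored[:top_n] if score > 0]


# Hypothetical catalogue, for illustration only.
CATALOGUE = [
    Book("Book A", "Author X", "Fantasy"),
    Book("Book B", "Author X", "Fantasy"),
    Book("Book C", "Author Y", "Fantasy"),
    Book("Book D", "Author Z", "History"),
]

print(recommend("Book A", CATALOGUE))
```

With only author and genre available, many books tie on similarity, so a real system would likely need richer signals such as ratings or reading history.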
SMS Classification
I developed an SMS classifier utilizing five classification models: Gaussian Naive Bayes, Multinomial Naive Bayes, Decision Tree Classifier, Random Forest Classifier, and Support Vector Classifier. The models were evaluated using a classification report and confusion matrix, and their performance was ranked.
Sentiment Analysis on British Airways Reviews
During my virtual internship at British Airways, I conducted sentiment analysis on a provided dataset of customer reviews. Using the RoBERTa and VADER models, I classified reviews as negative, positive, or neutral and determined the most frequently discussed topics among them.
Machine Learning Algorithm for Iris Dataset
I utilized the popular Iris dataset to build both a classifier and a clustering model. The project involved performing hyperparameter tuning and presenting the results in a visual format.
Credit Card Application Classification
I developed a credit card application approval system using UCI's dataset. Utilizing logistic regression, I conducted hyperparameter tuning and grid search to optimize the model for binary classification.
GridSearchCV with Random Forest and Gradient Boosting
This project aims to solve a classification problem using machine learning techniques, specifically GridSearchCV with Gradient Boosting and Random Forest Classifier.
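A minimal sketch of this kind of workflow with scikit-learn is shown below. The project's dataset and parameter grids are not described here, so the synthetic data from `make_classification` and the small grids are assumptions chosen only to keep the example quick to run.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption); the real dataset is not described here.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Small illustrative grids (assumption); real grids depend on the problem.
searches = {
    "random_forest": GridSearchCV(
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [3, None]},
        cv=3,
        scoring="accuracy",
    ),
    "gradient_boosting": GridSearchCV(
        GradientBoostingClassifier(random_state=0),
        {"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]},
        cv=3,
        scoring="accuracy",
    ),
}

results = {}
for name, search in searches.items():
    # Fits every grid combination with 3-fold cross-validation,
    # then refits the best combination on the full training split.
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "cv_accuracy": search.best_score_,
        "test_accuracy": search.score(X_test, y_test),
    }

for name, summary in results.items():
    print(name, summary)
```

Keeping a held-out test split separate from the cross-validation folds gives a less optimistic estimate than the best cross-validation score alone.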
Wage Prediction Web Application
This web application predicts a programmer's expected income based on their years of experience, highest academic qualification, and country of residence. It uses machine learning models trained on relevant data to produce an income estimate tailored to each profile. The tool is designed to help programmers make informed career decisions and understand the financial landscape of their profession.
Time Series Forecasting with XGBoost
In this project, I employed XGBoost to forecast energy consumption using a subset of data. By analyzing feature importances, I refined the model, prioritizing the most significant features. This iterative approach enhanced prediction accuracy, offering insights crucial for optimizing energy management strategies in real-world applications.
Computer Vision
Car-Pedestrian Detection
This project makes use of OpenCV for car and pedestrian detection. It focuses on real-time object detection by utilizing pre-trained models to identify and track vehicles and pedestrians in video feeds. The system processes each frame to draw bounding boxes around detected objects.
Python Scripting
API Integration and Database Storage
This project is designed to demonstrate the integration of external API calls, local database storage, and efficient data retrieval. The primary components include a class responsible for making API calls to fetch data from the internet and an SQL class for storing and retrieving this data in a local database.
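The two-class layout described above might look roughly like the sketch below. The class names (`ApiClient`, `SqlStore`), the `items` table, its columns, and the example URL are all assumptions made for illustration; the project's real endpoints and schema are not given here. The fetcher is injectable so the example can run without network access.

```python
import json
import sqlite3
from typing import Callable, Optional
from urllib.request import urlopen


def http_get_json(url: str) -> list:
    """Fetch a URL and decode its JSON body (expected to be a list of objects)."""
    with urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


class ApiClient:
    """Fetches records from an API; the fetcher can be swapped out for tests."""

    def __init__(self, base_url: str, fetcher: Callable[[str], list] = http_get_json):
        self.base_url = base_url.rstrip("/")
        self.fetcher = fetcher

    def get_items(self, resource: str) -> list:
        return self.fetcher(f"{self.base_url}/{resource}")


class SqlStore:
    """Stores and retrieves records in a local SQLite database."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def save(self, records: list) -> None:
        # Named parameters keep values out of the SQL text; REPLACE avoids duplicates.
        with self.conn:
            self.conn.executemany(
                "INSERT OR REPLACE INTO items (id, name) VALUES (:id, :name)", records
            )

    def find(self, item_id: int) -> Optional[dict]:
        row = self.conn.execute(
            "SELECT id, name FROM items WHERE id = ?", (item_id,)
        ).fetchone()
        return {"id": row[0], "name": row[1]} if row else None


# Offline demonstration with a stub fetcher instead of a real HTTP call.
def stub_fetcher(url: str) -> list:
    return [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]


client = ApiClient("https://api.example.com", fetcher=stub_fetcher)
store = SqlStore()
store.save(client.get_items("items"))
print(store.find(2))
```

Separating fetching from storage means either side can be replaced, for example pointing `SqlStore` at a file path instead of an in-memory database.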
US Bikeshare Scripting
This project involves exploring and analyzing a CSV dataset on bike sharing from three different cities in the United States using Python. The analysis is conducted based on user inputs, creating an interactive experience for the audience.
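An input-driven analysis like this usually needs two pieces: a prompt loop that rejects invalid answers, and a filter applied to the loaded rows. The sketch below shows both using only the standard library. The column names (`Start Time`, `Trip Duration`), the timestamp format, and the sample rows are assumptions for illustration, not the project's actual data.

```python
import calendar
import csv
import io
from datetime import datetime

# Lowercase English month names plus an "all" option for no filtering.
VALID_MONTHS = [m.lower() for m in calendar.month_name if m] + ["all"]


def ask(prompt: str, choices: list, read=input) -> str:
    """Keep prompting until the user enters one of the allowed choices."""
    while True:
        answer = read(prompt).strip().lower()
        if answer in choices:
            return answer
        print(f"Please choose one of: {', '.join(choices)}")


def filter_by_month(rows: list, month: str) -> list:
    """Keep trips that start in the given month ('all' keeps every row)."""
    if month == "all":
        return rows
    month_number = VALID_MONTHS.index(month) + 1
    return [
        row
        for row in rows
        if datetime.strptime(row["Start Time"], "%Y-%m-%d %H:%M:%S").month == month_number
    ]


# Hypothetical sample standing in for one city's CSV file.
SAMPLE_CSV = """Start Time,Trip Duration
2017-01-05 08:00:00,300
2017-01-20 17:30:00,900
2017-02-11 09:15:00,600
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
january = filter_by_month(rows, "january")
print(len(january), "trips in January")
```

Passing the reader function as a parameter (`read=input`) lets the prompt loop be exercised with scripted answers instead of a live terminal.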
Register App Users with Functions
In this project, I utilized helper functions to facilitate user registration and implement robust error handling. Throughout the project, I focused on defining functions with descriptive docstrings, leveraging self-created Python modules for modular code organization, and implementing effective error handling strategies to ensure smooth execution and graceful handling of unexpected scenarios.
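A small sketch of this pattern follows: validation helpers with docstrings, a custom exception, and a wrapper that reports failures instead of crashing. The function names, the validation rules, and the in-memory `users` dictionary are assumptions for illustration and may differ from the project's own module.

```python
import re


class RegistrationError(ValueError):
    """Raised when a registration request cannot be accepted."""


def validate_username(username: str) -> str:
    """Return the cleaned username, or raise RegistrationError if it is invalid."""
    cleaned = username.strip()
    if not re.fullmatch(r"[A-Za-z0-9_]{3,20}", cleaned):
        raise RegistrationError("Username must be 3-20 letters, digits, or underscores.")
    return cleaned


def validate_email(email: str) -> str:
    """Return the lowercased email, or raise RegistrationError if it looks malformed."""
    cleaned = email.strip().lower()
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", cleaned):
        raise RegistrationError(f"Invalid email address: {email!r}")
    return cleaned


def register_user(users: dict, username: str, email: str) -> None:
    """Add a validated user to the registry, rejecting duplicate usernames."""
    name = validate_username(username)
    address = validate_email(email)
    if name in users:
        raise RegistrationError(f"Username already taken: {name}")
    users[name] = address


def try_register(users: dict, username: str, email: str) -> bool:
    """Register a user, printing a message and returning False on failure."""
    try:
        register_user(users, username, email)
    except RegistrationError as error:
        print(f"Registration failed: {error}")
        return False
    return True


users = {}
try_register(users, "ada_l", "Ada@Example.com")
try_register(users, "ada_l", "other@example.com")  # duplicate username
print(users)
```

Catching only `RegistrationError` keeps expected user mistakes separate from genuine bugs, which still surface as ordinary exceptions.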
Statistical Analysis
Unveiling the Weather Web: Are Humidity, Temperature, and Windspeed Entangled?
Does the weather play a game of connect-the-dots? This analysis delves into a weather dataset to explore potential relationships between humidity, temperature, and windspeed, to determine if these weather elements exhibit statistically significant correlations.
A Statistical Look at Goals in Football
This analysis tackles the question: do women's international football matches see more goals than men's? Using a 10% significance level, I tested whether the data provide evidence for that hypothesis.
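The test used in the analysis is not stated above. As one plausible choice for count data such as goals per match, the sketch below runs a one-sided Mann-Whitney U test using the normal approximation, without tie or continuity correction, so it is only a rough guide when many values are tied. The goal counts are hypothetical.

```python
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_greater(a, b):
    """One-sided test of 'a tends to be larger than b'; returns (U, p-value)."""
    n_a, n_b = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = (n_a * n_b * (n_a + n_b + 1) / 12) ** 0.5
    z = (u_a - mean_u) / sd_u
    return u_a, 1 - NormalDist().cdf(z)


# Hypothetical goals per match, for illustration only.
women = [4, 3, 5, 2, 6, 3, 4, 5]
men = [2, 1, 3, 2, 0, 3, 1, 2]

ALPHA = 0.10
u, p = mann_whitney_greater(women, men)
print(f"U = {u}, p = {p:.4f}, reject H0 at 10%: {p < ALPHA}")
```

A rank-based test is used here because goal counts are skewed and discrete, which makes a t-test's normality assumption questionable.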
Side Effects: Analyzing Adverse Drug Reactions in a Pharmaceutical Dataset
Side effects are a common concern with medications. Using data from Hbiostat (link to data in project), I employed statistical methods to evaluate the reported adverse reactions and determine if they are statistically significant or simply infrequent occurrences.
Structured Query Language (SQL)
Exploring a DVD dataset
In this project, I used JOINs and aggregations, combined with subqueries, CTEs, and window functions, to query the DVD rental database.
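To show how these pieces fit together, the sketch below combines a JOIN, an aggregation, a CTE, and a window function in one query. It runs against an in-memory SQLite database through Python's `sqlite3` module rather than PostgreSQL, and the two-table schema and its rows are simplified assumptions, not the actual DVD rental database. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE payment (payment_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Customer A'), (2, 'Customer B'), (3, 'Customer C');
    INSERT INTO payment VALUES (1, 1, 5.0), (2, 1, 3.0), (3, 2, 10.0), (4, 3, 2.0);
    """
)

QUERY = """
WITH totals AS (                      -- CTE: one row per customer
    SELECT c.name, SUM(p.amount) AS total_spent
    FROM customer AS c
    JOIN payment AS p ON p.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
)
SELECT name,
       total_spent,
       RANK() OVER (ORDER BY total_spent DESC) AS spend_rank  -- window function
FROM totals
ORDER BY spend_rank
"""

rows = conn.execute(QUERY).fetchall()
for name, total_spent, spend_rank in rows:
    print(spend_rank, name, total_spent)
```

The CTE keeps the aggregation separate from the ranking step, which tends to be easier to read than nesting the same logic in a subquery.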
Exploring a Country Club Dataset
This dataset gave me an opportunity to explore PostgreSQL beyond basic queries. I worked with more complex features such as window functions, CTEs, and views, which deepened my understanding of these concepts.
String Operations in PostgreSQL
This SQL project focuses on various string manipulation tasks within a database. It demonstrates my proficiency in string functions and pattern matching in PostgreSQL, providing insights into data processing and cleansing techniques within a relational database environment.

