Ongoing
London Property Market Analyst ★
Flagship project and CV centrepiece: a fully automated, end-to-end data ecosystem rather than just a chatbot. The system monitors the HM Land Registry data source, runs its data engineering pipelines, maintains its own database, performs statistical analysis across 1.76M transactions and 16 years of London property data, and delivers answers directly to users across all 33 boroughs. Every layer, from raw data ingestion to the user-facing response, runs without human intervention.
UK Property Data Pipeline
Automated monthly data pipeline publishing updated Kaggle datasets and mini-reports from HM Land Registry Price Paid data. Runs on the 5th of every month.
2025
OpenClaw App
Built a personal automation app for AI-powered inbox management using OpenClaw on macOS.
Gemma-3 Pruning & Knowledge Distillation ★
Optimised Google's Gemma-3-270M via structured pruning and knowledge distillation — exploring model compression techniques for efficient LLM deployment.
GPT-OSS 20B — PubMed Fine-tune ★
Fine-tuned GPT-OSS 20B on PubMedQA with chain-of-thought reasoning, achieving 73.6% accuracy and ranking 19th on the PubMedQA leaderboard. Trained via QLoRA on an RTX 4090.
MCP 101 — Modular AI Agent for Stock Investment Insights
Tutorial on building a composable AI agent using the Model Context Protocol (MCP) for stock investment analysis.
Teaching GPT-OSS 20B Multilingual Reasoning
Hands-on guide fine-tuning OpenAI's GPT-OSS 20B model for multilingual reasoning on an RTX 4090.
PAI Debater — GPT-OSS 20B Fine-tune ★
Fine-tuned GPT-OSS 20B into a distinct debate personality via QLoRA on an RTX 4090. Full pipeline: dataset curation, fine-tuning, and deployment with Transformers.
AI Registration App — Council Pilot
AI-powered registration system for local government. FastAPI backend deployed on Render, Gradio frontend on HuggingFace.
Fine-Tuning Open-Source LLMs for Text-to-SQL — 3-part series
End-to-end series covering motivations, machine setup on WSL2, and results from fine-tuning open-source LLMs for Text-to-SQL tasks.
AI Mathematical Olympiad — Progress Prize, Rank 72nd
Kaggle competition. Mathematical reasoning with LLMs at olympiad level. Highest personal ranking.
2024
Azure AI Skills Challenge Badge
Awarded the AI Skills Challenge badge from Microsoft Azure.
Operationalizing LLMs on Azure — Duke University
Completed the "Operationalizing LLMs on Azure" certificate course by Duke University.
Azure AI Fundamentals Badge
Earned the Fundamentals of Azure AI Services badge in the AI Skills Challenge.
AI Research Assistant — Llama3 + LlamaIndex + Qdrant + FastAPI
Real-world use-case article and code: building an AI-powered research assistant with a modern RAG stack.
AI Email Pilot — Wandsworth Borough Council
Self-initiated pilot exploring a local LLM for council email handling, presented as an AI project proposal to Wandsworth Borough.
Positional Embeddings — Clearly Explained ★
Article dissecting the mathematics of positional embeddings in NLP models. Uses a Crayola colour-blending analogy to show how embeddings represent words as numbers. Independently worked out how the positional encoding matrix is structured: each row is a word position (k), each column is an embedding dimension (i), and the values alternate between sine and cosine across the columns. The matrix is then added element-wise to the semantic embedding. Published on Medium.
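The matrix structure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the article's own code: the function names, the base constant of 10000 (the convention from the original Transformer formulation), and the pairing of each sine column with the cosine column next to it are assumptions, since the entry does not state them.

```python
import math

def positional_encoding(num_positions, d_model, base=10000.0):
    """Build a num_positions x d_model matrix of sinusoidal encodings.

    Rows are word positions (k); columns are embedding dimensions.
    The base of 10000 is an assumed convention, not taken from the article.
    """
    matrix = []
    for k in range(num_positions):
        row = []
        for col in range(d_model):
            i = col // 2  # each sine/cosine column pair shares one frequency
            angle = k / (base ** (2 * i / d_model))
            # Even columns use sine, odd columns use cosine.
            row.append(math.sin(angle) if col % 2 == 0 else math.cos(angle))
        matrix.append(row)
    return matrix

def add_position(embeddings, encoding):
    """Add the positional encoding element-wise to the semantic embeddings."""
    return [
        [e + p for e, p in zip(emb_row, pos_row)]
        for emb_row, pos_row in zip(embeddings, encoding)
    ]

# Hypothetical toy sizes: 4 word positions, 6 embedding dimensions.
pe = positional_encoding(4, 6)
```

Position 0 always encodes to alternating 0 and 1, because sin(0) = 0 and cos(0) = 1; later positions vary at a different rate in each column pair.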
2023
ChatMate — Memory-enabled Chatbot
Conversational chatbot capable of recalling previous parts of the conversation, such as a user's name.
PitchPal — Sales Copy Assistant ★
LLM-powered tool for generating sales copy, tweets, and product descriptions for different target audiences. Built as a solo project, PitchPal held its own in a direct face-off against Attentive®, a funded marketing AI, which was an encouraging sign that the work was on the right track.
KidBot — LangChain + OpenAI
Chatbot with the personality and vocabulary of a 5-year-old, built with LangChain and OpenAI.
Small Language Model (SLM) — Built from Scratch
Designed and trained a small language model end-to-end, sharing the process and insights with the community.
4-bit Quantisation & Llama2 — Tutorial
In-depth Kaggle tutorial on 4-bit quantisation and its impact on Llama2 performance.
LLM Science Exam Challenge — Top 42%
Kaggle competition. Multiple-choice science exam solved by an LLM. Top 42% out of 4,700+ teams.
CommonLit Student Summary Quality Assessment — Top 32%
Kaggle competition. LLM-based text quality evaluation. Top 32% out of 4,700+ teams.
Action Game Recommendation Engine
Topic modelling and collaborative filtering system for action game recommendations.
London McDonald's Social Listening
Social listening project using NLP to analyse sentiment around McDonald's in London.
New York Restaurants Sentiment Analysis
NLP sentiment analysis model applied to New York restaurant reviews.
Sentiment Analysis — L.A. Restaurants
Traditional NLP sentiment model trained on L.A. restaurant reviews.
NLP Tutorial Series
Wrote a series of introductory NLP tutorials covering tokenisation, stemming, lemmatisation, transformers, and attention mechanisms. Published in Towards AI and Towards Data Science.
2022
LLM Classification with BERT — Microsoft ★
Built and delivered a BERT-based LLM classification model for the team during my tenure at Microsoft, my first hands-on dive into state-of-the-art NLP. A rewarding year.
2021
CS50's Introduction to Artificial Intelligence with Python — Harvard University ★
Completed Harvard's foundational AI course covering search algorithms, classification, optimisation, machine learning, neural networks, natural language processing, and large language models — all implemented in Python.
2020
Sentiment Classifier — Matters HK
Volunteered to build a self-improving sentiment classification web app for Matters, a Hong Kong publisher. The model updated with each user interaction.