Chronological
All Projects
Every AI project, model, competition, and publication — from 2020 to present.
★ featured on home page
2020
Sentiment Classifier — Matters HK
Volunteered to build a self-improving sentiment classification web app for Matters, a Hong Kong publisher. The model updated with each user interaction.
2022
LLM Classification with BERT — Microsoft
First LLM classification model using Hugging Face BERT, developed at Microsoft. Initial hands-on foray into state-of-the-art NLP.
2023
NLP Tutorial Series
Composed a suite of introductory NLP tutorials covering tokenisation, stemming, lemmatisation, transformers, and attention mechanisms. Published in Towards AI and Towards Data Science.
Sentiment Analysis — L.A. Restaurants
Traditional NLP sentiment model trained on L.A. restaurant reviews.
New York Restaurants Sentiment Analysis
NLP sentiment analysis model applied to New York restaurant reviews.
London McDonald's Social Listening
Social listening project using NLP to analyse sentiment around McDonald's in London.
Action Game Recommendation Engine
Topic modelling and collaborative filtering system for action game recommendations.
CommonLit Student Summary Quality Assessment — Top 32%
Kaggle competition. LLM-based text quality evaluation. Top 32% out of 4,700+ teams.
LLM Science Exam Challenge — Top 42%
Kaggle competition. Multiple-choice science exam answered by an LLM. Top 42% out of 4,700+ teams.
4-bit Quantisation & Llama2 — Tutorial
In-depth Kaggle tutorial on 4-bit quantisation and its impact on Llama2 performance.
Small Language Model (SLM) — Built from Scratch
Designed and trained a small language model end-to-end, sharing the process and insights with the community.
KidBot — LangChain + OpenAI
Chatbot with the personality and vocabulary of a 5-year-old, built with LangChain and OpenAI.
PitchPal — Sales Copy Assistant
LLM-powered tool for generating sales copy, tweets, and product descriptions for different target audiences.
ChatMate — Memory-enabled Chatbot
Conversational chatbot capable of recalling previous parts of the conversation, such as a user's name.
2024
Positional Embeddings — Clearly Explained
Article dissecting the mathematics of positional embeddings in NLP models and how they combine with the original token embeddings, using drawings to make it accessible.
AI Email Pilot — Wandsworth Borough Council
Self-initiated pilot exploring a local LLM for council email handling, presented as an AI project proposal to Wandsworth Borough.
AI Research Assistant — Llama3 + LlamaIndex + Qdrant + FastAPI
Real-world use-case article and code: building an AI-powered research assistant with a modern RAG stack.
Azure AI Fundamentals Badge
Earned the Fundamentals of Azure AI Services badge in the AI Skills Challenge.
Operationalizing LLMs on Azure — Duke University
Completed the "Operationalizing LLMs on Azure" certificate course by Duke University.
Azure AI Skills Challenge Badge
Awarded the AI Skills Challenge badge from Microsoft Azure.
2025
AI Mathematical Olympiad — Progress Prize, Rank 72nd
Kaggle competition. Mathematical reasoning with LLMs at olympiad level. Highest personal ranking.
Fine-Tuning Open-Source LLMs for Text-to-SQL — 3-part series
End-to-end series covering motivations, machine setup on WSL2, and results from fine-tuning open-source LLMs for Text-to-SQL tasks.
AI Registration App — Council Pilot
AI-powered registration system for local government. FastAPI backend deployed on Render, Gradio frontend on Hugging Face.
PAI Debater — GPT-OSS 20B Fine-tune ★
Fine-tuned GPT-OSS 20B into a distinct debate personality via QLoRA on an RTX 4090. Full pipeline: dataset curation, fine-tuning, and deployment with Transformers.
Teaching GPT-OSS 20B Multilingual Reasoning
Hands-on guide to fine-tuning OpenAI's GPT-OSS 20B model for multilingual reasoning on an RTX 4090.
MCP 101 — Modular AI Agent for Stock Investment Insights
Tutorial on building a composable AI agent using the Model Context Protocol (MCP) for stock investment analysis.
GPT-OSS 20B — PubMed Fine-tune ★
Fine-tuned GPT-OSS 20B on PubMedQA with chain-of-thought reasoning, achieving 73.6% accuracy and ranking 19th on the PubMedQA leaderboard. Trained via QLoRA on an RTX 4090.
Gemma-3 Pruning & Knowledge Distillation ★
Optimised Google's Gemma-3-270M via structured pruning and knowledge distillation to explore model compression techniques for efficient LLM deployment.
AI Personalities
AI chatbots with distinct characters and faces: a product exploring personality-driven LLM interactions.
OpenClaw App
Built a personal automation app for AI-powered inbox management using OpenClaw on macOS.
Ongoing
London Property Market Analyst ★
Multi-agent RAG chatbot over 1.76M HM Land Registry transactions. Covers 16 years of data across all 33 London boroughs, with chart output on every answer. Flagship project.