Data Analytics & Automation
Data work is only useful when it produces something actionable. I build the full
pipeline — collection, cleaning, transformation, analysis, and output — as
production-grade software, not notebook experiments. Whether you need to monitor
a market, extract structured data from unstructured sources, automate a
repetitive workflow, or surface insights from a dataset, I build the system
that does it reliably and repeatably.
What's included
Web scraping and data collection
Production scrapers built in Python with BeautifulSoup4 that extract structured
data from websites at scale, with rate limiting, pagination handling, retry logic,
and output to CSV, JSON, or a database. I have built and deployed a LinkedIn
job market scraper — KeyFinder — as a live web application with a Flask REST API
on top of the pipeline. Scraping done properly: respectfully, reliably, and with
the output format you actually need.
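As a rough illustration of the retry logic, pagination, and rate limiting mentioned above, here is a minimal sketch using only the standard library. The names `fetch_with_retry` and `scrape_pages`, the `{page}` URL template, and the delay values are assumptions made for the example, not code from KeyFinder; the `fetch` argument stands in for whatever HTTP call a real scraper would use.

```python
import time
from typing import Callable


def fetch_with_retry(
    fetch: Callable[[str], str],
    url: str,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(url), retrying failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception:
            # A real scraper would catch only transient errors (timeouts, 5xx).
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, then 4x ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


def scrape_pages(
    fetch: Callable[[str], str],
    url_template: str,
    last_page: int,
    delay_between_requests: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Fetch pages 1..last_page in order, pausing between requests."""
    pages = []
    for page in range(1, last_page + 1):
        url = url_template.format(page=page)  # hypothetical pagination scheme
        pages.append(fetch_with_retry(fetch, url, sleep=sleep))
        if page < last_page:
            sleep(delay_between_requests)  # simple fixed-delay rate limit
    return pages
```

Passing `fetch` and `sleep` in as parameters keeps the sketch independent of any particular HTTP library and makes the backoff behaviour easy to check without touching the network.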
NLP and text analysis pipelines
Keyword extraction, n-gram frequency analysis, tokenisation, entity recognition,
and sentiment classification. Multi-stage pipelines that ingest raw text, clean
it, process it, and surface structured insights. Built in Python with NLTK and
Pandas as production pipelines, not tutorial code. I have shipped NLP work as
a REST API consumed by a web frontend, not just as a script.
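To make the n-gram frequency step concrete, here is a small sketch that counts n-grams across a set of texts. It uses only the standard library rather than NLTK so that it stays self-contained; the stopword list, the token pattern, and the function names `tokenise` and `ngram_counts` are illustrative assumptions, not the production pipeline.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = frozenset({"a", "an", "and", "in", "of", "the", "to", "with"})


def tokenise(text: str) -> list[str]:
    """Lowercase the text, split it into word-like tokens, drop stopwords."""
    # '+' and '#' are kept so terms such as "c++" and "c#" survive.
    tokens = re.findall(r"[a-z0-9+#]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def ngram_counts(texts: list[str], n: int = 2) -> Counter:
    """Count n-grams over all texts, never spanning two different texts."""
    counts: Counter = Counter()
    for text in texts:
        tokens = tokenise(text)
        # zip over n shifted views of the token list yields the n-grams.
        # Stopwords are removed first, so words on either side of a
        # stopword are treated as adjacent.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


postings = ["Python developer with Flask", "Senior Python developer"]
top_bigrams = ngram_counts(postings, n=2).most_common(1)
```

With the two sample postings above, the most frequent bigram is `("python", "developer")`, which appears in both.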
Data cleaning and transformation
Raw data is rarely usable. I clean it — deduplicate, normalise, handle missing
values, parse inconsistent formats, and output a dataset that is actually fit
for analysis. Pandas and NumPy for numerical and tabular work. SQLAlchemy for
database-backed pipelines. Transformation logic documented and reproducible.
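The cleaning steps listed above (deduplicating, normalising, handling missing values, parsing inconsistent formats) can be sketched in a few lines. This version uses plain Python so it runs without dependencies; in practice the same steps would map onto Pandas operations. The field names `name` and `posted`, the accepted date formats, and the rule that rows without a name are dropped are all assumptions chosen for the example.

```python
from datetime import datetime
from typing import Optional

# Date layouts the hypothetical source is assumed to mix together.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y/%m/%d")


def parse_date(raw: str) -> Optional[str]:
    """Return the date as ISO 8601 text, or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_rows(rows: list[dict]) -> list[dict]:
    """Normalise, drop unusable rows, and deduplicate, preserving order."""
    seen = set()
    cleaned = []
    for row in rows:
        # Collapse runs of whitespace and trim the ends.
        name = " ".join((row.get("name") or "").split())
        posted = parse_date(row.get("posted") or "")
        if not name:
            continue  # a row without a name is treated as unusable
        key = (name.casefold(), posted)  # case-insensitive duplicate check
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "posted": posted})
    return cleaned
```

Unparseable dates are kept as `None` rather than guessed, so gaps stay visible to whoever analyses the output.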
Automation scripts and workflow tools
Repetitive manual work turned into reliable automated pipelines. File processing,
report generation, data export, API polling, scheduled jobs. Python scripts and
CLI tools built to run unattended and fail gracefully when something upstream
changes.
REST API delivery of data pipelines
Data pipelines exposed as REST APIs — Flask or FastAPI — so the output of an
analysis or collection job is consumable by a frontend, a dashboard, or a
downstream service. I have built this pattern in production: a scraping pipeline
as the backend, with a React or HTML frontend consuming the results through an API.
Dashboard and reporting
Structured data output to CSV, Excel, or database tables for reporting workflows.
If you need a web-based dashboard on top of the data, I build that too — from
the pipeline to the chart on screen.
Good fit if you need
Market research automated instead of done manually every week.
A keyword analysis tool for job postings, competitor content, or customer feedback.
A scraper for a data source that does not have an API.
A Python pipeline that cleans and transforms data before it goes into your reporting tool.
An NLP system that categorises or extracts information from text at scale.