Enterprise File & Client Classification System

Overview

In real production environments, most failures come not from broken code but from messy data and human inconsistency.

At work, I encountered a growing SharePoint repository containing thousands of folders with financial documents tied to client work. Folder names were inconsistent, client identities were ambiguous, and folders often mixed multiple entities. This created real operational risk: misfiled documents, retrieval failures, and audit exposure.

I designed FCS, an internal file and client classification system, to bring structure, validation, and accountability to an otherwise ungovernable document space.

Problem

The core challenge was not storage; it was identity.

Clients appeared under multiple naming conventions
Folder names mixed individuals, entities, and engagements
Files contained partial or conflicting client references
SharePoint provided no enforced schema or validation
Manual cleanup did not scale and introduced more errors

There was no single source of truth, yet the system had to remain usable by non-technical staff.

Solution

I designed a classification and validation pipeline that worked with human workflows rather than against them.

Key components included:

Heuristic-based client name matching and normalisation
Three explicit classification states: Match, No Match, and Needs Review
Validation columns to surface ambiguity instead of hiding it
Structured naming conventions enforced through review steps
Separation between automated detection and human confirmation
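A minimal Python sketch of how normalisation, heuristic matching and the three classification states could fit together. The suffix list, the thresholds, the ambiguity margin and the use of difflib's SequenceMatcher are all assumptions for illustration, not the heuristics the internal system actually used. The property worth noting is that a near-tie between two known clients goes to Needs Review instead of being guessed.

```python
import re
import unicodedata
from difflib import SequenceMatcher
from enum import Enum
from typing import Optional

# Hypothetical legal-entity suffixes dropped during normalisation.
ENTITY_SUFFIXES = {"ltd", "limited", "pty", "inc", "llc", "plc", "co", "company", "trust"}

# Hypothetical thresholds; real values would be tuned against reviewed data.
MATCH_THRESHOLD = 0.92
REVIEW_THRESHOLD = 0.75
AMBIGUITY_MARGIN = 0.05  # best must beat runner-up by this much to auto-match


class Status(Enum):
    MATCH = "Match"
    NEEDS_REVIEW = "Needs Review"
    NO_MATCH = "No Match"


def normalise(name: str) -> str:
    """Lower-case, strip accents and punctuation, drop entity suffixes."""
    text = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    tokens = [t for t in text.split() if t not in ENTITY_SUFFIXES]
    return " ".join(tokens)


def classify(folder_name: str, known_clients: list[str]) -> tuple[Status, Optional[str], float]:
    """Return (status, best candidate or None, similarity score) for one name."""
    target = normalise(folder_name)
    scored = sorted(
        ((SequenceMatcher(None, target, normalise(c)).ratio(), c) for c in known_clients),
        reverse=True,
    )
    if not scored:
        return Status.NO_MATCH, None, 0.0
    best_score, best_client = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    if best_score < REVIEW_THRESHOLD:
        return Status.NO_MATCH, None, best_score
    # Auto-match only when the score is high AND no other client is close behind.
    if best_score >= MATCH_THRESHOLD and best_score - runner_up >= AMBIGUITY_MARGIN:
        return Status.MATCH, best_client, best_score
    # Plausible but uncertain (or tied): surface it to a human.
    return Status.NEEDS_REVIEW, best_client, best_score
```

For example, classify("SMITH & SONS PTY LTD", ["Smith & Sons Limited", "Jones Holdings"]) returns a Match, while a folder named "Smith & Sons" checked against both "Smith & Sons Ltd" and "Smith & Sons Trust" is routed to Needs Review because both candidates normalise to the same string.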

Rather than attempting brittle full automation, the system focused on data hygiene and error visibility, ensuring problems were caught early and intentionally resolved.
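One way to make naming problems visible early is to return a list of reasons rather than a pass/fail flag, so each reason can populate a validation column. The convention below ("<Client> - <YYYY> - <Engagement>"), the engagement list, the earliest year and the multiple-entity check are hypothetical, chosen only to illustrate the pattern.

```python
import re
from datetime import date

# Hypothetical convention, assumed for illustration only:
#   "<Client name> - <four-digit year> - <engagement type>"
#   e.g. "Smith & Sons - 2023 - Audit"
CONVENTION = re.compile(r"^(?P<client>.+?) - (?P<year>\d{4}) - (?P<engagement>[A-Za-z ]+)$")
KNOWN_ENGAGEMENTS = {"Audit", "Tax", "Advisory"}  # hypothetical list
EARLIEST_YEAR = 1990  # hypothetical lower bound


def naming_issues(folder_name: str) -> list[str]:
    """Return human-readable problems; an empty list means the name passes."""
    issues = []
    if folder_name != folder_name.strip():
        issues.append("leading or trailing whitespace")
    match = CONVENTION.match(folder_name.strip())
    if match is None:
        issues.append("does not follow '<Client> - <YYYY> - <Engagement>'")
        return issues
    # A slash or plus in the client segment often means two entities share a folder.
    if re.search(r"[/+]", match.group("client")):
        issues.append("client segment may name more than one entity")
    year = int(match.group("year"))
    if not EARLIEST_YEAR <= year <= date.today().year + 1:
        issues.append(f"implausible year {year}")
    engagement = match.group("engagement")
    if engagement not in KNOWN_ENGAGEMENTS:
        issues.append(f"unknown engagement type '{engagement}'")
    return issues
```

Returning every reason, rather than stopping at the first failure, means a reviewer sees the full picture for a folder in one pass.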

Engineering & Design Considerations

Designed for ambiguity, not ideal inputs
Avoided destructive operations or silent overwrites
Prioritised traceability and reviewability over cleverness
Balanced automation with human-in-the-loop safeguards
Operated entirely within enterprise tool constraints (SharePoint + Excel)
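The non-destructive, human-in-the-loop principles above can be sketched as a two-step flow: the tool writes proposals to a CSV sheet that opens in Excel, and only rows that a named reviewer has marked "Approve" become a change plan. The column names, the "Approve" keyword and the CSV hand-off are assumptions for illustration; the sketch deliberately stops at producing a plan and never renames anything in SharePoint.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import datetime, timezone
from typing import TextIO


@dataclass
class ReviewRow:
    original_name: str  # copied verbatim; the tool never edits it
    proposed_name: str  # a suggestion only; may be empty
    status: str  # "Match", "Needs Review" or "No Match"
    reason: str  # why the tool reached this status
    reviewer_decision: str = ""  # filled in by a person in Excel
    reviewed_by: str = ""  # who made the decision
    generated_at: str = ""  # UTC timestamp added when the sheet is written


def write_review_sheet(rows: list[ReviewRow], stream: TextIO) -> None:
    """Write proposals as CSV for review; nothing is renamed or overwritten."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    writer = csv.DictWriter(stream, fieldnames=[f.name for f in fields(ReviewRow)])
    writer.writeheader()
    for row in rows:
        record = asdict(row)  # a copy, so the caller's rows are untouched
        record["generated_at"] = record["generated_at"] or stamp
        writer.writerow(record)


def approved_changes(stream: TextIO) -> list[tuple[str, str]]:
    """Return (original, proposed) pairs that a named reviewer approved."""
    plan = []
    for row in csv.DictReader(stream):
        approved = row["reviewer_decision"].strip().lower() == "approve"
        # No reviewer name or no proposal means no change: accountability first.
        if approved and row["reviewed_by"].strip() and row["proposed_name"].strip():
            plan.append((row["original_name"], row["proposed_name"]))
    return plan


if __name__ == "__main__":
    demo = [
        ReviewRow("smith and sons 2023", "Smith & Sons - 2023 - Audit",
                  "Needs Review", "fuzzy score 0.83", "Approve", "jdoe"),
        ReviewRow("acme", "", "No Match", "no candidate above threshold"),
    ]
    sheet = io.StringIO()
    write_review_sheet(demo, sheet)
    sheet.seek(0)
    print(approved_changes(sheet))
```

Keeping the original name in its own column and requiring a reviewer name before a change counts as approved is what provides traceability: each change in the plan maps back to a proposal, a reason and a person.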

This was a systems design problem, not a UI or scripting task.

Impact

Identified and formalised inconsistencies in client and file naming across an existing SharePoint repository
Created a structured way to distinguish valid, ambiguous, and invalid client associations
Made document organisation issues visible rather than implicit or silently ignored
Provided a clear foundation for IT to design a more robust long-term document management solution
Reduced reliance on ad-hoc tribal knowledge for locating client files

The primary value of this work was not immediate automation, but clarity: turning an unstructured problem into a defined system that could be reasoned about, reviewed, and improved.

What I Learnt

Real systems fail at boundaries, not cores
Data hygiene is an engineering problem
Full automation is not always the correct solution
Designing for humans is part of system design
Constraints often matter more than tools

Status

This system remains internal and cannot be shared publicly due to confidentiality constraints. Its design principles continue to inform how I approach data validation, operational tooling, and system reliability in larger projects.