AI-Powered Payslip Extraction with Full Australian Data Residency

An Australian income-verification fintech provides digital income verification to lenders, real estate agents and financial institutions. The platform already used Amazon Bedrock with Claude to read payslips, but its single-prompt approach had reached its limits against the enormous variety of Australian payslip formats. With strict data-sovereignty requirements and an ISO 27001 certification to maintain, the client engaged Infostatus to design a more robust, scalable architecture that kept all processing within Australia.

The challenge

A single large prompt required constant iteration as new payslip formats appeared
Australian payslips span many data categories, each with different compulsory and optional fields
Relying on one model created a single point of failure on difficult or poor-quality scans
A lack of confidence scoring meant significant manual review remained
All processing, including development and testing, had to stay within Australia

What we did

We replaced the monolithic prompt with a multi-tier pipeline that uses document context to drive intelligent routing, applying progressively more sophisticated techniques only when needed.

Context engineering and document analysis. Raw text is extracted with PyMuPDF for digital PDFs or Amazon Textract OCR for scans, then classified so each category is processed on its own path.
Targeted extraction. Amazon Bedrock generates category-specific Amazon Textract queries, which return values with built-in confidence scores.
Confidence-based enhancement. Only fields that come back with low confidence are reprocessed through Bedrock; high-confidence values are accepted as returned, which sharply reduces token usage.
Intelligent fallback. Documents that still fail are flagged with specific, actionable guidance for human review.
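
The confidence-based tiers above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the client's implementation. It assumes the response shape that Amazon Textract's AnalyzeDocument returns when the QUERIES feature is used (QUERY blocks linked to QUERY_RESULT blocks through ANSWER relationships). The 90.0 threshold, the field aliases, the sample values and the function names are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical cut-off; Textract reports confidence on a 0-100 scale.
CONFIDENCE_THRESHOLD = 90.0


def parse_query_answers(response: dict) -> Dict[str, Tuple[str, float]]:
    """Map each query alias to its best (answer text, confidence) pair."""
    blocks = response.get("Blocks", [])
    results = {b["Id"]: b for b in blocks if b["BlockType"] == "QUERY_RESULT"}
    answers: Dict[str, Tuple[str, float]] = {}
    for block in blocks:
        if block["BlockType"] != "QUERY":
            continue
        # Fall back to the query text when no alias was supplied.
        alias = block["Query"].get("Alias") or block["Query"]["Text"]
        for rel in block.get("Relationships", []):
            if rel["Type"] != "ANSWER":
                continue
            for result_id in rel["Ids"]:
                result = results.get(result_id)
                if result is None:
                    continue
                candidate = (result.get("Text", ""),
                             float(result.get("Confidence", 0.0)))
                # Keep the highest-confidence answer for each alias.
                if alias not in answers or candidate[1] > answers[alias][1]:
                    answers[alias] = candidate
    return answers


def route_fields(
    expected: List[str],
    answers: Dict[str, Tuple[str, float]],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Tuple[Dict[str, str], List[str]]:
    """Split fields into accepted values and aliases needing an LLM pass."""
    accepted: Dict[str, str] = {}
    needs_llm: List[str] = []
    for alias in expected:
        text, confidence = answers.get(alias, ("", 0.0))
        if text and confidence >= threshold:
            accepted[alias] = text
        else:
            # Missing or low-confidence fields go to the next tier.
            needs_llm.append(alias)
    return accepted, needs_llm


# Hand-written sample in the assumed response shape (illustrative values).
sample_response = {
    "Blocks": [
        {"Id": "q1", "BlockType": "QUERY",
         "Query": {"Text": "What is the gross pay?", "Alias": "gross_pay"},
         "Relationships": [{"Type": "ANSWER", "Ids": ["r1"]}]},
        {"Id": "r1", "BlockType": "QUERY_RESULT",
         "Text": "2,450.00", "Confidence": 97.0},
        {"Id": "q2", "BlockType": "QUERY",
         "Query": {"Text": "What is the superannuation amount?",
                   "Alias": "super_amount"},
         "Relationships": [{"Type": "ANSWER", "Ids": ["r2"]}]},
        {"Id": "r2", "BlockType": "QUERY_RESULT",
         "Text": "269.50", "Confidence": 61.0},
    ]
}

accepted, needs_llm = route_fields(
    ["gross_pay", "super_amount", "pay_period_end"],
    parse_query_answers(sample_response),
)
```

In this sample only `gross_pay` clears the threshold. `super_amount` (low confidence) and `pay_period_end` (no answer returned) would be the only fields sent on to the model. Anything still unresolved after that pass would be flagged for human review.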

To stay within regional model quotas, we buffered requests with Amazon SQS, applied exponential backoff and circuit breakers, and used Amazon EventBridge for scheduled polling. All infrastructure is deployed exclusively in the AWS Asia Pacific (Sydney) region, ap-southeast-2.
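
The backoff and circuit-breaker combination can be illustrated with a small, dependency-free Python sketch. It is not the production code. The class and function names, the thresholds and the delays are hypothetical, and the SQS consumer that would wrap the model call is left out. It uses exponential backoff with full jitter and a simple breaker that opens after a run of consecutive failures.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is rejecting calls during its cool-down."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self._opened_at is None:
            return True
        # After the cool-down, let a trial call through (half-open state).
        return self._clock() - self._opened_at >= self.reset_after

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()


def call_with_backoff(fn: Callable[[], T], breaker: CircuitBreaker,
                      max_attempts: int = 5, base_delay: float = 0.5,
                      max_delay: float = 20.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying failures with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        if not breaker.allow():
            # The caller can leave the message on the queue for a later poll.
            raise CircuitOpenError("circuit open; retry after cool-down")
        try:
            # A real consumer would catch only throttling errors here.
            result = fn()
        except Exception:
            breaker.record_failure()
            if attempt == max_attempts - 1:
                raise
            # Full jitter: wait a random time up to the capped exponential.
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
        else:
            breaker.record_success()
            return result
    raise ValueError("max_attempts must be at least 1")
```

A queue consumer would wrap each model invocation in `call_with_backoff` and, on `CircuitOpenError`, leave the message unacknowledged so a later scheduled poll picks it up again. That is one plausible way to connect this sketch to the SQS and EventBridge components described above.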

Figure: Context-engineered extraction pipeline

Outcomes

Extraction accuracy lifted from 82% to 94%
An 80% reduction in AI token costs through intelligent routing
The end of the constant prompt-maintenance cycle
A roughly 85% reduction in manual review time through context-aware flagging
100% Australian data residency

Lessons learned

Understanding a document before processing it improves both accuracy and efficiency. Multiple specialised services with smart routing outperform a single large prompt. Built-in confidence scoring enables autonomous decisions about how each document should be handled.