If you work with Oracle Fusion Cloud and have ever felt like you need more insights when it comes to data quality, process health, or IDR performance, you are not alone. The Oracle ecosystem is rich, but the gap between "we have the data" and "we actually understand what the data is telling us" can be enormous. This post walks through how we built a Python-based analysis pipeline called Catalyst AI Lab that connects directly to Oracle Fusion Cloud REST APIs and surfaces actionable findings automatically, and how AI-assisted development made the whole thing possible in a fraction of the time it would have taken otherwise.
The Problem Worth Solving
Our team supports Oracle Fusion Cloud across ERP, EPM and HCM. Like most teams, we spent a lot of time answering the same questions: Why is our Straight-Through Processing rate lower than expected? Are there duplicate suppliers in the system? Which business units have the most unresolved invoice holds, and how old are those holds? Among many other existential questions!
The answers exist in Oracle. The REST APIs are well-documented enough to get started. But building something repeatable and systematic that could answer these questions reliably, across environments, and produce a report someone could actually act on -- that is a different project entirely.
The goal was to build a tool that behaves like a knowledgeable consultant running a discovery engagement: it goes into the system, looks at the data from multiple angles, identifies what is worth paying attention to, and tells you why it matters.
The Architecture: Discovery Lenses
We organized the analysis into what I call discovery lenses. Each lens is a focused analytical perspective that targets a specific domain. The IDR Invoice Processing lens, for example, looks at STP rates, hold root causes, duplicate invoices, supplier-level IDR success, and per-BU performance breakdowns. The Supplier Data Quality lens looks at the entire supplier master for duplicates, missing site attributes, banking configuration gaps, tax registration coverage, and cross-BU payment term consistency.
Each lens follows the same pattern: connect to Oracle, fetch the relevant data via REST API, run analysis, produce structured findings with severity levels, confidence scores, and remediation guidance, then export everything to an HTML report and supporting CSV files for drill-down in Excel.
The framework is extensible. Adding a new lens means creating one Python file, decorating a class with @register_lens, and adding configuration to a YAML file. No changes to the core pipeline needed.
Agentic Development: What It Actually Looks Like
I want to be honest about what AI-assisted development meant on this project, because there is a lot of noise around this topic.
The GitHub Copilot agent in VS Code was not just writing boilerplate. I was having extended conversations with it about Oracle API behavior, asking it to review findings for accuracy, having it run the pipeline against our dev environment and then analyze the results, and using it to write and fix tests. When a finding was surfaced with a remediation recommendation that turned out to be wrong (more on that below), we flagged it, the agent researched the correct Oracle Cloud guidance, and we updated both the code and the best practices configuration together.
The workflow looked like this: I would describe what I wanted to understand about the data. The agent would research the Oracle REST API structure, write the lens code, run the pipeline, and report back with results. Then we would review the output together, identify what was accurate, what was misleading, and what needed more nuance. That iterative loop is where the real value was. It compressed weeks of work into days.
The test suite ended up at 632 tests covering pagination behavior, finding logic, report generation, CLI commands, and end-to-end pipeline execution. Having the agent write those tests alongside the features meant I could refactor confidently.
Oracle REST API: Things You Will Not Find in the Quick Start
This is the section I wish existed when we started.
First, pagination. Oracle Fusion Cloud REST APIs use offset/limit pagination with a default page size of 100 records. That is straightforward. What is not obvious is that if you want Oracle to tell you how many total records exist, you have to explicitly pass totalResults=true as a query parameter on your request. Without it, the API returns your page of data but gives you no indication of the overall population size. We were calculating STP rates against a 20,000-invoice sample before realizing we had 275,180 invoices YTD. That is a 7.3% sample, which changes how you interpret the findings significantly.
The fix was simple once we understood it: on the first page request, add totalResults=true to the query string. Oracle then returns the estimated row count in the response, and you can surface that alongside your findings as a population coverage metric. Every finding now shows something like "analyzed 20,000 of 275,180 invoices (7.3% coverage)" so the reader understands the confidence level of what they are seeing.
Second, filter syntax on certain resources. Oracle's REST API supports query filters using a SCQL-style syntax, which is powerful for many resources. But for invoiceHolds specifically, we discovered that compound filters with semicolons are silently ignored. You cannot filter by multiple conditions in a single request on that resource. The API accepts the request without error, applies no filter at all, and returns unfiltered data. If you are not paying attention, you will think your filter worked and analyze the wrong dataset. The workaround is to fetch the broader dataset and filter in application code.
Third, the OpenAPI specification files for Oracle Fusion Cloud are enormous. The five files covering ERP Financials, PPM, Procurement, Common, and HCM total nearly 400MB. You cannot parse those with standard JSON libraries without running into memory issues. The solution was ijson, a streaming JSON parser that lets you walk through the spec without loading it all into memory at once.
Fourth, credentials. Oracle Cloud API credentials need to be stored securely. The pipeline uses OS keyring integration (Windows Credential Manager on Windows, Keychain on Mac) so passwords are never in plain text files, never in environment variables that might get logged, and never in source control. This is worth doing properly from the start because once you have multiple environments and multiple team members, credential hygiene matters.
What the Analysis Actually Found
Running the full pipeline against our Prod clone environment produced some results worth highlighting, both because they illustrate what this kind of tool can surface and because they validate the approach. Below are a couple of material examples.
On the IDR side: our overall STP rate looked reasonable at the summary level, but the per-BU breakdown showed significant variance. Several business units were well below the industry benchmark. That variance was invisible in aggregate reporting. The hold analysis showed issues, which is a cash flow concern that should have been caught in normal operations.
On the supplier side: analyzing all suppliers at 100% population coverage found many potential duplicate supplier groups, duplicate addresses, and active suppliers flagged as Do Not Use. The real value is that the tool gives you a prioritized, evidence-backed list to work through rather than asking someone to manually review a spreadsheet with thousands of rows.
Lessons on Accuracy and Iteration
One thing I want to be direct about: the first version of several remediation recommendations in the tool were wrong. Not wrong in a subtle way -- wrong in the sense that they referenced Oracle lookup values that do not exist in standard Cloud instances, or recommended configuration changes that would break other functionality.
This happened because the AI was drawing on general Oracle knowledge that did not perfectly match Oracle Cloud specifically. The fix was systematic: review every finding's remediation guidance against actual Oracle Cloud documentation and lived experience, update the best practices configuration file that drives the recommendations, and add tests that assert the correct guidance is present.
That iteration process is not a failure of AI-assisted development. It is how the process is supposed to work. The agent accelerates construction; human expertise and domain knowledge ensure correctness. If you skip the review step, you will ship wrong recommendations at scale. If you do the review, you end up with something accurate and fast.
How You Can Apply This to Your Environment
The pattern generalizes beyond the specific lenses we built. If you support Oracle Fusion Cloud in any capacity, you almost certainly have questions you are answering manually today that could be answered systematically with this approach.
Start with one question you care about. What is your PO-backed invoice percentage? How many suppliers have incomplete site setup? What is the distribution of expense report approval times across business units? Pick something with a clear Oracle REST API resource behind it, build a focused analysis around it, validate the results against what you know to be true, and iterate from there.
The Oracle REST API surface area is large enough that you can build meaningful analytics across almost any domain without needing a data warehouse, a separate reporting tool, or an extract from production. The data is there. The APIs are there. The AI tooling to help you build something coherent on top of them is better than it has ever been.
The repository for this project is internal to our organization, but the patterns, the Oracle API gotchas, and the architectural approach are all things you can apply independently. If you are working on something similar or have questions about the Oracle-specific pieces, drop a comment below. I am happy to go deeper on any of it.
No comments:
Post a Comment