If you've spent any time supporting Oracle Cloud ERP, HCM, or EPM, you already know the pattern. A user submits a ticket about a payroll element not processing. You ask a series of diagnostic questions. You identify whether it's an effective date problem, a payroll cutoff issue, a security profile gap, or one of a dozen other root causes. You document the fix. Two months later, a different user hits the same issue and the whole cycle starts over.
This post walks through a tool I built to break that cycle. It is called Solution Designer, and it uses an agentic AI approach on top of Databricks to give your Oracle support team structured, consistent, context-aware troubleshooting guidance -- without requiring users to know exactly what to ask.
The Problem With Generic AI for Oracle Support
The first instinct most teams have is to point users at a general-purpose chatbot. "Just ask it your Oracle question." The problem is that Oracle Cloud is enormous and deeply configuration-dependent. A question like "why isn't my payroll processing?" has radically different answers depending on element type, entry method, effective date, pay period status, and whether payroll has already been run for the period.
A generic chat interface puts all of that burden on the user. They have to know which details matter, phrase the question precisely, and interpret an answer that wasn't tuned to their specific setup.
I took a different approach. Instead of a free-form chat interface, I built a guided intake flow that collects the right diagnostic signals before the AI ever generates a response. The model then has full context to produce a solution that actually fits the situation.
What Solution Designer Does
At its core, the tool is a Flask application deployed as a Databricks App. It calls an LLM, either Claude through the Databricks Foundation Model API or a model hosted behind the OCI LLM API. The experience works like this:
1. The user picks a scenario from a categorized library. The library currently covers 31 Oracle scenarios across ERP (Payables, Procurement, Receivables, General Ledger, Fixed Assets, Inventory, Projects), HCM (Payroll, Absence, HCM Data Loader, Fast Formula, Security, Workforce Compensation), EPM (Consolidation, Planning Data Loads, Rule Troubleshooting), and integrations (REST API, FBDI, BI Publisher, OTBI, Workflow Notifications).
2. The tool presents a structured intake form specific to that scenario. For Payroll Element Entries, for example, the user answers eight targeted questions: what type of element, how it was created, the specific symptom, how many employees are affected, the effective date, the pay period, whether payroll has already run, and the current status of the element entry. Every question has help text explaining why it matters.
3. The scenario definition includes edge cases. Each scenario JSON file encodes known Oracle gotchas as conditional logic. If the effective date falls outside the pay period, the system flags that as a high-severity edge case and incorporates it into the prompt context automatically. If payroll already ran and the entry is still unprocessed, the tool knows to surface the rollback-or-next-period decision. Users do not have to discover these traps manually.
4. OCI or Claude generates a tailored solution. With the structured inputs and edge case context loaded, OCI/Claude produces a step-by-step resolution path that is specific to what the user actually described. This is not a generic knowledge article. It is a direct answer to their exact configuration.
5. The user can refine through follow-up questions. After the initial solution, the conversation stays open. Users can ask clarifying questions and the system maintains full session context so responses stay coherent. Sessions persist in a Unity Catalog Delta table so the conversation survives page refreshes and can be resumed later.
6. After rating the solution, the user can generate a training document. Once the problem is solved and the user rates the response, a "Generate Training Doc" button appears. They specify a format (Markdown, slide outline, or quick-start guide) and a target audience, and OCI/Claude turns the solution into a shareable knowledge artifact. This is where individual troubleshooting events compound into reusable organizational knowledge.
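Steps 2 through 4 come down to turning structured answers and flagged edge cases into a single prompt. Here is a minimal sketch of that assembly step. The `EdgeCase` class, the `build_prompt` function, and the prompt wording are all hypothetical and not taken from the actual codebase; only the edge case fields (`condition`, `impact`, `severity`) mirror the scenario JSON format.

```python
from dataclasses import dataclass


@dataclass
class EdgeCase:
    """Hypothetical container mirroring one entry of a scenario's edge_cases list."""
    condition: str
    impact: str
    severity: str


def build_prompt(title: str, answers: dict[str, str], flagged: list[EdgeCase]) -> str:
    """Combine intake answers and triggered edge cases into one prompt string."""
    lines = [f"Scenario: {title}", "", "Intake answers:"]
    for question_id, answer in answers.items():
        lines.append(f"- {question_id}: {answer}")

    if flagged:
        lines += ["", "Known edge cases triggered by these answers:"]
        # List high-severity cases first so the model sees them before the rest.
        order = {"high": 0, "medium": 1, "low": 2}
        for case in sorted(flagged, key=lambda c: order.get(c.severity, 3)):
            lines.append(f"- [{case.severity}] {case.condition}: {case.impact}")

    lines += ["", "Give a step-by-step resolution specific to the answers above."]
    return "\n".join(lines)
```

The point of the sketch is that the model never has to infer the edge case from free text. It arrives in the prompt already evaluated and labeled with a severity.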
How the Knowledge Base Works
Each scenario is a JSON file that defines intake questions, answer options, help text, edge cases, and the conditions that trigger them. Here is a simplified example from the Payroll Element Entries scenario:
```json
{
  "id": "oracle-hcm-payroll-element-entries",
  "category": "Oracle HCM",
  "title": "Payroll Element Entries Not Processing",
  "intake_questions": [
    {
      "id": "effective_date",
      "question": "What is the effective date of the entry?",
      "type": "text",
      "required": true,
      "help_text": "Enter date in format MM/DD/YYYY - must be within pay period"
    }
  ],
  "edge_cases": [
    {
      "condition": "effective_date outside of pay_period",
      "impact": "Entry will not process in expected period. Must match effective date to pay period or entry will be skipped.",
      "severity": "high"
    }
  ]
}
```
This structure means the expertise lives in the JSON, not in the prompt. Adding a new scenario or a new edge case is a file edit, not a code change. The knowledge base hot-reloads, so updates go live without a restart.
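One way to evaluate an edge case like the one above is to map each `condition` string to a small Python check. This is a sketch under stated assumptions, not the tool's actual evaluator: the `CONDITION_CHECKS` registry, the function names, and the `pay_period_start` and `pay_period_end` answer keys are hypothetical. The date format follows the MM/DD/YYYY help text.

```python
import json
from datetime import date, datetime

# Trimmed copy of the scenario definition, kept inline so the sketch is self-contained.
SCENARIO_JSON = """
{
  "id": "oracle-hcm-payroll-element-entries",
  "edge_cases": [
    {
      "condition": "effective_date outside of pay_period",
      "impact": "Entry will not process in expected period. Must match effective date to pay period or entry will be skipped.",
      "severity": "high"
    }
  ]
}
"""


def parse_date(text: str) -> date:
    """Parse the MM/DD/YYYY format requested by the intake help text."""
    return datetime.strptime(text, "%m/%d/%Y").date()


def effective_date_outside_pay_period(answers: dict) -> bool:
    # pay_period_start / pay_period_end are assumed answer keys, not from the original.
    effective = parse_date(answers["effective_date"])
    start = parse_date(answers["pay_period_start"])
    end = parse_date(answers["pay_period_end"])
    return not (start <= effective <= end)


# Hypothetical registry: condition string from the JSON -> check function.
CONDITION_CHECKS = {
    "effective_date outside of pay_period": effective_date_outside_pay_period,
}


def triggered_edge_cases(scenario: dict, answers: dict) -> list:
    """Return the edge cases whose condition holds for these intake answers."""
    hits = []
    for case in scenario.get("edge_cases", []):
        check = CONDITION_CHECKS.get(case["condition"])
        if check is not None and check(answers):
            hits.append(case)
    return hits


scenario = json.loads(SCENARIO_JSON)
answers = {
    "effective_date": "02/03/2025",
    "pay_period_start": "01/16/2025",
    "pay_period_end": "01/31/2025",
}
for case in triggered_edge_cases(scenario, answers):
    print(case["severity"], "-", case["condition"])
# prints: high - effective_date outside of pay_period
```

With a registry like this, a new edge case that reuses an existing condition stays a pure JSON edit. A brand-new kind of condition would still need one new check function.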
The Architecture on Databricks
Running this on Databricks was a deliberate choice. The team already uses the platform, so authentication, security, and data governance come for free. Here is the technical stack:
- Databricks Apps for hosting the Flask application
- Databricks Foundation Model API to call Claude (Sonnet 4.5 at time of writing) without managing any external API credentials. Alternatively, you can call the OCI LLM API from Python inside the app.
- Unity Catalog Delta tables for two purposes: structured logging of every interaction to `ef_dv.raw_dev_metadata.solution_designer_logs`, and session persistence in `ef_dv.raw_dev_metadata.conversation_sessions`
- Serverless SQL Warehouse for querying those Delta tables from the app
- On-behalf-of (OBO) authentication so every request to Databricks runs under the identity of the actual user, not a shared service account
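For the OBO piece, the app needs the signed-in user's token on every request. The sketch below assumes Databricks Apps forwards that token in an `x-forwarded-access-token` request header when user authorization is enabled; verify the header name against the current Databricks documentation before relying on it. The helper name is hypothetical.

```python
from typing import Mapping, Optional

# Assumed header name; confirm against the Databricks Apps documentation.
USER_TOKEN_HEADER = "x-forwarded-access-token"


def user_token_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Return the forwarded user token, or None if the header is absent or empty."""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lowercase.
        if name.lower() == USER_TOKEN_HEADER:
            return value or None
    return None
```

In a Flask route this would be called with `request.headers`, and the returned token passed to whatever client queries the SQL warehouse, so the query runs under the user's own permissions.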
The logging design deserves a mention. Every time a user generates a solution, submits feedback, or asks a follow-up question, a structured record lands in the Delta table with fields for user identity, scenario ID, scenario category, satisfaction rating, time saved, duration, and the interaction text. That gives leadership a real ROI dashboard backed by queryable data rather than estimates.
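A structured log record along those lines might look like the following sketch. The column names, the `interaction_type` field, and the named-parameter style are assumptions for illustration; only the table name and the list of logged concepts come from the description above.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional

LOG_TABLE = "ef_dv.raw_dev_metadata.solution_designer_logs"


@dataclass
class InteractionLog:
    """One row per interaction; column names here are hypothetical."""
    user_identity: str
    scenario_id: str
    scenario_category: str
    interaction_type: str  # assumed values: "solution", "feedback", "follow_up"
    interaction_text: str
    satisfaction_rating: Optional[int] = None
    time_saved_minutes: Optional[float] = None
    duration_seconds: Optional[float] = None
    logged_at: str = ""


def build_insert(record: InteractionLog) -> tuple[str, dict]:
    """Build a parameterized INSERT statement and its parameter dict."""
    row = asdict(record)
    if not row["logged_at"]:
        row["logged_at"] = datetime.now(timezone.utc).isoformat()
    columns = ", ".join(row)
    # Named placeholders keep user text out of the SQL string itself.
    placeholders = ", ".join(f":{name}" for name in row)
    return f"INSERT INTO {LOG_TABLE} ({columns}) VALUES ({placeholders})", row
```

The statement and parameters would then go to whichever SQL client the app uses. The placeholder syntax may need adjusting to match that client.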
How We Built This With Agentic Development
The development process itself is worth describing because it reflects a pattern that works well for Oracle-specific tooling.
We used GitHub Copilot in agent mode throughout the build. The agent did not just autocomplete code; it was given feature requirements and asked to reason through the implementation, identify edge cases in its own output, and iterate. The key discipline was keeping the agent's scope narrow. Each session had one focused objective: add session persistence, add the feedback loop, fix a specific datetime bug in Delta table reads, resolve an issue with the resume-session banner showing stale state.
This approach works well for Oracle support tooling for a specific reason: Oracle Cloud problems are well-bounded. The diagnostic questions for a 3-way match hold resolution are not the same as the diagnostic questions for an HCM Fast Formula error. That domain specificity is an asset when working with an AI coding agent because you can describe the scenario structure precisely and the agent can reason about it correctly.
The iterative cycle looked like this:
1. Identify a gap in the user experience (example: returning to a previously solved scenario still presented the rating form instead of the completed state).
2. Trace the root cause through the code. In this case, the `/feedback` endpoint stored the rating on the session's last interaction in the Delta table but intentionally left session status as `active`. The frontend never checked whether the last interaction already had feedback attached.
3. Write a targeted fix. The solution was a new `restoreFeedbackStateFromHistory()` function in the frontend that reads the persisted rating, locks the feedback buttons, skips the rating form, and surfaces the "Generate Training Doc" button directly. No backend changes needed.
4. Run the 329-test suite to verify no regressions.
5. Deploy with the PowerShell script that syncs files to the Databricks workspace and triggers a new app deployment.
6. Confirm SUCCEEDED in the Databricks Apps deployment status.
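The decision logic behind the fix in step 3 is small enough to sketch. The real `restoreFeedbackStateFromHistory()` lives in the frontend JavaScript; what follows is a Python illustration of the same idea, with a hypothetical `rating` field on each history entry and hypothetical flag names.

```python
def feedback_ui_state(history: list[dict]) -> dict:
    """Decide what the feedback UI should show when a session is resumed.

    `history` is the list of persisted interactions, oldest first.
    """
    last = history[-1] if history else None
    rating = last.get("rating") if last else None
    rated = rating is not None
    return {
        # Only ask for a rating if there is a solution that has not been rated yet.
        "show_rating_form": last is not None and not rated,
        # Lock the buttons once a rating exists so it cannot be submitted twice.
        "feedback_locked": rated,
        "show_training_doc_button": rated,
        "rating": rating,
    }
```

Because the state is derived from the persisted history, the session status can stay `active` and the backend needs no change.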
The agent accelerated every step but did not replace the domain judgment calls. Decisions like "should the session status flip to solved when feedback is submitted, or should we leave it active so users can keep the conversation going?" require understanding the Oracle support workflow. The agent implements whatever direction you give it; you still have to give the right direction.
What This Looks Like in Practice
Consider a real scenario: an Oracle Finance team is getting 3-way match holds on vendor invoices. Previously, a ticket would come in, a knowledgeable team member would spend 15 to 45 minutes asking diagnostic questions via email or a Teams chat, and then write up a resolution.
With Solution Designer, the user navigates to the "3-Way Match Hold Resolution in Payables" scenario. They answer questions about hold type (quantity vs. price variance), tolerance settings, receipt status, PO type, and approval status. The tool detects relevant edge cases -- for example, if it is a price variance hold on a blanket PO, there is a specific Oracle behavior around line-level vs. header-level tolerances that trips people up repeatedly. The solution it generates addresses exactly that combination. If the user is satisfied, they rate the response, click "Generate Training Doc," specify their audience as "Oracle Finance users," and download a formatted guide they can drop into their SharePoint or Confluence space.
The time saved compounds across every person who hits that scenario going forward.
What We Learned
A few things stand out after building this:
Structure beats prompting. The quality improvement from giving AI structured inputs with pre-evaluated edge cases is larger than any prompt engineering trick. When the model knows that the effective date is outside the pay period because the scenario definition explicitly flagged it, the response is categorically more useful than when the user tries to describe that situation in free text.
Persistence changes the support dynamic. When solutions survive as resumable sessions in Delta, the tool stops being a one-shot query tool and starts being a case management layer. Users come back to the same session when a problem resurfaces, add follow-up context, and build a richer record of what was tried and what worked.
Feedback closes the loop. The satisfaction ratings and time-saved estimates that land in the Delta log are not vanity metrics. They identify which scenarios are working and which ones need better edge case coverage. The scenarios with low success rates are the ones that need a knowledge base update. That is a tractable engineering task.
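Finding the weak scenarios is a simple aggregation over the log rows. The sketch below assumes a 1-to-5 rating scale where 4 or higher counts as a success, and hypothetical `scenario_id` and `satisfaction_rating` keys; adjust both to the actual log schema.

```python
from collections import defaultdict


def scenario_success_rates(rows: list[dict], min_rating: int = 4) -> list[tuple[str, float]]:
    """Return (scenario_id, success_rate) pairs, lowest rate first."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for row in rows:
        rating = row.get("satisfaction_rating")
        if rating is None:
            continue  # skip interactions that were never rated
        totals[row["scenario_id"]] += 1
        if rating >= min_rating:
            successes[row["scenario_id"]] += 1
    rates = {sid: successes[sid] / totals[sid] for sid in totals}
    # Lowest success rate first: these scenarios need edge case work.
    return sorted(rates.items(), key=lambda item: item[1])
```

The scenarios at the top of the returned list are the candidates for a knowledge base update.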
Domain expertise encodes well into JSON. Oracle Cloud has well-documented configuration patterns and well-known failure modes. That knowledge, once captured in the scenario JSON format, is reusable forever. An experienced Oracle consultant or admin can encode years of diagnostic heuristics in an afternoon and make them available to everyone on the team immediately.
Getting Started If You Want to Build Something Similar
The approach is not specific to this stack. The core pattern is: structured intake + knowledge-encoded edge cases + LLM synthesis + persistent sessions. You need:
- A way to call a capable LLM (Databricks Foundation Model API, OCI LLM, Azure OpenAI, or similar)
- A JSON or database structure to encode your domain knowledge
- A lightweight web framework (Flask works well; FastAPI is another good option)
- A persistence layer for sessions and logs (Delta on Databricks is convenient if you are already there; a Postgres database or an OCI Autonomous Database works just as well)
If your organization runs Oracle Cloud and has people who have been troubleshooting it for years, you already have the hardest ingredient: the domain knowledge. The tool just gives you a way to make it available at scale.
The knowledge base for this project started with five scenarios based on the most frequent support tickets. It now covers 31. Each new scenario takes roughly two to four hours to build out properly, including the edge cases. That is a reasonable investment for a scenario that generates 20 support tickets a month.
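The break-even arithmetic is worth spelling out. This is a rough sketch that borrows the 15 to 45 minutes of diagnostic time from the 3-way match example and assumes, optimistically, that the tool saves all of that time on every ticket.

```python
def months_to_break_even(build_hours: float, tickets_per_month: int,
                         minutes_saved_per_ticket: float) -> float:
    """Months until the time spent building a scenario is recovered."""
    hours_saved_per_month = tickets_per_month * minutes_saved_per_ticket / 60
    return build_hours / hours_saved_per_month


# Worst case: 4 hours to build, 15 minutes saved per ticket.
worst = months_to_break_even(4, 20, 15)   # 4 / 5 hours per month
# Best case: 2 hours to build, 45 minutes saved per ticket.
best = months_to_break_even(2, 20, 45)    # 2 / 15 hours per month
print(f"{best:.2f} to {worst:.2f} months")
# prints: 0.13 to 0.80 months
```

Under those assumptions a scenario pays for itself within its first month, even at the pessimistic end of both ranges.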
If you are building something similar for your Oracle user community, the questions, edge case structures, and session management patterns described here translate directly. The specific scenarios are where your team's expertise goes -- everything else is scaffolding.