Saturday, March 14, 2026

Converting Oracle’s 1,155 Page EPM REST API PDF Into Machine Readable JSON and Markdown and What It Unlocks

While working through several integration and automation use cases involving Oracle Enterprise Performance Management Cloud, our team kept running into the same practical challenge. The APIs themselves were capable, well designed, and supported a wide range of enterprise scenarios. The challenge was not the APIs. It was how quickly and precisely we could work with the documentation.

The official EPM REST API reference is a 1,155 page PDF.

That document is thorough and authoritative. It reflects significant investment from Oracle and provides deep coverage of the platform. As our use cases expanded to include automation, AI assisted development, and reusable integration patterns, we started asking a different question. What happens when that same high quality documentation is available in a form that tools can understand just as easily as people?

That question led us down an interesting path.

Why structure matters

When teams build against the EPM Planning REST API, the needs are usually very specific. Engineers need to know exactly which endpoint to call, which parameters are required, how asynchronous jobs behave, and how to interpret the response.

The information already exists in the documentation. The challenge is not accuracy or completeness. The challenge is that the information is optimized for reading, not for reuse by modern tooling or AI.

As development workflows increasingly involve AI coding assistants and automated pipelines, having clean structured context becomes essential. These tools perform best when they can reason over well defined inputs rather than scanning long blocks of unstructured text.

The goal here was not to critique how the documentation is published today. It was to explore how much additional value could be unlocked by reshaping it into formats that better align with how software is now being built.

What was built

As part of this effort, a Python script was created to extract and restructure the Oracle EPM REST API documentation. The script processes the official EPM REST API PDF, document G42939 02 from November 2025 on docs.oracle.com, extracts the full text using pdfplumber, and transforms it into structured outputs.
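
As a rough illustration of the text extraction step, here is a minimal sketch that assumes pdfplumber is installed. The function names `join_pages` and `extract_pdf_text` are hypothetical and are not taken from the repository's script.

```python
def join_pages(page_texts):
    """Join per-page text, skipping pages where nothing was extracted."""
    cleaned = (text.strip() for text in page_texts if text)
    return "\n\n".join(text for text in cleaned if text)


def extract_pdf_text(pdf_path):
    """Return the text of every page in the PDF as one string."""
    import pdfplumber  # third-party dependency, imported lazily

    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() can return None for pages without a text layer
        return join_pages(page.extract_text() for page in pdf.pages)
```

Calling `extract_pdf_text` with the path to a local copy of the PDF would return the full text, which can then be written to a plain text file for traceability.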

The first output is a structured JSON file. It captures authentication methods, including Basic Authentication and OAuth 2.0, with precise header formats. It includes identity domain rules for test and production environments. It documents thirty-four REST API endpoints across six functional areas, each with method, path, parameters, request details, response details, and required roles. It also includes twenty-five supported job types with descriptions, along with status codes and other reference data that come up frequently during development.
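
To make the shape of such a file concrete, the sketch below shows what a single endpoint entry could look like. The key names mirror the attributes listed above but are assumptions, and the values are illustrative placeholders rather than content copied from the repository's JSON.

```python
import json

# Hypothetical endpoint entry; keys mirror the attributes named in the post.
endpoint = {
    "domain": "Planning",
    "name": "Execute a Job",
    "method": "POST",
    "path": "/HyperionPlanning/rest/{api_version}/applications/{application}/jobs",
    "parameters": [
        {"name": "jobType", "in": "body", "required": True},
        {"name": "jobName", "in": "body", "required": True},
    ],
    "request": {"content_type": "application/json"},
    "response": {"content_type": "application/json"},
    "required_roles": ["Service Administrator"],
}

# Keys every entry is expected to carry (an assumption for this sketch).
REQUIRED_KEYS = {"method", "path", "parameters", "request", "response", "required_roles"}


def missing_keys(entry):
    """Return the required keys that an endpoint entry lacks, sorted by name."""
    return sorted(REQUIRED_KEYS - entry.keys())


print(json.dumps(endpoint, indent=2))
```

A check like `missing_keys` is one simple way to confirm that every extracted entry follows the same structure before the file is handed to other tools.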

The second output is a Markdown version of the same information. This format is easier for humans to scan and review, while still preserving structure through headings, tables, and code blocks.
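
One way to derive a Markdown table from the structured entries is sketched below. The `description` key and the column layout are assumptions made for illustration, not the format used in the repository.

```python
def endpoints_to_markdown(endpoints):
    """Render endpoint dictionaries as a Markdown table."""
    lines = ["| Method | Path | Description |", "| --- | --- | --- |"]
    for endpoint in endpoints:
        # Escape pipes so cell text cannot break the table layout
        cells = [
            str(endpoint.get(key, "")).replace("|", "\\|")
            for key in ("method", "path", "description")
        ]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Placeholder data used only to show the output shape
sample_endpoints = [
    {"method": "GET", "path": "/a", "description": "List"},
    {"method": "POST", "path": "/b", "description": "Run a|b"},
]
print(endpoints_to_markdown(sample_endpoints))
```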

In addition, the full extracted text from the PDF is saved as a plain text file. This provides traceability back to the source material when deeper validation or cross checking is needed.

How the information is organized

The structured output is grouped by API domain.

Planning includes nineteen endpoints covering jobs, data slices, members, variables, substitution variables, applications, and Smart Push.

Migration includes seven endpoints covering snapshots, file operations, and service maintenance.

Security includes two endpoints focused on encryption key management.

User Management includes three endpoints for users, groups, and role assignments.

Reporting includes one endpoint for report extraction.

Data Management includes two endpoints for integration job execution.

Each endpoint follows a consistent and predictable structure. That consistency is what allows the documentation to move from something that must be read manually to something that can be consumed directly by tools.

Once the documentation exists in this form, it becomes much easier to integrate into development workflows rather than sitting on the side as a reference that is consulted only when something goes wrong.

What this enables

The most immediate benefit shows up when working with AI assisted development tools.

With a structured JSON reference available, coding assistants can reason about the EPM API surface area accurately. They can identify valid endpoints, understand required parameters, recognize supported job types, and generate correct request patterns without guesswork.
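
As a sketch of how a structured reference might be fed to a coding assistant, the snippet below filters endpoints by domain and builds a compact context string. The top-level `endpoints` key and the per-entry field names are assumptions, and the sample paths are placeholders.

```python
def endpoints_for_domain(reference, domain):
    """Return the endpoint entries that belong to one API domain."""
    wanted = domain.lower()
    return [
        entry
        for entry in reference.get("endpoints", [])
        if entry.get("domain", "").lower() == wanted
    ]


def build_context(reference, domain):
    """Build a short, one-line-per-endpoint summary for a prompt."""
    return "\n".join(
        f"{entry['method']} {entry['path']}"
        for entry in endpoints_for_domain(reference, domain)
    )


# Placeholder reference data standing in for the parsed JSON file
sample_reference = {
    "endpoints": [
        {"domain": "Planning", "method": "POST", "path": "/jobs"},
        {"domain": "Migration", "method": "GET", "path": "/snapshots"},
        {"domain": "Planning", "method": "GET", "path": "/jobs/{id}"},
    ]
}
print(build_context(sample_reference, "planning"))
```

Passing only the relevant domain keeps the context small, which tends to matter more than completeness when the assistant is working on a single integration.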

This changes the nature of development work. Teams spend less time validating endpoint paths and correcting assumptions, and more time focusing on orchestration, error handling, and business logic.

It also enables more advanced patterns. For teams experimenting with agents that interact with enterprise systems by running jobs, polling for completion, and exporting results, a machine readable API contract is a prerequisite. This approach provides that contract without changing the underlying APIs.
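
The run, poll, and export pattern can be sketched as a small polling helper. The status value used for "still running" is an assumption that should be checked against the API reference, and `fetch_status` is a hypothetical callable that wraps whatever HTTP client the team uses.

```python
import time

IN_PROGRESS = -1  # assumption: status value meaning the job is still running


def poll_job(fetch_status, interval_seconds=5.0, max_attempts=60, sleep=time.sleep):
    """Call fetch_status until the job leaves the in-progress state.

    Returns the first status that is not IN_PROGRESS, or raises
    TimeoutError after max_attempts checks.
    """
    for attempt in range(max_attempts):
        status = fetch_status()
        if status != IN_PROGRESS:
            return status
        if attempt < max_attempts - 1:
            sleep(interval_seconds)  # wait before the next check
    raise TimeoutError(f"job still in progress after {max_attempts} checks")
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned responses.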

Nothing about the EPM APIs themselves changes. What changes is how easily they can be incorporated into modern tooling and automated workflows.

Notes on the extraction approach

The extraction process is straightforward in concept but careful in execution. pdfplumber handles the low level PDF text extraction. The more challenging part is identifying structure such as endpoint headers, HTTP methods, parameter sections, and response schemas.

Oracle’s documentation is internally consistent, which made this feasible. The script relies on that consistency along with targeted pattern matching and heuristics designed specifically around the EPM REST API layout.
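
A simplified sketch of this kind of pattern matching is shown below. It assumes endpoint lines appear in the extracted text as an HTTP method followed by a path on a single line, which is a simplification of the real layout. The regular expression and sample text are illustrative and are not the heuristics used in the repository.

```python
import re

# Assumed layout: "<METHOD> <path>" alone on a line
ENDPOINT_LINE = re.compile(r"^(GET|POST|PUT|PATCH|DELETE)\s+(/\S+)$")


def find_endpoints(text):
    """Return method and path pairs found in extracted text."""
    found = []
    for line in text.splitlines():
        match = ENDPOINT_LINE.match(line.strip())
        if match:
            found.append({"method": match.group(1), "path": match.group(2)})
    return found


# Illustrative sample text, not copied from the PDF
sample_text = """\
Get Job Status
GET /HyperionPlanning/rest/{api_version}/applications/{application}/jobs/{jobIdentifier}
Required Roles
Service Administrator
POST /HyperionPlanning/rest/{api_version}/applications/{application}/jobs
"""
print(find_endpoints(sample_text))
```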

This is not intended to be a generic solution for converting any PDF into structured data. It is a practical pipeline built to produce reliable and usable output for a specific enterprise API.

One interesting outcome is the level of compression involved. The source document is 1,155 pages long. The structured JSON representation is under one thousand lines. That difference highlights how much of a traditional API PDF consists of layout, repetition, and presentation rather than core technical meaning.

Looking ahead

As software development continues to evolve toward closer collaboration between humans and machines, documentation becomes even more important. The format simply needs to keep pace with how it is used.

JSON and Markdown are well suited to AI tooling because both preserve structure in plain text that models and scripts can parse directly. For platforms that still rely primarily on PDF or other types of documentation, there is an opportunity to expand how that information can be consumed and reused without replacing what already exists.

For teams working with complex enterprise APIs, the takeaway is straightforward. If the documentation exists, it can be made more useful. Extracting it, structuring it, and integrating it into development workflows can unlock real gains in speed, accuracy, and confidence.

That is what this effort set out to explore, and the results have already shown clear value in practice.

Get the code

The extraction script, along with the structured JSON output and the Markdown reference, is available in the GitHub repository here:

https://github.com/TheOwner-glitch/oracle_epm_rest_api/tree/main

The repository includes the Python script used to extract and structure the documentation, the machine readable JSON reference, and a Markdown version that is easier to browse and review. Together, these files provide a practical starting point for teams working with Oracle EPM REST APIs who want documentation that can be consumed directly by modern development tools.

The broader repository also includes supporting artifacts related to interacting with Oracle EPM Cloud, but the API reference extraction stands on its own as a useful and reusable asset.

If this approach proves useful or is adapted for other platforms or documentation sets, those lessons are worth sharing.

Monday, July 7, 2025

Introducing My Microsoft Copilot Agent for Oracle ERP & HCM Metadata

Earlier, I shared a blog post detailing a Python-based CLI tool that leveraged Oracle Cloud HCM metadata and generative AI to provide SQL generation, metadata explanation, and table join suggestions. Building on that foundation, I’ve now brought the experience directly into Microsoft 365 using a Copilot Agent integrated with SharePoint.

This new solution allows users to interact naturally with metadata from Oracle ERP and HCM—without ever leaving the Microsoft ecosystem.


What This Copilot Agent Does

This Copilot Agent acts as a metadata consultant within your Microsoft 365 environment. It enables:

  • Natural language discovery of relevant Oracle Cloud tables and columns
  • Contextual SQL generation based on business terms
  • Join recommendations using known key fields like person_id, assignment_id, etc.
  • Explanations of tables, columns, and relationships
  • Starter query generation for BI Publisher reports


Why CSV Metadata Format?

During development, I found that Microsoft Copilot currently does not support JSON-based data sources for grounding.

As a result, I converted my metadata files to CSV format to ensure compatibility.

This included structured metadata for tables and columns sourced from both Oracle ERP and HCM Cloud.
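
A conversion like this can be sketched with the standard library alone. The field names below (`table_Id`, `table_name`, `description`) and the sample row's description are assumptions for illustration, not the actual layout of the metadata files.

```python
import csv
import io

FIELDS = ["table_Id", "table_name", "description"]  # assumed column layout


def tables_to_csv(tables):
    """Serialize a list of table metadata dictionaries as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for table in tables:
        # Missing keys become empty cells instead of raising an error
        writer.writerow({field: table.get(field, "") for field in FIELDS})
    return buffer.getvalue()


# Placeholder record standing in for the JSON metadata
sample_tables = [
    {
        "table_Id": 1,
        "table_name": "PER_ALL_PEOPLE_F",
        "description": "Person records, date-effective",
    }
]
print(tables_to_csv(sample_tables))
```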


Copilot Agent Instructions

Agent Purpose:

You are an intelligent enterprise metadata consultant designed to assist Oracle ERP and HCM users.
You use structured metadata stored in SharePoint to help users explore, understand, and query Oracle Cloud Applications datasets.


Behavioral Instructions (Copilot Agent):

Understand the Metadata:
Use metadata stored in the provided SharePoint folder:

ERP and HCM Tables Metadata

  • Table_Metadata.csv: Contains table-level descriptions and possible usage context.
  • Columns_Metadata.csv: Contains schema-level information, including table name, column name, and column descriptions.

First load Table_Metadata.csv, then load Columns_Metadata.csv.
Use the shared table_Id to join columns to their corresponding tables.


Tasks You Can Perform:

  • Suggest which tables or columns are most relevant to a user's query
  • Generate optimized SQL queries based on natural language prompts
  • Recommend joins using shared fields such as person_id, assignment_id, etc.
  • Explain what a specific table or column is used for in business terms
  • Summarize metadata for one or more objects when asked to “explain” or “describe”
  • Help build starter queries for Oracle BI Publisher (BIP) reports
  • Support semantic search (e.g., a search for "payroll balances" should find related metadata even if it’s not an exact match)
  • Act as an expert Oracle Cloud ERP and HCM analyst and developer, capable of solving advanced metadata questions and building queries based on complex requirements


How to Complete the Tasks:

  • Always refer to metadata found in the provided SharePoint files
  • Never fabricate or guess metadata
  • If no matching result is found, say: “I could not find relevant metadata for your request based on the provided files.”
  • If the user provides multiple keywords (e.g., “payroll, salary”), treat them as individual context terms
  • Scan for matches across table names, table descriptions, and column descriptions
  • Use exact column name and table name matches where possible
  • Suggest joins using shared fields such as assignment_id, person_id, location_id
  • Prefer documented relationships where available
  • When generating SQL, use clear formatting and include comments if needed
  • When a user says “HCM only” or “Exclude ERP,” make sure results match
  • Clearly state which app (ERP or HCM) an object is part of when helpful
  • When asked to return specific fields (e.g., “name, email, location”), find which columns correspond to those descriptions and which tables they belong to
  • Ensure final responses are concise, technical, and clearly grounded in real metadata
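
Outside of Copilot, the join by table_Id and the shared-field join suggestions described in these instructions can be approximated in a few lines. This is a sketch under assumed CSV headers (`table_Id`, `table_name`, `column_name`) with placeholder rows. It is not how the agent works internally.

```python
import csv
import io
from collections import defaultdict


def columns_by_table(tables_csv, columns_csv):
    """Group column names by table name, joining the two files on table_Id."""
    names = {
        row["table_Id"]: row["table_name"]
        for row in csv.DictReader(io.StringIO(tables_csv))
    }
    grouped = defaultdict(set)
    for row in csv.DictReader(io.StringIO(columns_csv)):
        table_name = names.get(row["table_Id"])
        if table_name:  # skip columns whose table is not listed
            grouped[table_name].add(row["column_name"].lower())
    return grouped


def shared_columns(grouped, left, right):
    """Return column names present in both tables, as join candidates."""
    return sorted(grouped[left] & grouped[right])


# Placeholder CSV content standing in for the SharePoint files
tables_csv = "table_Id,table_name\n1,PER_ALL_PEOPLE_F\n2,PER_ALL_ASSIGNMENTS_M\n"
columns_csv = (
    "table_Id,column_name\n"
    "1,PERSON_ID\n1,PERSON_NUMBER\n"
    "2,PERSON_ID\n2,ASSIGNMENT_ID\n"
)
grouped = columns_by_table(tables_csv, columns_csv)
print(shared_columns(grouped, "PER_ALL_PEOPLE_F", "PER_ALL_ASSIGNMENTS_M"))
```

A shared column name is only a candidate for a join, so any suggestion produced this way still needs to be confirmed against documented relationships.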


Known Limitations

While this Copilot Agent adds tremendous value, it still has some important limitations:

  • It can hallucinate: If metadata isn’t found due to vague prompts, the agent may fabricate plausible-sounding but incorrect information
  • It requires clear prompting: Users get the best results when they use specific, well-structured queries
  • File linking is not perfect: Even though metadata is grounded in CSV files, deep linking between them can still pose a challenge

Despite these caveats, the Copilot Agent demonstrates how far we can go by bringing structured enterprise metadata and AI together inside the tools we use every day.

Sunday, July 6, 2025

AI Enhances Oracle Cloud HCM Support: Elevating Insight with the Oracle HCM Intel CLI

Oracle Gen AI + Public HCM Metadata = Limitless Insight

The Oracle HCM Intel CLI Tool is a professional-grade command-line assistant that combines your extracted HCM metadata with Oracle’s OCI Generative AI service (powered by Cohere), OpenAI, or the LLM of your choice to give you intelligent insights into HCM data structures. It runs securely and locally, and no sensitive data leaves your environment.

By referencing public HCM data definitions, this project demonstrates just how powerfully Oracle Gen AI, and other AI offerings, can be applied in the enterprise to support analysts, developers, and architects with tasks such as:

  • Generating SQL queries based on natural language
  • Suggesting joins between key Oracle HCM tables
  • Explaining and optimizing existing SQL
  • Creating BI Publisher-ready templates
  • Performing semantic metadata searches

Project GitHub: oracle_hcm_intel_cli
Demo: Watch Demo

Built With Practical Enterprise Needs in Mind

The tool includes features that any real-world implementation would benefit from:

  • Encrypted .env handling using runtime secrets
  • Markdown output for reporting or Copilot/Teams usage
  • Audit mode for logging what metadata was passed to the LLM
  • Modular provider support for OpenAI or Oracle Gen AI
  • Interactive prompt chaining for analysts and non-developers

It’s optimized for real users working in real environments—offering flexibility without sacrificing security or precision.


Powered by Oracle Technology, Honoring Oracle’s Vision

This CLI tool is powered by Oracle’s own public documentation and showcases the capabilities of Oracle Cloud Infrastructure’s Generative AI platform. It's a clear example of how AI and metadata can be responsibly applied in the enterprise, especially when building tooling around Oracle’s HCM ecosystem.

Oracle’s Gen AI service is the star of the show here—it brings context, comprehension, and creative query generation to the hands of business users and technical professionals alike.

Earlier this month, I introduced a Metadata Extractor CLI Tool built to programmatically parse Oracle Cloud HCM’s public documentation and extract table and view metadata into JSON format. That foundational tool—available on GitHub at oracle_hcm_metadata_extractor—has enabled this powerful second act: the Oracle HCM Intel CLI.


Get Started

  1. Extract metadata using: oracle_hcm_metadata_extractor

  2. Query and explore with: oracle_hcm_intel_cli

Whether you’re working in HCM data architecture, reporting, or support—this is your AI-powered sidekick.


I hope this project inspires others to build upon Oracle’s cloud platform and apply Generative AI responsibly. The future of enterprise tooling is intelligent, secure, and deeply integrated—and with Oracle Gen AI, that future is now.


Julio @ OracleSpot.net