
Sunday, July 6, 2025

Supercharging Oracle HCM Research and Support with a CLI Tool to Obtain Structured Documentation Metadata

As an Oracle Cloud HCM customer and practitioner, I frequently rely on the Oracle Cloud Applications Documentation to understand how various tables and views are structured. Whether I’m building BI Publisher queries, supporting Fast Formulas, or troubleshooting integrations, knowing the exact column names and their meanings is essential.

But manually digging through dozens (or hundreds) of HTML pages is time-consuming and repetitive.

So I decided to automate it, in a way that would let me have conversations with this data instead of clicking aimlessly through hundreds of pages looking for answers.

📌 What I Built

I created a Python-based command-line tool that reads Oracle's publicly available HCM documentation on tables and views and outputs structured metadata for each one as clean JSON files. This streamlines research, enables automation, and speeds up troubleshooting for internal support.

Here’s what it does:

  • Parses the toc.js file from Oracle’s documentation site to extract all table/view URLs

  • Visits each documentation page using a headless browser

  • Extracts metadata such as:

    • For Tables: table name, description, details, columns (with data types), primary keys, and indexes

    • For Views: view name, description, details, column names, and SQL query

No more flipping through documentation pages: just clean, structured data ready for analysis or for integration into support tools, with many other use cases, including in the AI landscape.


🛠️ How It Works

Step 1: Extract Links from toc.js

The Oracle documentation includes a toc.js file that holds the links to all the individual table and view pages. The tool first parses this file to build a list of URLs and writes that list to a CSV file.

python oracle_hcm_cli_tool.py --toc toc.js --csv oracle_links.csv
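A minimal sketch of this step, assuming (hypothetically) that each toc.js entry pairs a quoted `title` with a quoted relative `href` ending in `.html`. The real file's layout may differ, so treat the regular expression as a placeholder to adapt. `extract_links`, `write_links_csv`, and the `docs.example.com` base URL are illustrative names, not the tool's actual code.

```python
import csv
import re
from urllib.parse import urljoin

# Placeholder base URL (hypothetical): point this at the documentation book that hosts toc.js.
BASE_URL = "https://docs.example.com/hcm/tables-views/"

# Assumed toc.js entry shape: title: "NAME", href: "page.html" (keys quoted or not).
# This pattern is a guess to adapt to the real file, not its documented format.
ENTRY_RE = re.compile(
    r'"?title"?\s*:\s*"(?P<name>[^"]+)"\s*,\s*"?href"?\s*:\s*"(?P<href>[^"]+\.html)"'
)


def extract_links(toc_text: str, base_url: str = BASE_URL) -> list[dict]:
    """Return [{"name": ..., "url": ...}] for each matching entry, skipping duplicate URLs."""
    seen: set = set()
    links = []
    for match in ENTRY_RE.finditer(toc_text):
        url = urljoin(base_url, match.group("href"))  # resolve the relative href
        if url not in seen:
            seen.add(url)
            links.append({"name": match.group("name").strip(), "url": url})
    return links


def write_links_csv(links: list[dict], path: str) -> None:
    """Write the links to a CSV file with a name,url header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "url"])
        writer.writeheader()
        writer.writerows(links)
```

With a real file, `write_links_csv(extract_links(Path("toc.js").read_text(encoding="utf-8")), "oracle_links.csv")` (using `Path` from `pathlib`) would roughly mirror the command above.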

Step 2: Convert CSV to JSON

For downstream processing, the CSV is converted to a simple JSON array of {name, url} pairs.

python oracle_hcm_cli_tool.py --csv oracle_links.csv --json oracle_links.json
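The conversion itself is small. Here is a sketch using only the standard library, assuming the CSV has a `name,url` header row (an assumption; the post only specifies the JSON shape). `csv_to_json` is a hypothetical name.

```python
import csv
import json


def csv_to_json(csv_path: str, json_path: str) -> list[dict]:
    """Convert a name,url CSV into a JSON array of {"name", "url"} objects."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        # DictReader keys each row by the header row (assumed to be "name,url").
        rows = [
            {"name": row["name"].strip(), "url": row["url"].strip()}
            for row in csv.DictReader(f)
            if row.get("url")  # skip rows with a missing or empty URL
        ]
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2)
    return rows
```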

Step 3: Extract Table and View Metadata

The final step launches a headless browser with Playwright and visits each page to extract structured metadata.

python oracle_hcm_cli_tool.py --json oracle_links.json --tables oracle_tables.json --views oracle_views.json
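A rough sketch of this step, under two assumptions: pages are rendered with Playwright's synchronous API (`sync_playwright`, `chromium.launch(headless=True)`, `page.goto`, `page.content`), and the metadata appears in HTML `<table>` elements whose first row holds the headers. The parsing side uses only the standard library's `html.parser`. `TableRowParser`, `rows_to_dicts`, and `fetch_pages` are hypothetical names, and real Oracle pages may need more specific handling.

```python
from html.parser import HTMLParser
from typing import Optional


class TableRowParser(HTMLParser):
    """Collect cell text from every <tr> on a page as a list of rows."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list = []
        self._row: Optional[list] = None   # cells of the row being read
        self._cell: Optional[list] = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse internal whitespace so " NUMBER\n" becomes "NUMBER".
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row, self._cell = None, None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def rows_to_dicts(rows: list) -> list:
    """Use the first row as keys (e.g. "Data Type" -> "data_type") for the remaining rows."""
    if not rows:
        return []
    headers = [h.lower().replace(" ", "_") for h in rows[0]]
    return [dict(zip(headers, row)) for row in rows[1:]]


def fetch_pages(urls: list) -> dict:
    """Render each URL in headless Chromium and return {url: rendered HTML}."""
    # Lazy import: the parsing helpers above work without Playwright installed.
    from playwright.sync_api import sync_playwright

    pages = {}
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        for url in urls:
            page.goto(url, wait_until="domcontentloaded")
            pages[url] = page.content()
        browser.close()
    return pages
```

This parser merges rows from every table on a page. An extractor that separates columns, primary keys, and indexes would also need to track which table each row came from.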

Or run all three steps in one command:

python oracle_hcm_cli_tool.py --toc toc.js --csv oracle_links.csv --json oracle_links.json --tables tables.json --views views.json
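The commands above suggest that each step runs when both its input and its output flags are present. Below is a hypothetical sketch of that dispatch with `argparse`, reusing the flag names shown above. The step names and the exact rules are assumptions, not taken from the tool's source.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Define the flags used in the commands above."""
    parser = argparse.ArgumentParser(description="Extract Oracle HCM table/view metadata.")
    parser.add_argument("--toc", help="path to toc.js (input to step 1)")
    parser.add_argument("--csv", help="name,url CSV (output of step 1, input to step 2)")
    parser.add_argument("--json", help="JSON link list (output of step 2, input to step 3)")
    parser.add_argument("--tables", help="output file for table metadata")
    parser.add_argument("--views", help="output file for view metadata")
    return parser


def plan_steps(args: argparse.Namespace) -> list:
    """Decide which steps to run, in order, from the flags that were supplied."""
    steps = []
    if args.toc and args.csv:
        steps.append("toc_to_csv")
    if args.csv and args.json:
        steps.append("csv_to_json")
    if args.json and (args.tables or args.views):
        steps.append("extract_metadata")
    return steps
```

Under these rules, the combined command plans all three steps, while each single-step command plans only its own.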

📦 Output Format

Here’s what the output looks like.

For Tables (tables.json)

{
  "table_name": "PER_ALL_PEOPLE_F",
  "url": "...",
  "description": "This table stores information about...",
  "details": "Schema: FUSION Object owner: PER Object type: TABLE Tablespace: APPS_TS_TX_DATA",
  "columns": [
    {
      "column_name": "PERSON_ID",
      "data_type": "NUMBER",
      "length": "",
      "precision": "18",
      "not_null": true,
      "description": "System generated surrogate key"
    },
    ...
  ],
  "primary_key": {
    "name": "PER_PEOPLE_F_PK",
    "columns": ["PERSON_ID", "EFFECTIVE_START_DATE", "EFFECTIVE_END_DATE"]
  },
  "indexes": [
    {
      "name": "PER_PEOPLE_F_U1",
      "uniqueness": "Unique",
      "columns": ["BUSINESS_GROUP_ID", "PERSON_NUMBER"]
    }
  ]
}
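Once the output exists, a question like "which tables carry PERSON_ID?" takes only a few lines of code. This sketch assumes tables.json holds a JSON array of records shaped like the example above. The helper names are hypothetical.

```python
import json


def load_tables(path: str = "tables.json") -> list:
    """Load the list of table records (assumed to be a JSON array)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def tables_with_column(tables: list, column: str) -> list:
    """Return the sorted names of tables that have the given column (case-insensitive)."""
    target = column.upper()
    return sorted(
        t["table_name"]
        for t in tables
        if any(c.get("column_name", "").upper() == target for c in t.get("columns", []))
    )


def primary_key_columns(tables: list, table_name: str) -> list:
    """Return one table's primary-key columns, or [] if the table or key is missing."""
    for t in tables:
        if t["table_name"] == table_name:
            return (t.get("primary_key") or {}).get("columns", [])
    return []
```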

For Views (views.json)

{
  "view_name": "FAI_DEEP_LINKS_VL",
  "url": "...",
  "description": "This view shows deep link metadata...",
  "details": "Schema: FUSION Object owner: FAI Object type: VIEW",
  "columns": [
    { "column_name": "DEEP_LINK_ID" },
    { "column_name": "DEEP_LINK_CODE" },
    ...
  ],
  "sql_query": "SELECT ... FROM FAI_DEEP_LINKS_B b ..."
}
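Because the view records include their SQL, you can also get a rough list of the objects each view reads from. This sketch deliberately uses a naive regex (the helper `referenced_objects` is hypothetical) that only looks at identifiers directly after FROM or JOIN. It misses comma-separated FROM lists and quoted names, so a real SQL parser would be more reliable.

```python
import re

# Naive pattern: an identifier (optionally schema-qualified) right after FROM or JOIN.
# Subqueries such as "FROM (SELECT ..." are skipped because "(" is not an identifier start.
SOURCE_RE = re.compile(
    r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w$#]*(?:\.[A-Za-z_][\w$#]*)?)", re.IGNORECASE
)


def referenced_objects(sql: str) -> list:
    """Return distinct object names after FROM/JOIN, uppercased, in order of first use."""
    seen = []
    for name in SOURCE_RE.findall(sql):
        name = name.upper()
        if name not in seen:
            seen.append(name)
    return seen
```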

💡 Why This Is Useful

This tool is especially useful if you:

  • Frequently need to understand Oracle's schema to troubleshoot or write custom reports

  • Work in integrations, BI, or payroll support and need faster insights

  • Want to build dashboards or internal tools that visualize schema metadata

  • Need a starting point for building generative AI agents or search interfaces that draw on documentation to deliver intelligent insights

  • Are an Oracle customer who wants to follow the same approach for Oracle ERP or other use cases
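For the search and AI use cases, a common starting point is to flatten each record into a text chunk that a search index or retrieval step can consume. This sketch assumes the table record shape shown in the output format section. `table_to_text` and `keyword_search` are hypothetical helpers, and the keyword-overlap score is a deliberately naive stand-in for a real search engine or embedding model.

```python
import re


def table_to_text(table: dict) -> str:
    """Render one table record as a plain-text chunk for indexing or LLM context."""
    lines = [f"Table {table['table_name']}: {table.get('description', '').strip()}"]
    pk = (table.get("primary_key") or {}).get("columns", [])
    if pk:
        lines.append("Primary key: " + ", ".join(pk))
    for col in table.get("columns", []):
        entry = f"- {col['column_name']} ({col.get('data_type', '')})"
        if col.get("description"):
            entry += f": {col['description']}"
        lines.append(entry)
    return "\n".join(lines)


def keyword_search(chunks: dict, query: str, top_n: int = 5) -> list:
    """Rank chunk keys by how many distinct query words each chunk contains."""
    terms = set(re.findall(r"\w+", query.lower()))
    scored = []
    for key, text in chunks.items():
        score = len(terms & set(re.findall(r"\w+", text.lower())))
        if score:
            scored.append((score, key))
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest score first, then by name
    return [key for _, key in scored[:top_n]]
```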


⚠️ Disclaimer

This tool and blog post are independently created for research, education, and internal productivity enhancement by an Oracle customer. 

All references to Oracle® products and documentation are made strictly for educational and interoperability purposes. 

Oracle, Java, and MySQL are registered trademarks of Oracle and/or its affiliates. 

This tool is not affiliated with, endorsed by, or sponsored by Oracle Corporation.

Metadata shown (such as table names and column names) is extracted from publicly accessible Oracle documentation.

No proprietary documentation or licensed software is redistributed or reproduced in full.

Always refer to Oracle Cloud Documentation for official and up-to-date content.


🧪 Want to Try It?

You can clone the tool and try it for yourself:

https://github.com/TheOwner-glitch/oracle_hcm_metadata_extractor

Happy automating!
