ChatGPT Agents Architecture: Technical & Operational Analysis (v2.3-beta)

This page is a visual companion to the in-depth article “ChatGPT Agents Architecture: Technical and Operational Analysis” (v2.3-beta) by JL de la Torre. The interactive diagram below outlines six functional layers (Layers 1 to 6), preceded by a metacontext layer (Layer 0), all inferred from empirical usage, reverse engineering, and tool-level introspection.

Each layer is color-coded and includes traceable references to the full report. This resource is designed to support educators, developers, and AI auditors looking to understand the operational boundaries and decision-making architecture of ChatGPT Agents. Confidence levels are included for each element based on reproducible evidence.

ChatGPT Agents Architecture – (I)

Metacontext, Core Engine and Tools (v2.3-beta EN adapted)

JL de la Torre | July 2025 | CC BY-NC-SA

Legend: 🧠 Core (blue) | 🛠️ Tool (yellow) | Confidence: High, Medium, Low

References: (page) | confidence level

Layer 0: Metacontext and System

{00a} Identity: OpenAI LLM (cutoff Oct 2023, current date provided)
{00b} Meta-prompt version and feature flags (not exposed to user)
{00c} Free/Plus/Enterprise tier distinction affects resources/features
{00d} Partial awareness of version changes between releases

🧠 Layer 1: Reasoning Core

{01} LLM optimized for chain-of-thought reasoning (p.7) High
{02} Explicit meta-prompt, ReAct instruction (p.7) High
{03} Thought → Action → Observation execution cycle (p.21) High
{04} Narrates plan and post-execution analysis (p.8) High
{05} Limited self-correction (single attempt only) (p.25) High
{06} Stateless between sessions (ephemeral state) (p.17) High
{07} Output clarity and formatting principles (p.23) Medium
{08} Style rules: one chart at a time, no seaborn, no color by default (p.24) Medium
{09} Explicit error and limit reporting (p.23) High
{10} Architectural distinction: executes actions, not just advises (p.10) High
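
The Thought → Action → Observation cycle in {03} and the single self-correction attempt in {05} can be sketched as a small sequential loop. This is an illustrative sketch only: the names `Step`, `run_react`, and the tool registry are hypothetical and not part of any OpenAI API, the plan is fixed up front instead of being generated by an LLM, and the "retry" simply repeats the same action where a real agent would presumably revise it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    """One recorded Thought -> Action -> Observation triple."""
    thought: str
    action: str
    observation: str


def run_react(plan: List[Tuple[str, str, str]],
              tools: Dict[str, Callable[[str], str]],
              max_retries: int = 1) -> List[Step]:
    """Run (thought, tool_name, tool_input) steps one after another.

    A failing action is retried at most `max_retries` times (assumption:
    this mirrors the single self-correction attempt of item {05}). If the
    retry also fails, the loop stops and the error stays in the trace,
    in the spirit of the explicit error reporting of item {09}.
    """
    trace: List[Step] = []
    for thought, tool_name, tool_input in plan:
        failures = 0
        while True:
            action = f"{tool_name}({tool_input!r})"
            try:
                observation = tools[tool_name](tool_input)
                trace.append(Step(thought, action, observation))
                break  # step succeeded, move on to the next one
            except Exception as exc:
                failures += 1
                trace.append(Step(thought, action, f"ERROR: {exc}"))
                if failures > max_retries:
                    return trace  # give up and surface the error
    return trace
```

Calling `run_react([("Need uppercase", "upper", "abc")], {"upper": str.upper})` yields a one-step trace whose observation is `"ABC"`. Execution is strictly sequential, matching {14}.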

🛠️ Layer 2: Tool Orchestrator

{11} Centralized integration of Python, Browser, DALL·E (p.8) High
{12} Shared folder bridge /home/oai/share (p.17) High
{13} Transfer by copy, not direct access (p.17) High
{14} Sequential execution only, no concurrency (p.7) High
{15} Edge case: potential corruption in binary transfers (p.17) Medium
{16} Browser has no access to Python interpreter memory (p.17) High
{17} Package installation blocked inside Python sandbox (p.9) High
{18} Preinstalled libs: pandas, numpy, matplotlib (p.24) Medium
{19} DALL·E callable only via LLM, not API directly (p.8) High
{20} File processing limited by RAM expansion on load (p.15) High
{21} No network calls from Python, only via Browser tool (p.9) High
{22} Error propagation consistent across tools (p.8) High
{23} Workaround: always validate integrity after file copy (p.17) High
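
The workaround in {23} (validate integrity after every file copy) can be sketched with a checksum comparison. The helper names `sha256_of` and `copy_and_verify` are hypothetical, and SHA-256 is an assumed choice; the source does not say which integrity check the report uses.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB blocks so large files are never fully loaded."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def copy_and_verify(src: Path, dst_dir: Path) -> Path:
    """Copy `src` into `dst_dir` and raise if the copy differs from the original."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / src.name
    shutil.copyfile(src, dst)
    if sha256_of(src) != sha256_of(dst):
        raise IOError(f"checksum mismatch after copying {src} to {dst}")
    return dst
```

Inside the sandbox described above, the destination would be the shared folder, for example `copy_and_verify(Path("report.bin"), Path("/home/oai/share"))`, where `report.bin` is a placeholder file name.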

ChatGPT Agents Architecture – (II)

Sandbox, Interface and Edge Cases (v2.3-beta EN adapted)

Legend: Sandbox (green) | Interface (gray) | Edge/Risk (orange/red)

References: (page) | confidence level

🔒 Layer 3: Sandbox / Execution Environment

{24} Firecracker/gVisor Micro-VM (inferred) (p.13) Medium
{25} Effective RAM per session: ~7–8GB (total ~10GB) (p.14) High
{26} Ephemeral state: destroyed after session/error (p.14) High
{27} Complete network isolation (p.9) High
{28} VM boot <125ms, overhead <5MB (p.9) High
{29} Minimalist OS, shared folder at /home/oai/share (p.14) High
{30} Session time limit: ~5 min wall-clock (p.16) High
{31} Practical file limit: 100–150MB (p.15) High
{32} MemoryError → sandbox degradation, restart required (p.14) High
{33} Python GC limitations, potential memory leak in edge cases (p.16) Low
{34} No access to disk outside the VM (p.9) High
{35} Session cost = RAM × time + management overhead (p.9) High
{36} Workaround: chunk large files to avoid MemoryError (p.17) High
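
The chunking workaround in {36} can be illustrated with a streaming reader that holds only a bounded number of rows in memory. This is a minimal sketch using the standard library; the function names, the CSV input format, and the default chunk size are assumptions, not values from the report.

```python
import csv
from itertools import islice
from typing import Dict, Iterator, List


def iter_chunks(path: str, rows_per_chunk: int = 50_000) -> Iterator[List[Dict[str, str]]]:
    """Yield successive lists of at most `rows_per_chunk` CSV rows."""
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        while True:
            chunk = list(islice(reader, rows_per_chunk))
            if not chunk:
                break
            yield chunk


def column_sum(path: str, column: str, rows_per_chunk: int = 50_000) -> float:
    """Sum one numeric column without loading the whole file at once."""
    total = 0.0
    for chunk in iter_chunks(path, rows_per_chunk):
        total += sum(float(row[column]) for row in chunk)
    return total
```

Peak memory then scales with `rows_per_chunk` rather than with file size, which is the point of the workaround given the RAM expansion on load noted in {20}.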

🤝 Layer 4: Human-Agent Interface

{37} Requests confirmation for costly/irreversible actions (p.10) High
{38} Allows human intervention; waits for feedback (p.10) High
{39} Presents reasoning before action (p.10) High
{40} Fallback behavior: asks for clarification after unresolved error (p.25) High
{41} Adapts narrative based on prompt context/role (p.23) Medium
{42} Transparently reports limitations and errors (p.23) High
{43} Micro-decision making based on user instructions (p.23) High
{44} Displays steps and process, no hidden execution (p.8) High
{45} Session is destroyed after inactivity (no persistent warning) (p.14) High
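
The confirmation behavior in {37} and {38} amounts to a gate in front of costly or irreversible actions. The sketch below is hypothetical throughout: `run_with_confirmation` and its parameters are illustrative names, and how the real agent classifies an action as irreversible is not documented in the source.

```python
from typing import Callable


def run_with_confirmation(description: str,
                          action: Callable[[], str],
                          irreversible: bool,
                          confirm: Callable[[str], bool]) -> str:
    """Run `action`, asking the human first when it is flagged irreversible.

    `confirm` receives a question and returns True to proceed. Actions that
    are not flagged run immediately without asking.
    """
    if irreversible and not confirm(f"About to: {description}. Proceed?"):
        # The human declined: report it openly instead of acting silently.
        return f"Skipped: {description} (not confirmed)"
    return action()
```

In an interactive setting, `confirm` could wrap `input()`, for example `lambda q: input(q + " [y/N] ").strip().lower() == "y"`.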

⛔ Layer 5: Edge Cases, Security and Economics

{46} No direct integration with private/internal APIs (by design) (p.35) High
{47} Strict limits: ~5 min timeout, ~8GB RAM (p.16) High
{48} No persistent memory across sessions (p.17) High
{49} Practical file limit (~100MB) ≠ official upload limit (512MB) (p.15) High
{50} Edge: occasional binary↔text corruption in transfers (p.17) Medium
{51} All execution validated; no autonomous decisions (p.23) High
{52} VM destroyed post-use to reduce attack surface (p.14) High
{53} Architectural trade-off: prioritizes cost & security over power (p.16) High
{54} Differences in Enterprise/Plus: more RAM/time (p.43) Low
{55} Ongoing evolution; feature flags not always documented (p.47) Medium
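
The gap in {49} between the practical file limit (~100MB) and the official upload limit (512MB) suggests a simple pre-flight size check. The sketch below takes both thresholds from {49}; the function name, the three labels, and the use of binary megabytes are assumptions.

```python
MB = 1024 * 1024  # assumption: binary megabytes; the source does not specify the unit

PRACTICAL_LIMIT_MB = 100  # item {49}: practical limit (~100MB)
UPLOAD_LIMIT_MB = 512     # item {49}: official upload limit


def classify_file_size(size_bytes: int) -> str:
    """Return 'ok', 'risky', or 'rejected' for a file of the given size.

    'risky' means the upload should be accepted but processing may hit the
    memory ceiling, so chunking (item {36}) is advisable.
    """
    if size_bytes > UPLOAD_LIMIT_MB * MB:
        return "rejected"
    if size_bytes > PRACTICAL_LIMIT_MB * MB:
        return "risky"
    return "ok"
```

For a real file, the size can be obtained with `os.path.getsize(path)` before deciding whether to upload it whole or split it first.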

ChatGPT Agents Architecture – (III)

Observability, Traceability, and Metadata (v2.3-beta EN adapted)

Legend: Observability (purple) | 🟩 Sandbox | 🟦 Core | 🟨 Tool

Colors & symbols: see final legend | Confidence: High, Medium, Low

🔎 Layer 6: Observability and Traceability

{56} No detailed logs exposed to the end user
{57} Errors and limitations are reported via messages, not trace dumps
{58} No external session auditing (for privacy/security reasons)
{59} Architectural changes only inferable through behavioral shifts
{60} Confidence levels assigned based on replicable evidence

Example: Item Metadata Table

ID	Item/Summary	Page	Confidence	Note / Workaround
24	Firecracker Micro-VM	13	Medium	Inferred architecture, not officially confirmed
25	RAM: 7–8GB/session	14	High	Empirically validated
32	Sandbox degradation	14	High	Restart session after error

For extended analysis: consider editable tables with extra columns (e.g., source, reproducibility…)
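
One way to make the table above editable and extensible is to hold each row as a small record. This is a sketch under assumptions: the `Item` class and `by_confidence` helper are hypothetical, the three rows are copied from the example table, and the extra `source` column is left empty because the original gives no values for it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Item:
    id: int
    summary: str
    page: int
    confidence: str              # "High", "Medium", or "Low"
    note: str
    source: Optional[str] = None  # extra column; no values given in the original


ITEMS: List[Item] = [
    Item(24, "Firecracker Micro-VM", 13, "Medium",
         "Inferred architecture, not officially confirmed"),
    Item(25, "RAM: 7–8GB/session", 14, "High", "Empirically validated"),
    Item(32, "Sandbox degradation", 14, "High", "Restart session after error"),
]


def by_confidence(items: List[Item], level: str) -> List[Item]:
    """Return the items whose confidence matches `level` exactly."""
    return [item for item in items if item.confidence == level]
```

With these rows, `[i.id for i in by_confidence(ITEMS, "High")]` evaluates to `[25, 32]`.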

Legend & Reference Keys

🟦 Core (Reasoning engine, LLM, blue)
🟨 Tool (Python, Browser, DALL·E, yellow)
🟩 Sandbox (Micro-VM, isolated environment, green)
🟫 Interface (Human collaboration layer, gray)
🟥 Edge/Limitation (Security, hard constraints, red/orange)
🟪 Observability (Logs, introspection, purple)
High, Medium, Low: Confidence levels

Source: ChatGPT Agents Architecture, JL de la Torre, v2.3-beta

Arquitectura de los agents de ChatGPT: análisis técnico y operativo (in English: ChatGPT Agents Architecture: Technical and Operational Analysis)