This page provides a visual companion to the in-depth article “ChatGPT Agents Architecture: Technical and Operational Analysis” (v2.3-beta), authored by JL de la Torre. The interactive diagram below outlines the six functional layers inferred from empirical usage, reverse engineering, and tool-level introspection.
Each layer is color-coded and includes traceable references to the full report. This resource is designed to support educators, developers, and AI auditors looking to understand the operational boundaries and decision-making architecture of ChatGPT Agents. Confidence levels are included for each element based on reproducible evidence.
ChatGPT Agents Architecture (I)
Metacontext, Core Engine and Tools (v2.3-beta EN adapted)
JL de la Torre | July 2025 | CC BY-NC-SA
References: (page) | confidence level
Layer 0: Metacontext and System
- {00a} Identity: OpenAI LLM (knowledge cutoff Oct 2023; current date supplied in context)
- {00b} Meta-prompt version and feature flags (not exposed to user)
- {00c} Subscription tier (Free/Plus/Enterprise) affects available resources and features
- {00d} Partial awareness of version changes between releases
🧠 Layer 1: Reasoning Core
- {01} LLM optimized for chain-of-thought reasoning (p.7) High
- {02} Explicit meta-prompt with ReAct-style instructions (p.7) High
- {03} Thought → Action → Observation execution cycle (p.21) High
- {04} Narrates its plan before execution and its analysis afterwards (p.8) High
- {05} Limited self-correction (single attempt only) (p.25) High
- {06} Stateless between sessions (ephemeral state) (p.17) High
- {07} Output clarity and formatting principles (p.23) Medium
- {08} Style rules: one chart per figure, no seaborn, no explicit colors unless requested (p.24) Medium
- {09} Explicit error and limit reporting (p.23) High
- {10} Architectural distinction: executes actions, not just advises (p.10) High
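The Thought → Action → Observation cycle ({03}), the single self-correction attempt ({05}) and explicit error reporting ({09}) can be sketched as a small loop. This is a minimal sketch under stated assumptions, not OpenAI's implementation: `decide` is a hypothetical stand-in for the model choosing its next step from the observations so far, `tools` is a hypothetical registry of plain Python callables, and `fix` is a hypothetical one-shot revision of a failed input.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Tool = Callable[[str], str]               # hypothetical: a tool takes and returns text
Fixer = Callable[[str, Exception], str]   # hypothetical: one-shot revision of a failed input

@dataclass
class Step:
    thought: str       # narrated reason for this step (item {04})
    action: str        # tool name
    argument: str      # input actually sent to the tool
    observation: str   # tool output, visible to the next decision

@dataclass
class Trace:
    steps: List[Step] = field(default_factory=list)
    error: Optional[str] = None   # reported to the user, not hidden (item {09})

Decide = Callable[[List[Step]], Optional[Tuple[str, str, str]]]

def run_react(decide: Decide, tools: Dict[str, Tool], fix: Fixer, max_steps: int = 10) -> Trace:
    """Run one tool call at a time, feeding each observation into the next decision.

    A failed call is revised once via `fix` and retried; a second failure
    stops the run and is recorded in `Trace.error`.
    """
    trace = Trace()
    for _ in range(max_steps):
        nxt = decide(trace.steps)          # Thought: pick the next action from observations so far
        if nxt is None:
            return trace                   # the model considers the task done
        thought, name, argument = nxt
        try:
            observation = tools[name](argument)              # Action
        except Exception as first_error:
            argument = fix(argument, first_error)            # single self-correction attempt
            try:
                observation = tools[name](argument)
            except Exception as second_error:
                trace.error = f"{name} failed after one correction: {second_error}"
                return trace
        trace.steps.append(Step(thought, name, argument, observation))  # Observation
    trace.error = f"stopped after {max_steps} steps"
    return trace
```

Calls run strictly in sequence, mirroring item {14}; the loop never issues two tool calls at once.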
🛠️ Layer 2: Tool Orchestrator
- {11} Centralized integration of Python, Browser, DALL·E (p.8) High
- {12} Shared folder bridge: /home/oai/share (p.17) High
- {13} Transfer by copy, not direct access (p.17) High
- {14} Sequential execution only, no concurrency (p.7) High
- {15} Edge case: potential corruption in binary transfers (p.17) Medium
- {16} Browser has no access to Python interpreter memory (p.17) High
- {17} Package installation blocked inside Python sandbox (p.9) High
- {18} Preinstalled libs: pandas, numpy, matplotlib (p.24) Medium
- {19} DALL·E callable only via LLM, not API directly (p.8) High
- {20} File processing limited by RAM expansion on load (p.15) High
- {21} No network calls from Python, only via Browser tool (p.9) High
- {22} Error propagation consistent across tools (p.8) High
- {23} Workaround: always validate integrity after file copy (p.17) High
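Items {12}, {13} and {23} suggest a simple pattern: copy the file into the shared folder byte for byte, then compare checksums before relying on it. A minimal sketch using only the Python standard library; the function names are hypothetical, and only the `/home/oai/share` path comes from the report.

```python
import hashlib
import shutil
from pathlib import Path

SHARE_DIR = Path("/home/oai/share")  # shared folder reported in item {12}

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file incrementally so large files never have to fit in RAM at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

def copy_and_verify(src: Path, dest_dir: Path = SHARE_DIR) -> Path:
    """Copy `src` into `dest_dir` and raise if the copy's bytes differ from the source."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copyfile(src, dest)  # binary copy: no text decoding, so no newline/encoding rewrites
    if sha256_of(src) != sha256_of(dest):
        raise IOError(f"Integrity check failed for {dest}")
    return dest
```

Hashing both sides catches the binary corruption described in items {15} and {50} at the cost of reading each file twice.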
ChatGPT Agents Architecture – (II)
Sandbox, Interface and Edge Cases (v2.3-beta EN adapted)
🔒 Layer 3: Sandbox / Execution Environment
- {24} Firecracker micro-VM or gVisor sandbox (inferred) (p.13) Medium
- {25} Effective RAM per session: ~7–8GB (total ~10GB) (p.14) High
- {26} Ephemeral state: destroyed after session/error (p.14) High
- {27} Complete network isolation (p.9) High
- {28} VM boot <125ms, memory overhead <5MB (p.9) High
- {29} Minimalist OS, shared folder at /home/oai/share (p.14) High
- {30} Session time limit: ~5 min wall-clock (p.16) High
- {31} Practical file limit: 100–150MB (p.15) High
- {32} MemoryError → sandbox degradation, restart required (p.14) High
- {33} Python GC limitations, potential memory leak in edge cases (p.16) Low
- {34} No access to disk outside the VM (p.9) High
- {35} Session cost = RAM × time + management overhead (p.9) High
- {36} Workaround: chunk large files to avoid MemoryError (p.17) High
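The chunking workaround in item {36} keeps peak memory proportional to the chunk size rather than to the file size, which matters given the RAM expansion on load noted in item {20}. A minimal standard-library sketch, assuming a CSV input; the function names and the default chunk size are illustrative, not values from the report.

```python
import csv
from itertools import islice
from pathlib import Path
from typing import Dict, Iterator, List

def iter_chunks(path: Path, rows_per_chunk: int = 50_000) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most `rows_per_chunk` rows; only one chunk is held in memory."""
    with path.open(newline="") as fh:
        reader = csv.DictReader(fh)
        while True:
            chunk = list(islice(reader, rows_per_chunk))
            if not chunk:
                return
            yield chunk

def column_total(path: Path, column: str, rows_per_chunk: int = 50_000) -> float:
    """Sum one numeric column without loading the whole file at once."""
    total = 0.0
    for chunk in iter_chunks(path, rows_per_chunk):
        total += sum(float(row[column]) for row in chunk)
    return total
```

With pandas (listed as preinstalled in item {18}), `pandas.read_csv(path, chunksize=n)` returns an iterator of DataFrames and serves the same purpose.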
🤝 Layer 4: Human-Agent Interface
- {37} Requests confirmation for costly/irreversible actions (p.10) High
- {38} Allows human intervention; waits for feedback (p.10) High
- {39} Presents reasoning before action (p.10) High
- {40} Fallback behavior: asks for clarification after unresolved error (p.25) High
- {41} Adapts narrative based on prompt context/role (p.23) Medium
- {42} Transparently reports limitations and errors (p.23) High
- {43} Micro-decision making based on user instructions (p.23) High
- {44} Displays steps and process, no hidden execution (p.8) High
- {45} Session is destroyed after inactivity (no persistent warning) (p.14) High
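Items {37}, {38} and {40} describe a human-in-the-loop gate around execution. A minimal sketch, assuming a hypothetical `approve` callback in place of the chat confirmation and an arbitrary cost threshold; the report does not specify how "costly" is measured.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    description: str
    irreversible: bool = False
    estimated_cost: float = 0.0   # hypothetical cost units

def execute_with_confirmation(
    action: Action,
    run: Callable[[], str],
    approve: Callable[[Action], bool],   # stands in for the human's reply
    cost_threshold: float = 1.0,         # hypothetical threshold
) -> str:
    """Ask for approval before irreversible or costly actions (item {37}).

    Always returns a user-facing message; declined actions are never run.
    """
    needs_confirmation = action.irreversible or action.estimated_cost >= cost_threshold
    if needs_confirmation and not approve(action):
        return f"Skipped: '{action.description}' was not approved."
    try:
        return run()
    except Exception as exc:
        # Fallback (item {40}): report the unresolved error and ask how to proceed.
        return f"'{action.description}' failed ({exc}). How would you like to proceed?"
```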
⛔ Layer 5: Edge Cases, Security and Economics
- {46} No direct integration with private/internal APIs (by design) (p.35) High
- {47} Strict limits: ~5 min timeout, ~8GB RAM (p.16) High
- {48} No persistent memory across sessions (p.17) High
- {49} Practical file limit (~100MB) ≠ official upload limit (512MB) (p.15) High
- {50} Edge: occasional binary↔text corruption in transfers (p.17) Medium
- {51} All execution validated; no autonomous decisions (p.23) High
- {52} VM destroyed post-use to reduce attack surface (p.14) High
- {53} Architectural trade-off: prioritizes cost & security over power (p.16) High
- {54} Differences in Enterprise/Plus: more RAM/time (p.43) Low
- {55} Ongoing evolution; feature flags not always documented (p.47) Medium
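Items {35}, {47} and {49} can be combined into a rough pre-flight check before launching a job. A sketch under stated assumptions: the limits are the approximate figures reported above, `expansion_factor` (how much a file grows once loaded, see item {20}) is hypothetical and must be measured or guessed per format, and the cost units are arbitrary because the report gives no pricing.

```python
from dataclasses import dataclass
from typing import List

# Approximate limits from items {25}, {47} and {49}; not official values.
RAM_LIMIT_GB = 8.0
TIME_LIMIT_S = 5 * 60
PRACTICAL_FILE_LIMIT_MB = 100.0

@dataclass
class Plan:
    file_mb: float
    expansion_factor: float   # hypothetical: in-memory size / on-disk size
    est_runtime_s: float

def session_cost(ram_gb: float, seconds: float, overhead: float) -> float:
    """Item {35}: cost = RAM x time + management overhead (arbitrary units)."""
    return ram_gb * seconds + overhead

def preflight(plan: Plan) -> List[str]:
    """Return warnings for any approximate limit the plan is likely to exceed."""
    warnings = []
    est_ram_gb = plan.file_mb * plan.expansion_factor / 1024   # MB -> GB
    if plan.file_mb > PRACTICAL_FILE_LIMIT_MB:
        warnings.append("file exceeds practical limit; split or chunk it")
    if est_ram_gb > RAM_LIMIT_GB:
        warnings.append("estimated RAM exceeds limit; process in chunks")
    if plan.est_runtime_s > TIME_LIMIT_S:
        warnings.append("estimated runtime exceeds timeout; split the job")
    return warnings
```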
ChatGPT Agents Architecture – (III)
Observability, Traceability, and Metadata (v2.3-beta EN adapted)
Colors & symbols: see final legend | Confidence: High, Medium, Low
🔎 Layer 6: Observability and Traceability
- {56} No detailed logs exposed to the end user
- {57} Errors and limitations are reported via messages, not trace dumps
- {58} No external session auditing (for privacy/security reasons)
- {59} Architectural changes only inferable through behavioral shifts
- {60} Confidence levels assigned based on replicable evidence
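Items {56} and {57} describe a split between what is recorded and what the user sees. The sketch below illustrates that split from the outside; `INTERNAL_LOG` and `run_reported` are hypothetical names, and nothing here describes OpenAI's actual logging.

```python
import traceback
from typing import Any, Callable, List, Tuple

INTERNAL_LOG: List[str] = []   # hypothetical operator-side log, never shown to the user

def run_reported(fn: Callable[..., Any], *args: Any) -> Tuple[bool, Any]:
    """Run `fn`; on failure keep the full traceback internally, return a one-line message."""
    try:
        return True, fn(*args)
    except Exception as exc:
        INTERNAL_LOG.append(traceback.format_exc())     # full trace stays internal
        return False, f"{type(exc).__name__}: {exc}"    # short message for the user
```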
Example: Item Metadata Table
| ID | Item/Summary | Page | Confidence | Note / Workaround |
|---|---|---|---|---|
| 24 | Firecracker Micro-VM | 13 | Medium | Inferred architecture, not officially confirmed |
| 25 | RAM: 7–8GB/session | 14 | High | Empirically validated |
| 32 | Sandbox degradation | 14 | High | Restart session after error |
For extended analysis: consider editable tables with extra columns (e.g., source, reproducibility…)
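The table above, extended with the suggested columns, maps naturally onto a typed record that can be filtered and exported. A minimal sketch using the three example rows; the `source` and `reproducibility` fields are hypothetical and left empty.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List

@dataclass
class Item:
    id: int
    summary: str
    page: int
    confidence: str              # "High", "Medium" or "Low"
    note: str
    source: str = ""             # hypothetical extra column
    reproducibility: str = ""    # hypothetical extra column

ITEMS = [
    Item(24, "Firecracker Micro-VM", 13, "Medium", "Inferred architecture, not officially confirmed"),
    Item(25, "RAM: 7–8GB/session", 14, "High", "Empirically validated"),
    Item(32, "Sandbox degradation", 14, "High", "Restart session after error"),
]

def by_confidence(items: List[Item], level: str) -> List[Item]:
    """Filter items by confidence level."""
    return [i for i in items if i.confidence == level]

def to_csv(items: List[Item]) -> str:
    """Serialize items to CSV text, one column per dataclass field."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(Item)])
    writer.writeheader()
    for item in items:
        writer.writerow(asdict(item))
    return buf.getvalue()
```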
Legend & Reference Keys
- 🟦 Core (Reasoning engine, LLM, blue)
- 🟨 Tool (Python, Browser, DALL·E, amber)
- 🟩 Sandbox (Micro-VM, isolated environment, green)
- 🟫 Interface (Human collaboration layer, gray)
- 🟥 Edge/Limitation (Security, hard constraints, red/orange)
- 🟪 Observability (Logs, introspection, purple)
- High, Medium, Low: Confidence levels
Source: ChatGPT Agents Architecture, JL de la Torre, v2.3-beta
ChatGPT Agents Architecture: Technical and Operational Analysis


