# GSP Windows Agent Port Validation
Workspace reference: [`GSP-WORKSPACE.md`](../../GSP-WORKSPACE.md)
## Current Behavior
The Windows agent is a Cygwin-packaged OGP Perl agent. The maintained runtime lives under `OGP64/`, and the core service is `OGP64/OGP/ogp_agent.pl`.
## Documentation Review Notes
Reviewed local project documentation:
- `README.md`
- `docs/AGENT_ARCHITECTURE.md`
- `docs/CYGWIN_INTEGRATION.md`
- `docs/COMMAND_EXECUTION.md`
- `docs/PROCESS_MANAGEMENT.md`
- `docs/PANEL_INTEGRATION.md`
- `OGP64/OGP/README.md`
- `OGP64/OGP/documentation/agent-guide.md`
- related GSP Panel docs under `../GSP/docs/architecture/` and `../GSP/docs/features/STATUS_SYSTEM.md`
Documentation not found in this repository:
- `AGENTS.md`
- `CODEX_PROJECT_GUIDE.md`
- a dedicated protocol/API document (none existed before this feature note)
- a dedicated networking/status validation document (none existed before this feature note)
## Architecture Discovery
Windows startup flow:
1. `OGP64/agent_start.bat` sets the Cygwin path and validates `OGP64/OGP/ogp_agent.pl`.
2. The batch script creates missing config files from `.default` templates.
3. It runs `perl -c ./ogp_agent.pl`.
4. It launches the agent from `/OGP`.
Shutdown flow:
1. `OGP64/agent_stop.bat` reads known PID files.
2. It sends termination signals through the bundled Cygwin `kill.exe`.
Communication with Panel:
- XML-RPC over `/RPC2`
- shared key configured in `OGP64/OGP/Cfg/Config.pm`
- command dispatch table in `OGP64/OGP/ogp_agent.pl`
- structured status command: `server_status`
Server launch process:
- Panel calls `universal_start`.
- Agent writes a startup hint under `/OGP/startups`.
- Agent launches the server inside a managed screen session.
- The game command receives Panel-generated startup parameters, including the assigned game port.
Server stop process:
- Panel calls `stop_server`.
- Agent writes a `STOPPING` status hint.
- Agent removes the startup flag and kills the managed process tree.
Server status process:
- Panel calls `server_status` when available.
- Agent checks the managed screen session.
- Agent validates Panel-provided ports.
- Agent returns a compatibility `status` plus richer port fields.
Before this change, the structured `server_status` RPC used these inputs:
- `home_id`
- `server_ip`
- `server_port`
- `query_port`
- `rcon_port`
- `startup_timeout`
- `state_hint`
The agent checked the managed GNU Screen session and probed ports with `netstat`. It primarily treated the game port as the readiness indicator and returned compatibility fields such as `status`, `ready`, `process_running`, `session_running`, `game_port_listening`, `query_port_listening`, and `rcon_port_listening`.
This was better than checking only a process, but it still had gaps:
- it did not expose a complete expected/listening/missing port list
- it only modeled game/query/RCON ports as individual checks
- it did not return the requested `Stopped`, `Starting`, `Running`, `Warning`, and `Failed` state model
- it preferred `netstat`, while Windows can expose listening ports through .NET networking APIs
## Proposed Behavior
The agent should validate only the ports assigned by the Panel for the specific server.
The Panel remains the source of truth for expected ports. The agent must not scan random application ports, guess ports, or hardcode game-specific port rules.
The status result keeps the existing compatibility fields and adds the following richer fields:
- `ProcessRunning`
- `StatusState`
- `ExpectedPorts`
- `ListeningPorts`
- `MissingPorts`
- `CPUUsage`
- `MemoryUsage`
- `PortValidationEnabled`
- `StartupValidationTimeoutSeconds`
- `PortCheckIntervalSeconds`
## Architecture Overview
```text
Panel Game Monitor
-> Panel/includes/lib_remote.php remote_server_status()
-> XML-RPC server_status
-> OGP64/OGP/ogp_agent.pl
-> managed screen/session check
-> configured port validation
-> structured status hash
-> Panel display logic
```
## Status Flow
1. Panel calls `server_status` with the server's assigned game/query/RCON ports.
2. Agent checks the managed screen session for `home_id`.
3. Agent builds `ExpectedPorts` from the Panel-provided ports.
4. Agent collects listening ports using PowerShell/.NET `System.Net.NetworkInformation.IPGlobalProperties` when available.
5. Agent falls back to `netstat -an` if PowerShell/.NET collection fails.
6. Agent compares expected ports with active TCP/UDP listeners.
7. Agent returns old compatibility fields and new detailed fields.
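
Steps 5 and 6 can be sketched as follows. This is a minimal illustration in Python, not the agent's Perl code. The function names `parse_netstat_listeners` and `compare_ports` are hypothetical, and the sketch assumes the English-locale column layout of Windows `netstat -an`.

```python
import re

# Matches rows of Windows `netstat -an` output (English locale assumed), e.g.
#   TCP    0.0.0.0:27015    0.0.0.0:0    LISTENING
#   UDP    [::]:2302        *:*
_ROW = re.compile(
    r"^\s*(TCP|UDP)\s+(\S+):(\d+)\s+(\S+)(?:\s+(\S+))?\s*$", re.IGNORECASE
)


def parse_netstat_listeners(output):
    """Return a set of (protocol, port) pairs found in `netstat -an` output."""
    listeners = set()
    for line in output.splitlines():
        match = _ROW.match(line)
        if not match:
            continue  # banner, column header, or blank line
        proto, _address, port, _remote, state = match.groups()
        proto = proto.lower()
        # TCP rows count only in the LISTENING state. UDP rows have no state
        # column, so any bound UDP socket is treated as a listener.
        if proto == "tcp" and (state or "").upper() != "LISTENING":
            continue
        listeners.add((proto, int(port)))
    return listeners


def compare_ports(expected, listeners):
    """Split the expected (protocol, port) pairs into listening and missing."""
    listening = sorted(entry for entry in expected if entry in listeners)
    missing = sorted(entry for entry in expected if entry not in listeners)
    return listening, missing
```

Only the pairs passed in as `expected` are ever reported, which keeps the check limited to the ports the Panel assigned.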
## State Model
| `StatusState` | Meaning | Compatibility `status` |
| --- | --- | --- |
| `Stopped` | No managed process/session exists, and none of the expected ports are listening. | `OFFLINE` |
| `Starting` | Process/session exists, no expected ports are listening yet, and the startup timeout has not elapsed. | `STARTING` |
| `Running` | Process/session exists and all expected ports are listening. | `ONLINE` |
| `Warning` | Process/session exists and only some expected ports are listening, or ports listen without the managed session. | `ONLINE` |
| `Failed` | Process/session exists and no expected ports are listening after timeout. | `UNRESPONSIVE` |
The compatibility `status` field is intentionally preserved so existing Panel code does not break.
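
The table can be read as a small decision function. The sketch below is illustrative Python, not the agent's Perl implementation. The name `classify_status` and its parameters are hypothetical, and two details are assumptions: the timeout boundary is treated as inclusive, and a running session with no expected ports is reported as `Running`.

```python
def classify_status(session_running, expected, listening,
                    elapsed_seconds, timeout_seconds=180):
    """Return (StatusState, compatibility status) for a single status check.

    expected and listening are collections of assigned ports; listening holds
    the subset of expected ports that currently have an active listener.
    """
    expected_set = set(expected)
    listening_set = set(listening) & expected_set
    any_listening = bool(listening_set)
    # Assumption: an empty expected set counts as "all listening".
    all_listening = listening_set == expected_set

    if not session_running:
        # Ports listening without the managed session is a Warning in the table.
        if any_listening:
            return "Warning", "ONLINE"
        return "Stopped", "OFFLINE"
    if all_listening:
        return "Running", "ONLINE"
    if any_listening:
        return "Warning", "ONLINE"
    # Session exists but nothing is listening: the timeout decides the state.
    if elapsed_seconds < timeout_seconds:
        return "Starting", "STARTING"
    return "Failed", "UNRESPONSIVE"
```

For example, `classify_status(True, [27015, 27016], [27015], 30)` yields `("Warning", "ONLINE")` because only one of the two expected ports is listening.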
## Panel Integration
The current Panel integration is already compatible through these functions:
- `Panel/includes/lib_remote.php::remote_server_status()`
- `Panel/modules/gamemanager/home_handling_functions.php::get_agent_server_status()`
The Panel currently passes the assigned game port plus derived query and RCON ports. Future Panel work can pass multiple ports in the existing `query_port` or `rcon_port` strings as comma-, semicolon-, or whitespace-separated values, with optional protocol markers such as `2302/udp` or `tcp:27015`.
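
A parser for that multi-port string format might look like the sketch below. It is illustrative Python rather than the agent's Perl code. The name `parse_port_spec`, the `"any"` label for unmarked ports, and the choice to reject malformed tokens with an error are all assumptions.

```python
import re

# One token: an optional "tcp:"/"udp:" prefix, a port number, and an optional
# "/tcp" or "/udp" suffix, e.g. "27015", "2302/udp", "tcp:27015".
_TOKEN = re.compile(r"^(?:(tcp|udp):)?(\d{1,5})(?:/(tcp|udp))?$", re.IGNORECASE)


def parse_port_spec(spec):
    """Parse a port string into an ordered list of unique (protocol, port) pairs.

    Ports without a protocol marker get the protocol "any" (an assumption).
    """
    ports = []
    for token in re.split(r"[,;\s]+", spec.strip()):
        if not token:
            continue  # empty input or stray separator
        match = _TOKEN.match(token)
        if not match:
            raise ValueError(f"unrecognized port token: {token!r}")
        prefix, number, suffix = match.groups()
        if prefix and suffix and prefix.lower() != suffix.lower():
            raise ValueError(f"conflicting protocol markers: {token!r}")
        port = int(number)
        if not 1 <= port <= 65535:
            raise ValueError(f"port out of range: {token!r}")
        entry = ((prefix or suffix or "any").lower(), port)
        if entry not in ports:
            ports.append(entry)
    return ports
```

With this sketch, `parse_port_spec("27015, 2302/udp; tcp:27016")` returns `[("any", 27015), ("udp", 2302), ("tcp", 27016)]`.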
## Agent Integration
Changed agent file:
- `OGP64/OGP/ogp_agent.pl`
Changed default config file:
- `OGP64/OGP/Cfg/Preferences.pm.default`
The agent does not introduce a new RPC. It extends the existing `server_status` response.
## Configuration Options
Add these keys to `Cfg/Preferences.pm`:
| Key | Default | Purpose |
| --- | --- | --- |
| `PortValidationEnabled` | `1` | Enables configured port validation. |
| `StartupValidationTimeoutSeconds` | `180` | Time before a process with no listening required ports is treated as failed. |
| `PortCheckIntervalSeconds` | `5` | Polling interval for future startup wait loops. The current RPC performs a single check per call. |
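
The sketch below shows one way a future startup wait loop could combine the timeout and the polling interval. It is illustrative Python under stated assumptions, not existing agent behavior. The name `wait_for_ports` and the `check_ports` callable, which is assumed to return the list of still-missing ports, are hypothetical.

```python
import time


def wait_for_ports(check_ports, timeout_seconds=180, interval_seconds=5,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll check_ports() until no ports are missing or the timeout elapses.

    Returns (ready, missing) from the last check. The clock and sleep
    parameters exist so the loop can be driven by a fake clock in tests.
    """
    deadline = clock() + timeout_seconds
    while True:
        missing = check_ports()
        if not missing:
            return True, []
        remaining = deadline - clock()
        if remaining <= 0:
            return False, missing
        # Never sleep past the deadline, so the final check lands on it.
        sleep(min(interval_seconds, remaining))
```

Using a monotonic clock keeps the deadline stable if the system time changes while a server is starting.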
## Testing Plan
Validation scenarios:
| Scenario | Expected result |
| --- | --- |
| Process/session running and all expected ports listening | `StatusState=Running`, `status=ONLINE` |
| Process/session running and some expected ports listening | `StatusState=Warning`, `status=ONLINE`, missing ports listed |
| Process/session running and no ports before timeout | `StatusState=Starting`, `status=STARTING` |
| Process/session running and no ports after timeout | `StatusState=Failed`, `status=UNRESPONSIVE` |
| No process/session and no expected ports listening | `StatusState=Stopped`, `status=OFFLINE` |
| No process/session but expected port is listening | `StatusState=Running` or `Warning`, `status=ONLINE`, warning message |
Manual Windows validation:
1. Start the agent with `C:\OGP64\agent_start.bat`.
2. Start a test server from the Panel.
3. Confirm `server_status` reports `Starting` until assigned ports bind.
4. Confirm all assigned ports appear under `ExpectedPorts`.
5. Confirm matching ports appear under `ListeningPorts`.
6. Confirm unbound assigned ports appear under `MissingPorts`.
## Future Enhancements
- Add Panel-side support for passing a first-class array of expected ports instead of overloading optional port strings.
- Add per-game startup timeout values from XML or Panel settings.
- Add process-specific CPU and memory usage when the game server PID tree can be mapped reliably.
- Add automated integration tests that call the XML-RPC endpoint on a Windows/Cygwin test host.