VBAF.RL.QTable.ps1
|
#Requires -Version 5.1 <# .SYNOPSIS Q-Table -- hashtable storage for Q-Learning values .DESCRIPTION Maps (state, action) pairs to Q-values and implements the Bellman update rule directly. WHAT YOU ARE LEARNING HERE: ============================ The Q-Table is the memory of a Q-Learning agent. It stores one number for every (state, action) pair the agent has ever visited -- the estimated total future reward of taking that action in that state. HOW THE Q-TABLE WORKS: ====================== Think of the Q-Table as a two-dimensional lookup table: Action0 Action1 Action2 State "A" 2.3 0.8 -1.2 State "B" 0.0 3.1 1.5 State "C" -0.5 0.2 4.8 To choose an action in state "B": Look up row "B": [0.0, 3.1, 1.5] Best value is 3.1 at Action1 Agent chooses Action1 This is "exploitation" -- using learned knowledge. During exploration, the agent ignores the table and picks randomly. HASHTABLE IMPLEMENTATION: ========================= We cannot use a real 2D array because: - States are strings (not integers) - We do not know all states in advance - Most state-action pairs are never visited Solution: use a hashtable with "state|action" as the key. Key: "TowerA|Gothic" -> Value: 2.3 Key: "TowerA|Palace" -> Value: -0.5 This is called a SPARSE representation -- only visited pairs stored. Much more memory efficient than a full 2D array for large state spaces. DEFAULT VALUE: ============== When a state-action pair has never been visited, we return DefaultValue. DefaultValue = 0.0 is the standard (neutral starting point). DefaultValue > 0 = OPTIMISTIC initialisation (encourages exploration). DefaultValue < 0 = PESSIMISTIC initialisation (discourages unvisited states). THE BELLMAN UPDATE (built into QTable.Update()): ================================================ Q(s,a) <- Q(s,a) + alpha * [r + gamma * max Q(s',a') - Q(s,a)] This is the same formula used in QLearningAgent.Learn(). QTable encapsulates the formula so you do not have to repeat it. STATISTICS (AccessCount, UpdateCount): ======================================= These counters help diagnose learning: - AccessCount grows fast early (agent exploring many states) - UpdateCount shows how many Q-values have been refined - High Access but low Update = agent not learning (bug) - TotalEntries shows how much of the state space was visited EXPORT/IMPORT: ============== ExportTable() and ImportTable() allow saving learned knowledge. Export to JSON, save to disk, reload in next session. The agent "remembers" what it learned across sessions. .NOTES Part of VBAF (Visual AI & Reinforcement Learning Framework) Educational use -- inspect the Table hashtable to see what was learned. Used by: VBAF.RL.QLearningAgent.ps1 #> class QTable { [hashtable]$Table # The actual Q-value storage: "state|action" -> double [double]$DefaultValue # Value returned for unseen state-action pairs [int]$AccessCount # Total number of Q-value lookups (diagnostic) [int]$UpdateCount # Total number of Q-value updates (diagnostic) # Constructor with custom default value # Use DefaultValue > 0 for optimistic initialisation # (encourages agent to try every action at least once) QTable([double]$defaultValue) { $this.Table = @{} $this.DefaultValue = $defaultValue $this.AccessCount = 0 $this.UpdateCount = 0 } # Default constructor -- Q-values start at 0.0 (neutral) QTable() { $this.Table = @{} $this.DefaultValue = 0.0 $this.AccessCount = 0 $this.UpdateCount = 0 } # Create a unique key for each (state, action) pair. # The pipe character | is the separator -- states and actions # should not contain | to avoid key collisions. hidden [string] MakeKey([string]$state, [string]$action) { return "$state|$action" } # Look up Q(state, action). # Returns DefaultValue if this pair has never been seen. # Increments AccessCount for diagnostics. [double] Get([string]$state, [string]$action) { $key = $this.MakeKey($state, $action) $this.AccessCount++ if ($this.Table.ContainsKey($key)) { return $this.Table[$key] } else { return $this.DefaultValue # Unseen pair -- return neutral value } } # Store Q(state, action) = value. # Creates a new entry or overwrites an existing one. [void] Set([string]$state, [string]$action, [double]$value) { $key = $this.MakeKey($state, $action) $this.Table[$key] = $value $this.UpdateCount++ } # THE BELLMAN UPDATE -- the core of Q-Learning. # # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max Q(s',a') - Q(s,a)] # # Parameters: # state -- current state s # action -- action taken a # reward -- reward received r # nextState -- resulting state s' # possibleActions -- all actions available in s' (to find max Q(s',a')) # alpha -- learning rate (how much to update) # gamma -- discount factor (how much to value future rewards) # # Step by step: # 1. Look up current estimate: Q(s,a) # 2. Find best future value: max Q(s',a') over all actions # 3. Compute Bellman target: r + gamma * max Q(s',a') # 4. Compute TD error: target - current estimate # 5. Update: new Q = old Q + alpha * TD error [void] Update([string]$state, [string]$action, [double]$reward, [string]$nextState, [string[]]$possibleActions, [double]$alpha, [double]$gamma) { $currentQ = $this.Get($state, $action) # Find max Q-value over all actions in the next state $maxNextQ = $this.DefaultValue if ($possibleActions.Count -gt 0) { foreach ($nextAction in $possibleActions) { $nextQ = $this.Get($nextState, $nextAction) if ($nextQ -gt $maxNextQ) { $maxNextQ = $nextQ } } } # Bellman update: move Q(s,a) toward the target $newQ = $currentQ + $alpha * ($reward + $gamma * $maxNextQ - $currentQ) $this.Set($state, $action, $newQ) } # Return the action with the highest Q-value in this state. # This is the GREEDY action -- what the agent believes is best. # Used during exploitation (when epsilon-greedy picks the best action). [string] GetBestAction([string]$state, [string[]]$possibleActions) { if ($possibleActions.Count -eq 0) { throw "No possible actions provided to GetBestAction" } $bestAction = $possibleActions[0] $bestQ = $this.Get($state, $bestAction) for ($i = 1; $i -lt $possibleActions.Count; $i++) { $action = $possibleActions[$i] $q = $this.Get($state, $action) if ($q -gt $bestQ) { $bestQ = $q $bestAction = $action } } return $bestAction } # Return Q-values for ALL actions in this state. # Useful for printing what the agent learned about a specific state. # Example: $table.GetStateValues("TowerA", $actions) # -> @{ "Gothic" = 2.3; "Palace" = -0.5; "Ruins" = 1.1 } [hashtable] GetStateValues([string]$state, [string[]]$possibleActions) { $values = @{} foreach ($action in $possibleActions) { $values[$action] = $this.Get($state, $action) } return $values } # Export the entire Q-table to a hashtable for saving. # Use ConvertTo-Json and Set-Content to save to disk. # Example: # $table.ExportTable() | ConvertTo-Json | Set-Content "qtable.json" [hashtable] ExportTable() { return @{ Table = $this.Table DefaultValue = $this.DefaultValue AccessCount = $this.AccessCount UpdateCount = $this.UpdateCount } } # Restore a previously saved Q-table. # The agent picks up exactly where it left off. [void] ImportTable([hashtable]$data) { $this.Table = $data.Table $this.DefaultValue = $data.DefaultValue $this.AccessCount = $data.AccessCount $this.UpdateCount = $data.UpdateCount } # Return diagnostic statistics. # TotalEntries: how many unique (state, action) pairs were visited # AccessCount: how many times Q-values were looked up # UpdateCount: how many times Q-values were changed [hashtable] GetStats() { return @{ TotalEntries = $this.Table.Count AccessCount = $this.AccessCount UpdateCount = $this.UpdateCount DefaultValue = $this.DefaultValue } } # Wipe all learned values -- start fresh. # AccessCount and UpdateCount also reset. [void] Reset() { $this.Table.Clear() $this.AccessCount = 0 $this.UpdateCount = 0 } } # ============================================================================ # QUICK REFERENCE # ============================================================================ # # CREATE A Q-TABLE: # $table = [QTable]::new() # default value 0.0 # $table = [QTable]::new(1.0) # optimistic default (encourages exploration) # # READ AND WRITE: # $value = $table.Get("StateA", "ActionLeft") # $table.Set("StateA", "ActionLeft", 2.5) # # ONE-STEP UPDATE (Bellman equation built in): # $table.Update("StateA", "ActionLeft", 1.0, "StateB", $actions, 0.1, 0.9) # # FIND BEST ACTION: # $best = $table.GetBestAction("StateA", $actions) # # INSPECT WHAT WAS LEARNED: # $table.GetStateValues("StateA", $actions) # $table.GetStats() # $table.Table # raw hashtable -- all keys and values # # SAVE AND LOAD: # $table.ExportTable() | ConvertTo-Json | Set-Content "qtable.json" # $data = Get-Content "qtable.json" | ConvertFrom-Json # $table.ImportTable($data) # # SEE ALSO: # VBAF.RL.QLearningAgent.ps1 -- uses QTable internally # ============================================================================ |