### Tic Tac Toe¶

#### Goals¶

With this, my main goal was to mess around with using straight numpy to build a basic tic-tac-toe agent, and to learn more about Q-learning along the way. This article helped me get started with the basics.

This was about a day-long project to get started, and the next steps are to make it user friendly and deploy it outside of a notebook environment! Because let's be honest, notebooks aren't super user friendly.

```
from copy import deepcopy
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```

#### Agent Class¶

The main event of all this! Each agent tracks the moves it takes over the course of a game, and at the end uses `reward` to propagate the game's outcome back through those states, updating the value of each one.

Using the Bellman equation (the Q-learning update):

$Q(S, A) \leftarrow Q(S, A) + \alpha \left( \gamma \max_a Q(S', a) - Q(S, A) \right)$

$\epsilon$: exploration rate - chance that a random move is taken vs best move.

$\alpha$: learning rate - how big a step each update takes toward the new value estimate.

$\gamma$: discount factor - weights how much the future reward matters relative to the current state.

or in code:

`self.states[move] += self.lr * (self.discount_factor * reward - self.states[move])`

which can definitely be easier to read :)
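To make the backward pass concrete, here's a minimal sketch (not from the original notebook; the state hashes are made up, and the values match the class defaults below) of how a final reward of 1 propagates through a three-move game, newest move first:

```
# hypothetical state hashes, newest move first, with the class defaults below
lr, discount_factor = 0.2, 0.1
states = {}
moves_taken = ["hash_move_3", "hash_move_2", "hash_move_1"]
reward = 1  # the agent won this game
for move in moves_taken:
    if move not in states:
        states[move] = 0
    states[move] += lr * (discount_factor * reward - states[move])
    reward = states[move]  # earlier moves get an exponentially smaller share
print(states)
```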

```
class Agent:
    def __init__(self, symbol, lr=0.2, exp_rate=0.4, discount_factor=0.1):
        """
        initialize a player and associated symbol, along with the
        learning rate, exploration rate, and discount factor.
        """
        self.symbol = symbol
        self.lr = lr
        self.exp_rate = exp_rate
        self.discount_factor = discount_factor
        self.states = {}
        self.moves_taken = []

    def _get_values(self, hash):
        """
        look up the stored value of a board state,
        defaulting to 0 for states we haven't seen yet.
        """
        if hash not in self.states:
            values = 0
        else:
            values = self.states[hash]
        return values

    def move(self, board):
        """
        choose next move: explore (random move) with probability exp_rate,
        otherwise exploit (take the best-valued move).
        """
        explore = np.random.uniform(0, 1) < self.exp_rate
        avail_idx = np.argwhere(board.board == 0)
        next_move = None
        if explore:  # pick a random available position
            pos_idx = np.random.randint(len(avail_idx))
            next_move = avail_idx[pos_idx]
        else:  # look through all available positions and find the best move
            max_value = -999
            for pos in avail_idx:
                next_board = deepcopy(board)
                next_board.update_board(pos[0], pos[1], self.symbol)
                board_hash = next_board.get_hash()
                value = self._get_values(board_hash)
                if value > max_value:
                    max_value = value
                    next_move = pos
        return next_move

    def update_move_history(self, board_hash):
        """
        add move to front of the list.
        """
        self.moves_taken.insert(0, board_hash)

    def reward(self, reward):
        """
        update values at the end of the game for each state visited:
        Q(S, A) <- Q(S, A) + alpha * (gamma * max_a Q(S', a) - Q(S, A))
        """
        # move history is reversed, so reward is the value of the next move taken
        for move in self.moves_taken:
            if move not in self.states:
                self.states[move] = 0
            self.states[move] += self.lr * (self.discount_factor * reward - self.states[move])
            reward = self.states[move]

    def reset(self):
        """
        clear the move history between games.
        """
        self.moves_taken = []
```

#### Player class¶

A super basic class to hold the player's symbol, make moves, and check for a win.

```
class Player:
    def __init__(self, symbol):
        self.symbol = symbol

    def make_move(self, row, col, board):
        """
        place this player's symbol, show the board, and check for a win.
        """
        board.update_board(row, col, self.symbol)
        board.str_rep()
        board.check_win(3)
```

#### Gameboard¶

You can't play tic-tac-toe without a gameboard! And we definitely want to make sure the board has some basic representation so you don't have to keep track of a numpy array while playing.

Some basic functions we want:

- make the board! and track player symbols
- track what moves each player makes and update the board with that symbol appropriately.
- get gameboard hash. the way the agent tracks what moves it has made is by storing a flattened, string representation of the gameboard at that time. we'll have the gameboard class itself take care of this.
- who won? after every move we want to check if anyone won or if there was a tie. the agent won't really learn anything if it never knows what the outcome was.

```
class Gameboard:
    def __init__(self, n_rows=3, n_cols=3):
        self.n_rows = n_rows
        self.n_cols = n_cols
        self.board = np.zeros((self.n_rows, self.n_cols))
        self.symbols = {0: " ", 1: "X", -1: "O"}
        self.symbols_rev = {" ": 0, "X": 1, "O": -1}
        self.winner = None
        self.draw = None
        self.hash = None

    def str_rep(self):
        """
        print a string representation of the board, with symbols
        """
        header = "|" + "---|" * self.n_cols
        for i in range(self.n_rows):
            row = "|"
            for j in range(self.n_cols):
                row += f" {self.symbols[self.board[i, j]]} |"
            print(header)
            print(row)
        print(header)

    def get_hash(self):
        """
        return a flattened, string view of the board
        """
        self.hash = str(self.board.flatten())
        return self.hash

    def update_board(self, row, col, sym):
        """
        update array with int representation of symbol if position is valid.
        """
        if not isinstance(sym, int):
            sym = self.symbols_rev[sym]
        if self.board[row, col] == 0:
            self.board[row, col] = sym
        else:
            print("move not valid, space not empty")

    def check_win(self, win_score):
        """
        check diagonals, rows, and columns for win_score (+ or -) or draw.
        """
        left_diag_score = np.sum(np.diag(self.board))
        right_diag_score = np.sum(np.diag(np.fliplr(self.board)))
        row_sums = np.sum(self.board, axis=1)
        col_sums = np.sum(self.board, axis=0)
        x_win = win_score
        o_win = win_score * -1
        # check if x wins
        if x_win in row_sums or x_win in col_sums or x_win in [left_diag_score, right_diag_score]:
            self.winner = 1
            print("game over, X wins")
            return True
        # check if o wins
        if o_win in row_sums or o_win in col_sums or o_win in [left_diag_score, right_diag_score]:
            self.winner = -1
            print("game over, O wins")
            return True
        # check if draw (no empty spaces left)
        if 0 not in self.board:
            self.draw = True
            self.winner = 0
            print("game over, draw")
        return False

    def reset(self):
        """
        back to square one
        """
        self.board = np.zeros((self.n_rows, self.n_cols))
        self.winner = None
        self.draw = None
        self.hash = None
```
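As a quick sanity check (not in the original notebook), here's what the agent actually stores as a key, and the win detection firing on a diagonal, assuming the classes above:

```
board = Gameboard()
for i in range(3):
    board.update_board(i, i, "X")  # fill the left diagonal
print(board.get_hash())  # '[1. 0. 0. 0. 1. 0. 0. 0. 1.]'
board.check_win(3)       # prints "game over, X wins" and sets board.winner = 1
```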

#### Helper functions¶

A couple functions to help with training as well as having the agent make a move when a person is playing against it.

To train, it's basically just set up to have two agents play against each other for `x` rounds, and to appropriately check, update, and reward each agent after every game.

After this, it's time to play :)

```
def train_agent(player1, player2, rounds):
    player1_wins = np.zeros(rounds)
    player2_wins = np.zeros(rounds)
    win_score = 3
    for i in range(rounds):
        board = Gameboard()
        symbol_map = board.symbols
        print(f"starting round {i+1}")
        while board.winner is None:
            for player in [player1, player2]:
                move = player.move(board)
                board.update_board(move[0], move[1], symbol_map[player.symbol])
                player.update_move_history(board.get_hash())
                winner = board.check_win(win_score)
                # in case player one fills the last spot
                if board.winner is not None:
                    break
        if winner and board.winner == player1.symbol:
            player1.reward(1)
            player2.reward(-1)
            player1_wins[i] = 1
        elif winner and board.winner == player2.symbol:
            player1.reward(-1)
            player2.reward(1)
            player2_wins[i] = 1
        elif not winner and board.winner == 0:
            # a draw is treated as a loss for both agents
            player1.reward(-1)
            player2.reward(-1)
        player1.reset()
        player2.reset()
    return player1_wins, player2_wins


def agent_move(player, board):
    if board.winner is None:
        move = player.move(board)
        board.update_board(move[0], move[1], player.symbol)
        player.update_move_history(board.get_hash())
        board.str_rep()
        win = board.check_win(3)
        if win and board.winner == player.symbol:
            player.reward(1)
            player.reset()
    else:
        # the game already ended on the opponent's move, so take the penalty
        player.reward(-1)
        player.reset()
```

```
player1 = Agent(symbol=1)
player2 = Agent(symbol=-1)
p1_wins, p2_wins = train_agent(player1, player2, 100)
```

```
# total wins for each agent
np.sum(p1_wins), np.sum(p2_wins)
```

```
# player 1's share of the games that had a winner
np.sum(p1_wins) / (np.sum(p1_wins) + np.sum(p2_wins))
```
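Since pandas and matplotlib are already imported, a rolling win rate is a quick way to eyeball whether training is doing anything (a minimal sketch; the window size of 10 is an arbitrary choice):

```
results = pd.DataFrame({"p1": p1_wins, "p2": p2_wins})
results.rolling(10).mean().plot()
plt.xlabel("round")
plt.ylabel("win rate (rolling mean)")
plt.show()
```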

#### Me vs Agent¶

Currently a pretty manual process. I'm planning to automate this and set something up so a player can actually play against the agent, with more than just a string-represented gameboard.
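One possible shape for that loop, sketched out (this isn't in the notebook yet; `play` is a hypothetical helper and the input parsing is deliberately minimal):

```
def play(agent, human):
    game = Gameboard()
    while game.winner is None:
        agent_move(agent, game)
        if game.winner is not None:
            break
        row, col = (int(x) for x in input("row col (0-2): ").split())
        human.make_move(row, col, game)
    agent_move(agent, game)  # lets the agent collect its final reward/penalty
```

For now, here's the manual version: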

```
me = Player(symbol=-1)
game = Gameboard()
```

```
agent_move(player1, game)
```

```
me.make_move(1, 0, game)
```

```
agent_move(player1, game)
```

```
me.make_move(1, 1, game)
```

```
agent_move(player1, game)
```
