Cracking Blackjack - Part 2

To continue our work of "cracking" Blackjack using Reinforcement Learning, we must understand the picture above. Below, I have broken the image into its parts and described what each one corresponds to in Blackjack:

  • Agent: An abstraction of the AI that goes through the learning process. In Blackjack, the agent is the player.
  • Environment: The set of all situations the agent can interact with, the actions available in each situation, and the consequences (rewards and penalties) associated with each situation. In Blackjack, this is the set of all possible player hands, dealer cards, player actions (hit or stand), and results (win / lose / tie).
  • State: The "conditions" that describe the environment at a given moment. In our version of Blackjack, the state consists of the player's hand value and the value of the dealer's upcard (the face-up card). The player / agent can only see these two things when making a decision.
  • Actions: The options available to the agent in a particular state. In our version of Blackjack, the available actions are hit and stand.
  • Reward: The feedback an agent receives for its action in a particular state. It is defined by the programmer to show the agent which results are preferred, so that the agent can begin to adjust its actions accordingly. In our version of Blackjack, win = +$100, lose = -$100, and tie = $0. The logic for deciding win / lose / tie lives in the environment; rewards are the numerical values assigned to each of those results.
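As a rough illustration, the parts listed above could be written down as plain Python data. This is only a sketch: the names `Action`, `State`, and `REWARDS` are hypothetical and chosen for this example, while the reward values are the ones given in the list.

```python
from enum import Enum
from typing import NamedTuple


class Action(Enum):
    """The two actions available to the agent in our version of Blackjack."""
    STAND = 0
    HIT = 1


class State(NamedTuple):
    """Everything the agent can see when it makes a decision."""
    player_hand_value: int
    dealer_upcard_value: int


# Numerical reward attached to each possible result of a round.
REWARDS = {"win": 100, "lose": -100, "tie": 0}

# Example: the player holds 14 and the dealer shows a 10.
example_state = State(player_hand_value=14, dealer_upcard_value=10)
```

Keeping the state as a small immutable tuple makes it easy to use later as a dictionary key, for instance when storing a value per state.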

How These Parts Work Together In Blackjack

The Blackjack round begins: 2 cards are dealt to the player and to the dealer, and the agent sees its own cards and one of the dealer's cards. The environment models this by sending the agent the initial state (player hand value + dealer upcard value).

The agent chooses an action (hit or stand), and the environment processes it internally. It deals a new card on a "hit", and it determines who wins the round when appropriate (the player stands, gets Blackjack, or busts).

Once the round is over, the environment sends the agent a new state representing the next Blackjack round, along with the reward for the previous round. The agent uses this result to revise its policy.

If there are additional actions for the agent to take in the current round, the environment sends the agent a state with the updated player hand value and a reward of $0, since the round is not over yet.
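The cycle described above can be sketched as a small environment class. This is a minimal sketch under several assumptions that are not in the article: cards come from an infinite deck, aces count as 11 unless that would bust the hand, the dealer draws until reaching 17, actions are the strings "hit" and "stand", and a natural Blackjack gets no special treatment. The `reset` / `step` method names are borrowed from a common convention and are not the author's code. Unlike the description above, `step` returns a `done` flag and leaves it to the caller to call `reset` for the next round.

```python
import random

REWARD = {"win": 100, "lose": -100, "tie": 0}
# Infinite-deck assumption: 2-10, three face cards worth 10, ace as 11.
CARD_VALUES = [2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11]


def hand_value(cards):
    """Sum a hand, counting aces as 11 and demoting them to 1 while bust."""
    total = sum(cards)
    aces = cards.count(11)
    while total > 21 and aces:
        total -= 10
        aces -= 1
    return total


class BlackjackEnv:
    """Hypothetical, simplified Blackjack environment."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.player = []
        self.dealer = []

    def _draw(self):
        return self.rng.choice(CARD_VALUES)

    def _state(self):
        # The agent only sees its hand value and the dealer's first card.
        return (hand_value(self.player), self.dealer[0])

    def reset(self):
        """Deal a new round and return the initial state."""
        self.player = [self._draw(), self._draw()]
        self.dealer = [self._draw(), self._draw()]
        return self._state()

    def step(self, action):
        """Apply "hit" or "stand" and return (state, reward, done)."""
        if action == "hit":
            self.player.append(self._draw())
            if hand_value(self.player) > 21:
                return self._state(), REWARD["lose"], True  # bust
            return self._state(), 0, False  # round continues, reward $0
        # The player stands, so the dealer plays out the hand.
        while hand_value(self.dealer) < 17:
            self.dealer.append(self._draw())
        player_total = hand_value(self.player)
        dealer_total = hand_value(self.dealer)
        if dealer_total > 21 or player_total > dealer_total:
            outcome = "win"
        elif player_total < dealer_total:
            outcome = "lose"
        else:
            outcome = "tie"
        return self._state(), REWARD[outcome], True
```

A round then looks like `state = env.reset()` followed by repeated `state, reward, done = env.step(action)` calls until `done` is true.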

How Our Agent Learns In This Cycle

The cycle above suggests that this loop continues indefinitely, so where and when does the actual learning take place?

One cycle can be represented as a sequence:

State → Action → Reward

As we loop through this cycle, we can record these State / Action / Reward tuples. Going through "n" loops of the cycle and recording the State / Action / Reward tuples along the way is called an episode.

After completing the required number of loops (say 50), our agent goes through the recorded State / Action / Reward tuples and updates its policy accordingly. In the next article, we will look at how our Reinforcement Learning algorithm guides the agent in using these State / Action / Reward tuples to improve its policy.
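To make the recording step concrete, here is a small sketch of collecting State / Action / Reward tuples for one episode. Everything in it is an assumption made for illustration: `ScriptedEnv` is a hypothetical stand-in that replays one fixed round rather than a real Blackjack environment, and `threshold_policy` is a placeholder rule, not a learned policy.

```python
class ScriptedEnv:
    """Hypothetical stand-in environment that replays one fixed round."""

    def reset(self):
        self.player_value = 12
        self.dealer_upcard = 10
        return (self.player_value, self.dealer_upcard)

    def step(self, action):
        """Return (state, reward, done) for the given action."""
        if action == "hit":
            self.player_value += 6  # scripted draw: always a 6
            state = (self.player_value, self.dealer_upcard)
            if self.player_value > 21:
                return state, -100, True  # bust
            return state, 0, False  # round not over, reward $0
        # Standing ends the round; the scripted result is a win.
        return (self.player_value, self.dealer_upcard), 100, True


def threshold_policy(state):
    """Placeholder policy: hit below 17, otherwise stand."""
    player_value, _dealer_upcard = state
    return "hit" if player_value < 17 else "stand"


def run_episode(env, policy):
    """Loop through the cycle once per decision and record each tuple."""
    episode = []
    state = env.reset()
    done = False
    while not done:
        action = policy(state)
        next_state, reward, done = env.step(action)
        episode.append((state, action, reward))
        state = next_state
    return episode


episode = run_episode(ScriptedEnv(), threshold_policy)
print(episode)
# [((12, 10), 'hit', 0), ((18, 10), 'stand', 100)]
```

The returned list is the raw material the agent would later walk through when it updates its policy.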

Episodic vs Continuous Tasks

How many loops around the cycle should make up an episode? In my Blackjack environment, I treated one Blackjack round as one episode. This means there will usually be 1–3 State / Action / Reward tuples per episode, since the agent will likely make only 1–3 decisions per Blackjack round (occasionally more).

Luckily, Blackjack has the idea of a "round" to help define an episode. However, some problems, such as using Reinforcement Learning to predict the stock market, have no "rounds" to help define episodes. The stock market has no natural starting and stopping points, so you would have to get creative when defining episodes!

For these reasons, predicting the stock market using Reinforcement Learning is considered a continuous task, while Blackjack is considered an episodic task.

