About This Course

Who this course is for:

  • Anyone with basic Python skills who wants to get started with Reinforcement Learning
  • Experienced AI Engineers, ML Engineers, Data Scientists, and Software Engineers who want to apply Reinforcement Learning to real business problems
  • Business professionals who want to learn how Reinforcement Learning can help automate adaptive decision-making processes

What you’ll learn: 

  • Understand and be able to identify Multi-Armed Bandit (MAB) problems
  • Model real business problems as MAB and implement digital AI agents to automate them
  • Understand the exploration-exploitation dilemma in Reinforcement Learning
  • Implement the various algorithmic strategies for balancing exploration and exploitation
  • Implement the Epsilon-greedy strategy in Python
  • Implement the Softmax Exploration strategy in Python
  • Implement the Optimistic Initialization strategy in Python
  • Implement the Upper Confidence Bounds (UCB) strategy in Python
  • Understand the challenges of reward function design and sample efficiency in Reinforcement Learning
  • Estimate action values through incremental sampling


Requirements:

  • Be able to understand basic OOP programs in Python
  • Have basic NumPy and Matplotlib knowledge
  • Have basic algebra skills

Software version used in the course:

  • Python 3.9.5.

With concise explanations, this course teaches you to translate seemingly scary mathematical formulas into Python code with confidence. Not everyone is comfortable with mathematics, so the course stays away from maths unless it is necessary. Even where maths is needed, it is presented so that anyone with basic algebra skills can follow it, translate it into code, and build useful intuitions in the process.
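As a small illustration of what translating a formula into code can look like, consider the incremental average used to estimate action values: new estimate = old estimate + (reward - old estimate) / n, where n counts the rewards seen so far, including the new one. The sketch below is not taken from the course material; the function name `incremental_average` and the sample rewards are hypothetical.

```python
def incremental_average(old_estimate, reward, n):
    """Return the running mean after seeing the n-th reward (n starts at 1)."""
    return old_estimate + (reward - old_estimate) / n


rewards = [1.0, 0.0, 1.0, 1.0]  # hypothetical rewards observed from one arm
estimate = 0.0
for n, reward in enumerate(rewards, start=1):
    # Update the estimate without storing the full reward history.
    estimate = incremental_average(estimate, reward, n)

print(estimate)  # running mean of the four rewards
```

Replacing the 1 / n factor with a small constant step size gives more weight to recent rewards, which is a common choice when the reward distribution changes over time.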

The algorithmic strategies taught in this course include Epsilon-Greedy, Softmax Exploration, Optimistic Initialization, Upper Confidence Bounds, and Thompson Sampling. With these tools under your belt, you will be equipped to build and deploy AI agents that can handle critical business operations under uncertainty.
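To give a feel for two of these strategies, here is a minimal sketch of Epsilon-Greedy and a UCB-style rule playing a simulated bandit. It is not the course's own code. It assumes a Bernoulli bandit (each arm pays 1 with a fixed probability, otherwise 0), uses only the Python standard library rather than NumPy, and every name and value in it (`epsilon_greedy`, `ucb`, `run`, `true_means`, epsilon = 0.1, c = 2.0) is hypothetical and chosen for illustration.

```python
import math
import random


def epsilon_greedy(estimates, epsilon, rng):
    """With probability epsilon pick a random arm, otherwise the best-looking one."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda a: estimates[a])


def ucb(estimates, counts, t, c=2.0):
    """Pick the arm with the highest estimate plus an uncertainty bonus."""
    for arm, count in enumerate(counts):
        if count == 0:  # try every arm once before relying on the bonus
            return arm
    return max(
        range(len(estimates)),
        key=lambda a: estimates[a] + c * math.sqrt(math.log(t) / counts[a]),
    )


def run(select, true_means, steps=2000, seed=0):
    """Play a simulated Bernoulli bandit and return the pull count for each arm."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)
    counts = [0] * len(true_means)
    for t in range(1, steps + 1):
        arm = select(estimates, counts, t, rng)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental average of the rewards observed for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts


true_means = [0.2, 0.5, 0.8]  # hypothetical success probability of each arm
eg_counts = run(lambda q, n, t, rng: epsilon_greedy(q, 0.1, rng), true_means)
ucb_counts = run(lambda q, n, t, rng: ucb(q, n, t), true_means)
print("epsilon-greedy pulls per arm:", eg_counts)
print("UCB pulls per arm:", ucb_counts)
```

Both agents should end up pulling the last arm most often. Epsilon-Greedy keeps exploring at a fixed rate, while the UCB bonus shrinks as an arm is pulled more, so exploration is directed at the arms the agent knows least about.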

Our Promise to You

By the end of this course, you will have learned to create multi-armed bandit algorithms.

30-Day Money-Back Guarantee. If you are unsatisfied for any reason, simply contact us and we’ll give you a full refund. No questions asked.

Get started today and learn to build multi-armed bandit agents in Python.

Course Curriculum

Section 1 - Introduction And Course Lessons
Introduction To Reinforcement Learning And Multi-Armed Bandit Problems
Implementing Simulated MAB Environments In Python
Estimating Action Values Through Sampling
Implementing Incremental Average In Code
Implementing Incremental Average For Non-Stationary Bandits
Building A Baseline Agent That Behaves Randomly
Why Are The Results Not Repeatable?
Implementing And Analysing A Greedy Agent
Balancing Exploration And Exploitation With Epsilon Greedy Agents
Controlling Exploration With A Decay
Exploring Intelligently With Softmax Exploration
Being Optimistic Under Uncertainties
Realistic Optimism Under Uncertainties