What is Positive Reinforcement?

Nov 27, 2023

Updated: Dec 6, 2023

There have been a few times where people have asked:

"How do I do this positive reinforcement thing?"

"What is positive reinforcement?"

"I've heard of positive reinforcement, but I don't know how to do it."

Oftentimes, people with an education in psychology, or in science in general, really struggle to take that scientific knowledge and boil it down to something that the general public can understand.

Positive reinforcement is part of operant learning theory, which was developed by B.F. Skinner in the 1930s. It basically says animals (and people) operate on the environment to gain reinforcement and avoid punishment. If we focus just on positive reinforcement, it is simply this: the animal does some behaviour in the environment and gains (adds +) a pleasant sensation that tells the brain to repeat this behaviour in order to gain that pleasant sensation again.

A very common misconception is thinking that positive means good! Adding something pleasant is good, right? Well, it really doesn't have anything to do with good or bad. Positive only means that something is added.

Reinforcement means that it will increase the likelihood of the behaviour occurring in the future. I like to use children as an example, then move on to dogs and cats, and other animals.

A simple positive reinforcement task could be: a child opens the refrigerator to access cookies. Cookies are considered pleasant in the child's eyes, so the behaviour of opening the refrigerator increases because it gives the child access to cookies. Another example: the child cleans up their room and their parent gives them cookies. The cookies reinforce the behaviour of cleaning the room only if the behaviour of cleaning the room increases in the future.

Now let's say you grew up in the '80s and '90s, when spanking (aka corporal punishment) was the predominant means of disciplining children. Let's say you misbehaved, argued with your sibling, and then your mom said "just wait until your dad gets home". This verbal threat may have been enough to reduce the behaviour in the moment. But because it likely did not lead to a reduction in the future likelihood of the behaviour, it isn't considered punishment (by the scientific meaning of the word).

Later, your dad comes home and spanks you. This spanking is supposed to be positive punishment. You did something bad and then you received a spank, which (hopefully) decreases the likelihood of arguing with your sibling. The receipt of a spank is an addition, or positive. Notice how positive is not good in this example. The argument is flawed, however, because punishment only works if it has contiguity (the time that passes between the bad behaviour and the punishment is very short). I don't know about you, but the spanking rarely happened immediately after arguing with the sibling.

The reason I bring up this example is that fathers who have to be the bad guy may feel guilty for spanking their children, and then offer the child a cookie afterwards when everything is settled. You were bad, received a spanking (positive punishment), then a cookie afterwards. Now what is the cookie? The cookie is now deemed less pleasant, or "poisoned", by the order in which the actions occurred. Therefore, the receipt of a cookie here is not positive reinforcement. This is a common mistake that we see with uneducated dog trainers. Applying a punishment to an animal (or child) and then following up with a cookie is also not "balanced".

Reinforcers are determined by the learner. The learner could be a child, a dog, or you! The reinforcer could be food, praise or attention, touch (a hug or ear scratch), play or engagement with a parent, or access to a highly valuable item like a video game. The level of reinforcement, or value of that reinforcer, is determined by the learner. Let's say, for example, that our child in question enjoys playing soccer with their dad. Let's say the child was good and didn't argue with their sibling, so the dad was in a good mood when he arrived home and offered to play soccer with the child. If the child places a high value on this activity, then it should increase the future likelihood of being good (not arguing with their sibling) in order to have the chance to earn this reinforcer.

This now brings us to reinforcement schedules. Let's say playing soccer is the reinforcer for good behaviour. Every day that dad comes home and the child has been good, they play soccer. Now, let's say they have built this routine for a few days, but then dad has a deadline at work and is busy. So this one time, dad says he can't. This is an intermittent reinforcement schedule. If playing soccer is a regular occurrence, it will still increase the likelihood of good behaviour. But if the good behaviour has not had repeated reinforcement (aka a learning history), then the child may revert to arguing with their sibling, because there is no reinforcement available. I usually say you need at least 4 repetitions (or sessions) of a behaviour-reinforcement pair before you can move on to any intermittent schedule of reinforcement. This is with cats and dogs anyway.

There are fixed-ratio schedules, which can be 1 behaviour to 1 reinforcer, or 2 behaviours to 1 reinforcer, but the ratio is fixed, or consistent, through sessions/trials. There are variable-ratio schedules, usually given as an average (a VR, or variable ratio, of 4 is four repetitions of the behaviour to one reinforcer, on average). Remember, you cannot switch to a variable ratio too quickly, because otherwise the learner will not follow. But let's say we spent a week on a fixed ratio 1 to 1 schedule and now we want to move to a variable ratio. You may reinforce the 1st, 2nd, 6th, 7th, 9th, 12th and 18th repetitions. The gaps from one reinforcer to the next are 1, 4, 1, 2, 3 and 6 repetitions, which works out to a VR of about 2.8; on average the learner earns a reinforcer every 2.8 rounds of behaviour. I know, I probably just confused you. It basically says that on the 1st day dad played soccer, and on the 2nd day dad played soccer, then on the 3rd, 4th and 5th days dad was busy. But as long as there has been a prior history of possible reinforcement, the child still gets to play on the 6th day, because they have learned that the behaviour of being good earns play time. The motivation to be good will be higher if the reinforcer is highly valuable to the learner and if the ratio is variable/unpredictable, as well as not too high on average. Imagine if the child had to be good for 100 days before getting to play soccer with dad. It is unlikely that the child would be good every day.
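If the arithmetic is easier to follow written out, here is a small sketch in Python that works through the same sequence of reinforced repetitions. The function names are just ones I made up for illustration. It shows two ways of averaging, because the answer depends on whether you count the very first repetition as its own "gap": the gaps between reinforcers average about 2.8, while total repetitions divided by total reinforcers comes out closer to 2.6.

```python
# Repetitions of the behaviour that earned a reinforcer (from the example above).
reinforced = [1, 2, 6, 7, 9, 12, 18]


def gaps_between_reinforcers(trials):
    """Number of repetitions from each reinforcer to the next one."""
    return [later - earlier for earlier, later in zip(trials, trials[1:])]


def mean_gap(trials):
    """Average gap between successive reinforcers (ignores the first trial)."""
    gaps = gaps_between_reinforcers(trials)
    return sum(gaps) / len(gaps)


def repetitions_per_reinforcer(trials):
    """Total repetitions divided by total reinforcers.

    Assumes the session ends on the last reinforced repetition.
    """
    return trials[-1] / len(trials)


print(gaps_between_reinforcers(reinforced))             # [1, 4, 1, 2, 3, 6]
print(round(mean_gap(reinforced), 1))                   # 2.8
print(round(repetitions_per_reinforcer(reinforced), 1)) # 2.6
```

Either way you count it, the point is the same: the learner cannot predict which repetition pays off, but the average stays low.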

There are fixed-interval schedules, which have to do with time passed, but these tend to not be as strong, so I won't get into them.

A note to those of you learning how to implement positive reinforcement: it is easier to provide a reinforcer for each behaviour. I would highly suggest reinforcing all behaviours that are hard for the learner. If not arguing with the sibling is really hard for the child, then you want to reinforce not arguing every time.

Ok, now I have used children as an example. Let's talk about dog training. We recently added a puppy to our family (if you follow my TikTok, you will have met her). I'll get to Cola Sweetie in the future! Let's say, for example, we are training our puppy to eliminate in the yard (eliminate means to pee and poo). First, we must anticipate her need to go. We take her outside to pee, she pees, and then we praise her (reinforcer one) and give her a cookie (reinforcer two). The cookie (treat or snack) is the primary reinforcer, and praise is the secondary reinforcer. What that means is that a primary reinforcer is something that is innate and needed (food, access to mates, access to a safe environment), while a secondary reinforcer is not innate, but is learned through pairing with a primary reinforcer. Since praise is learned and food is innate, when you are starting with a new behaviour, ensure that you have treats/food handy right away! I have a treat pouch clipped to my pants every time I am outside with our puppy.

Once your puppy goes to squat, you can then say "go pee" or whatever 'cue' you would like to use for the behaviour of peeing. A cue is just a word or phrase that you can add before a behaviour, in order to start having some control over when that behaviour occurs. Dogs do not come with the English language embedded, but they can learn that words and phrases mean certain things through pairing (pairing in the right order, I might add).

So, now we have a behaviour chain: go outside, owner cues "go pee", puppy squats and urinates, then we say "good girl, here's a cookie". Eventually, you can wean off the cookie and only use praise, but NOT when you are first house-training!

Later I will talk about more specifics of house-training, because it is a common issue.

Now the behaviour of going pee outside is on cue and reinforced through treats and praise. The important structure of this scenario is the contiguity: the praise and treats immediately follow the behaviour of peeing outside. This is positive reinforcement. Our addition of treats after the behaviour of peeing outside increases the future likelihood of peeing outside.

Now... let's bring it back to our positive punishment example. You come home from work and find that the puppy has peed in the house, so you think that the puppy should know better, and you yell at the puppy and rub its face in its pee. The puppy postures in appeasement (which is not guilt, I may add). The problem is - you are waaaaayyy too late. Remember the rule of contiguity? The punishment must immediately follow the behaviour in order for the learner to understand which behaviour is receiving the punishment. It is just like the child who gets spanked by their father 5 hours later - no, the child didn't learn that the behaviour the father wanted to decrease was arguing with their sibling. All we end up with is a child who has a father that comes home and spanks them. Same goes for this puppy. All the puppy knows is that you come home and yell and do mean things to them. It could have peed 5 hours ago. The puppy definitely doesn't know what you mean.

This is where antecedent arrangements come in. Basically, in order for your learner to understand what is needed of them, they need repetition of the wanted behaviour, and biologically appropriate timing for that to occur. Children can learn through positive punishment, if the punishment is given immediately at the time of the unwanted behaviour. BUT this is contingent on you actually being there to give the punishment. So instead of punishing the unwanted behaviour, we can set them up for success by letting them know what alternative behaviour they can do to both avoid punishment and earn reinforcement. In our case with a child, when a parent says "if you play nicely with your sibling, we will play soccer together later", some may call it bribing, but I call it a practice of delayed reinforcement.

In terms of puppy physiology, the rule of thumb is to start expecting bladder control when they are 4 months of age. Anything before that is golden. Right after a nap, quickly bring the puppy outside and reinforce the wanted behaviour of peeing outside. Within 15 minutes after eating/drinking, outside to potty. After playing! After a walk! After anything exciting! Once they are 16 weeks, start working on duration. If they are 4 months, they should be able to hold for 4 hours, sometimes plus 1 hour if you are lucky. Later, if you are lucky, you will have trained this behaviour on cue, so that at bedtime you can cue them to empty their bladder and go to sleep.

To recap, positive reinforcement is a method of increasing behaviour by providing items of value to the learner. The value of the items is determined by the learner. The timing is important: the reinforcer should immediately follow the behaviour.

We can certainly talk about the four quadrants, but just because they exist doesn't mean that we should use them all. We should always be thinking: what behaviour can I ask the learner to do instead of the unwanted behaviour?