Basic Operant Conditioning

This is a simplified version of the Matheta program on this website that controls the Roomba. It highlights the operation of the “fixer” in the process of reinforcement. Although this program contains nothing that is not explained elsewhere on this website, it is provided because it may be useful for more detailed statistical and parametric analysis of the essential features of the learning paradigm.

Program Description

The program is designed to imitate the training of an organism in a Skinner Box. It is the simplest case, and many of its features are common to the other programs, so it is described here in detail. It runs “in silico”: keyboard input from the user represents the presentation of a stimulus (a “light” that is on or off) and of positive and negative stimuli representing food and punishing shock. The organism has four operant behaviors (“Behavior 1”, “Behavior 2”, etc.), which it emits at random intervals determined by its “energy”. All of this appears as text on the screen.
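The text does not say exactly how “energy” sets the emission intervals. The Ruby sketch below is therefore one hypothetical way to get random intervals that shorten as energy rises. The formula, the constant OPERANTS, and the methods next_interval and random_operant are assumptions for illustration only and are not taken from OnlyOperants.rb.

```ruby
# Hypothetical sketch: the interval formula and all names are assumptions,
# not taken from OnlyOperants.rb.
OPERANTS = ["Behavior 1", "Behavior 2", "Behavior 3", "Behavior 4"].freeze

# Draw a waiting time (in ticks) before the next spontaneous emission.
# Higher energy gives a shorter maximum wait. Assumes energy > 0.
def next_interval(energy, rng)
  max_wait = (100.0 / energy).ceil
  1 + rng.rand(max_wait) # uniform integer in 1..max_wait
end

# Choose which operant is emitted, uniformly at random.
def random_operant(rng)
  OPERANTS[rng.rand(OPERANTS.size)]
end

rng = Random.new(42)
3.times do
  wait = next_interval(10.0, rng)
  puts "after #{wait} ticks: #{random_operant(rng)}"
end
```

A seeded Random keeps runs repeatable, which helps with the kind of statistical and parametric analysis this program is meant to support.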

The program spends most of its time waiting for keyboard input. When input arrives, the corresponding text is written to the screen and, because the input represents stimuli the organism can detect, the program also produces the corresponding “stimulus” vector. A value of 9 is placed in the stimulus matrix; this value both indicates that a stimulus has been detected and marks the beginning of the period of eligibility for synaptic modification. The value decays quickly. If a neuron on which this stimulus synapses fires before the value has decayed to zero, that particular synapse is increased in value according to the learn method. Note that this synaptic increase also decays over time, as shown in the memory decay section of the program.
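The eligibility mechanism described above can be sketched in Ruby as follows. The starting value of 9 comes from the text. The decay factor, the learning increment, the zero cutoff, and the method names detect, decay_traces and learn are assumptions. This is an illustration of the idea, not the code in OnlyOperants.rb.

```ruby
# Hypothetical sketch; only TRACE_START (9) comes from the text.
TRACE_START = 9    # value written when a stimulus is detected
TRACE_DECAY = 0.5  # assumed fraction of the trace kept each tick
LEARN_STEP  = 0.1  # assumed synaptic increment per learning event

# stimulus: one trace value per stimulus (e.g. light, food, shock)
# brain:    brain[neuron][stimulus] synaptic efficacies
def detect(stimulus, index)
  stimulus[index] = TRACE_START
end

# Decay every trace; tiny values are treated as fully decayed (zero).
def decay_traces(stimulus)
  stimulus.map! do |v|
    decayed = v * TRACE_DECAY
    decayed < 0.01 ? 0.0 : decayed
  end
end

# When a neuron fires, strengthen only synapses whose stimulus trace
# has not yet decayed to zero (i.e. is still eligible).
def learn(brain, stimulus, neuron)
  stimulus.each_with_index do |trace, s|
    brain[neuron][s] += LEARN_STEP if trace > 0
  end
end

stimulus = [0.0, 0.0, 0.0]
brain = Array.new(4) { Array.new(3, 0.0) }
detect(stimulus, 0)       # the light comes on
decay_traces(stimulus)    # one tick passes; the light trace is now 4.5
learn(brain, stimulus, 2) # neuron 2 fires while the light is still eligible
p brain[2]                # => [0.1, 0.0, 0.0]
```

If neuron 2 had fired after about ten ticks, the light trace would already have reached zero and no synapse would have changed.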

Multiplying the brain matrix by the stimulus vector computes, for each neuron, the sum of each stimulus value multiplied by the corresponding synaptic efficacy. This sum represents the continuous, time-varying membrane potential of the neuron. If this potential exceeds a threshold value, the cell fires and the organism emits one of its operants or releases the positive or negative fixer. If the user turns the light on and consistently rewards a particular operant when it occurs, then over time the organism comes to preferentially emit that operant when the light is on.
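The matrix-vector product and threshold test can be illustrated with plain Ruby arrays. The layout here is an assumption, as are the threshold and the weights. Rows are neurons (four operants plus the two fixers) and columns are stimuli (light, food, shock). The names NEURONS, THRESHOLD, potentials and firing are hypothetical.

```ruby
# Hypothetical layout (an assumption, not read from OnlyOperants.rb):
# rows = neurons, columns = stimuli [light, food, shock].
NEURONS = ["Behavior 1", "Behavior 2", "Behavior 3", "Behavior 4",
           "positive fixer", "negative fixer"].freeze
THRESHOLD = 5.0 # assumed firing threshold

# Matrix-vector product: each neuron's potential is the sum over stimuli
# of stimulus value times synaptic efficacy.
def potentials(brain, stimulus)
  brain.map { |row| row.zip(stimulus).sum { |w, s| w * s } }
end

# Names of the neurons whose potential exceeds the threshold.
def firing(brain, stimulus)
  potentials(brain, stimulus)
    .each_with_index
    .select { |v, _| v > THRESHOLD }
    .map { |_, i| NEURONS[i] }
end

brain = [
  [0.8, 0.0, 0.0], # Behavior 1: strongly connected to the light
  [0.1, 0.0, 0.0],
  [0.1, 0.0, 0.0],
  [0.1, 0.0, 0.0],
  [0.0, 1.0, 0.0], # positive fixer driven by food
  [0.0, 0.0, 1.0]  # negative fixer driven by shock
]
light_on = [9.0, 0.0, 0.0]
p firing(brain, light_on) # => ["Behavior 1"]
```

With the light on, Behavior 1 reaches a potential of 0.8 × 9 = 7.2, which is above the threshold. The other operants reach only 0.9.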

Because the fixer is an internal event, it has no visual representation; nevertheless, what happens internally is essential to the model. As the positive_fixer method shows, when the fixer is released it increases the value of the appropriate synapse in the longmem matrix. This value sets a floor: a temporarily increased “brain” synapse cannot decay below it. Hence, repeatedly reinforcing (feeding) operant x produces a steadily increasing value in the synaptic connection between the light stimulus and the neuron that emits operant x. Eventually, merely turning on the light causes the emission of operant x.
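The interaction between the fixer and memory decay can be sketched in Ruby as follows. The names positive_fixer, longmem and memory decay come from the text. The step size, the decay factor, and the exact update rules are assumptions, so this should not be read as the program's actual code.

```ruby
# Hypothetical sketch; FIX_STEP, MEMORY_DECAY and the update rules
# are assumptions, not read from OnlyOperants.rb.
FIX_STEP     = 0.2 # assumed increment written into longmem per release
MEMORY_DECAY = 0.9 # assumed fraction of a brain synapse kept per tick

# When the fixer is released, raise the long-term floor for the synapse
# between the eligible stimulus and the neuron that just fired.
def positive_fixer(longmem, neuron, stimulus_index)
  longmem[neuron][stimulus_index] += FIX_STEP
end

# Temporary "brain" increases decay toward zero but never below longmem.
def memory_decay(brain, longmem)
  brain.each_with_index do |row, n|
    row.each_index do |s|
      row[s] = [row[s] * MEMORY_DECAY, longmem[n][s]].max
    end
  end
end

brain   = [[2.0]] # one neuron, one stimulus (the light)
longmem = [[0.0]]
3.times { positive_fixer(longmem, 0, 0) } # three reinforced trials
50.times { memory_decay(brain, longmem) }
p brain[0][0].round(2) # => 0.6
```

Without the fixer, the synapse would decay to about 0.01 after 50 ticks. With three releases, it settles at the longmem floor of 0.6.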

Unique Details of the Program

Unlike most of the programs presented here, this version includes inhibitory connections between stimuli and neurons, as well as a negative fixer. The negative fixer works just like the positive one, with the result that a previously neutral stimulus comes to inhibit the emission of a punished operant when the light is on.
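A minimal Ruby sketch of how a negative synaptic value suppresses a punished operant is shown below. It assumes that each negative fixer release lowers the light-to-operant synapse by a fixed step. The step size, the threshold, the drive value, and the methods negative_fixer and fires? are hypothetical. Here "drive" stands in for any other excitation reaching the neuron.

```ruby
# Hypothetical sketch; NEG_FIX_STEP, THRESHOLD, the drive value and the
# method names are assumptions, not read from OnlyOperants.rb.
NEG_FIX_STEP = 0.3
THRESHOLD    = 5.0
LIGHT_ON     = 9.0 # stimulus value while the light is detected

# Each punishment lowers the synaptic value; below zero it is inhibitory.
def negative_fixer(weight)
  weight - NEG_FIX_STEP
end

# The neuron fires if the light's contribution plus other drive
# exceeds the threshold.
def fires?(light_weight, drive)
  light_weight * LIGHT_ON + drive > THRESHOLD
end

drive  = 6.0 # enough to fire on its own
weight = 0.0 # the light starts out neutral for this operant
puts fires?(weight, drive) # => true
2.times { weight = negative_fixer(weight) } # two punished trials
puts fires?(weight, drive) # => false
```

After two punishments, the light contributes about -5.4 to the potential. That pulls the total well below the threshold even though the other drive alone would make the neuron fire.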

The Neural Network

The Matrix Representation

The Code

The file OnlyOperants.rb is available at the link below.