Book - 2.6) Modelling - Iteration
Iteration¶
Repeating the same action over and over is a common occurrence in nature and in machines that people have built. For each of us, the repetitive beating of our hearts and the repetitive breathing of our lungs is critical to life. The repetitive movement of pistons in steam and internal combustion engines gives the power to these sources of mechanical energy. The ability to perform the same set of steps over and over is important in most uses of computers. In programming languages, the term “iteration” is often used to describe coding that produces repeated execution of some steps in the program.
Caution: Iteration Ahead
Iteration is not mindless repetition of exactly the same thing. For example, calculating “2 + 2” over and over is not really meaningful. What makes iteration powerful is that the same steps are executed over and over, but on each repetition, the steps are performed on different data.
For our purposes, iteration will be defined as follows:
Iteration: performing a set of actions to the instances of an abstraction one instance at a time.
It should be noted that there are a variety of forms of iteration in computer languages and programming. The form of iteration defined here is one of these forms but it is all that is needed to construct complex algorithms that answer interesting questions about the real world.
The definition we are using connects iteration to the idea of abstraction, illustrating that algorithms are used to compute about the real world. The instances of an abstraction capture real world entities in information terms. Using iteration, an algorithm computes about these instances by using the values of their information properties.
This definition also connects iteration to the idea of "big data". A "big data" set is simply an abstraction that has a very large number of instances. Fortunately, if an iteration correctly computes a desired result over a small number of instances it will correctly compute the equivalent result for any number of instances. This ability of iteration to handle any number of instances with the same set of actions gives computing its "power". Many machines generate physical power by performing a mechanical action repetitively. As noted above, the repetitive motion of the pistons in an internal combustion engine generates the physical power needed to move a vehicle. By analogy, the repetitive processing of instances by an algorithm using iteration generates the information processing power needed to answer questions about a large collection of data.
The phrase "one instance at a time" reflects the repetitive nature of iteration. The algorithms must be designed to deal with the limitation that the algorithm only has direct access to the "current" instance. Of course, the algorithm does not know anything about instances that it has not yet seen. If the algorithms needs to know something about an instance that it saw earlier then the algorithm must "remember" that fact in its state. For example, suppose we want to know the maximum price of any movie in our Movie abstraction. One some iteration step the algorithm has direct access to the price of the current instance. To know whether this price is larger than any price previous seen it must have remembered that previous highest price in the algorithm's state. Being able to define the algorithm's state to accommodate this limitation of iteration is an important and not easily learned skill.
We can see this pattern of iteration -- repetition of the same steps to the instances of an abstraction -- in the kinds of problems we work with in this course. Here are some examples:
- In a single simulation step of a NetLogo model, each agent must perform its rules and update its properties. Each agent is an instance of an abstraction of some real-world entity that is part of the simulation.
- In working with “big data” streams (which will be seen later), the same set of steps is usually applied to each piece of data. For example, one of the data streams has data on each earthquake that has occurred somewhere in the world. An earthquake is described by its magnitude and the location of where it happened. To find the biggest earthquake that has occurred in a particular location, we can use this test: “is the earthquake at the desired location and is the earthquake larger than any earthquake at that location we have seen so far”. Applying this test to each earthquake in the data stream is done by iteration.
- Another “big data” stream is one that records the current temperatures for U.S. cities. In this data stream, each element records the name of the city (e.g., “Blacksburg”), the code for the state (e.g., VA), and the current temperature (e.g., 78 degrees on the Fahrenheit scale). To find an average temperature, an iteration would be used to add together all of the temperatures and divide this by the number of cities.
Each of these examples illustrates the key idea of iteration: performing the same steps but with the different data in each instance.
A common form for expressing iteration in a programming language uses the idea of “for each”. In this form, the instances of an abstraction are identified: the set of agents, the list of all earthquakes, or the list of all temperatures, for example. The “for each” iteration means that a given set of statements is then to be performed on each instance, one instance at a time, until all of the instances in the abstraction have been processed exactly once.
As noted above, a critical aspect of iteration in general and the for each form in particular is that a correctly expressed iteration will work for a data set of any size. If an iteration is designed and tested on a small set of data, the same iteration without change should work equally well with a very large set of data. This characteristic of iteration gives a program the power that is needed to handle “big data” streams.
For Each Flowcharts¶
The flowchart form of a “for each” iteration can be drawn as shown in the next figure. The significant thing to notice in comparison with the decision flowcharts seen earlier is that the “for each” flowchart has a backward pointing arrow forming a “loop”. This “loop” is a reflection of the repeating nature of the for each—the statements are repeatedly applied each time to a different item in the data set.
Flowchart of For Each Code
A simple example of using for each iteration is finding the average temperature of a data set of temperatures. The average temperature is calculated by dividing the total of all of the temperatures by the number of temperatures. The flowchart below shows how to calculate the total of all of the temperatures.
Using For Each to find the total temperatures
Notice that the total-temps must be initialized prior to the iteration. The initialization is necessary to start the summation of the temperatures at zero.
We can similarly use the for each iteration to find the number of temperatures by simply counting how many temperatures there are—that is, incrementing a counter each time the statement in the iteration is executed. This flowchart is shown next.
Using For Each to find the number of temperatures
We can now combine these two flowcharts along with a final calculation to determine the average temperature. This is shown in the following flowchart.
Calculating the Average Temperature
While the above flowchart solution is perfectly correct, it does have one drawback: high cost. To find the total temperatures, the first for each iteration uses each element in the set of temperature data. Similarly, to find the total number of temperatures, the second for each iteration also uses each element in the set of temperature data. If we are dealing with “big data” applications (imagine temperature data for hundreds of thousands of cities), each iteration of the entire data set is a significant computational cost.
We can cut this cost in half by combining the two for each iterations into one. This combination is shown in the following flowchart.
Calculating the Average Temperature With Less Cost
Notice that the two initializations have been combined and both performed prior to the start of the iteration. Also, the statements in the iterations have been combined: one increments the total temperature and one increments the count of the number of temperatures. Finally, the average temperature is calculated when the for each iteration terminates.
For Each in Code¶
The for each iteration can also be written in a textual language form. We will see later how this iteration is written in Python. For now, a syntax that can be used for examples is shown next.
foreach t in turtles [ turtle-actions ]
foreach q in earthquakes [ quake-actions ]
foreach temp in temperatures [ temperature-actions ]
The meaning of the turtle example is that each turtle is selected one at a time and the steps in turtle-actions are carried out for that turtle. Each turtle is selected exactly once and no turtle is ignored. In more detail, on each iteration, the name “t” refers to the turtle being considered on the current iteration and the steps in turtle-actions are applied to whatever turtle is identified by “t”.
Likewise, the meaning of the earthquake example is that each earthquake is selected one a time and the steps in quake-actions are carried out for the current earthquake before the next earthquake is selected. Each earthquakes is selected exactly once and no earthquake is ignored. In more detail, on each iteration, the name “q” refers to the earthquake being considered on the current iteration and the steps in quake-actions are applied to whatever earthquake is identified by “q”. Then the next earthquake is selected and identified as “q”, and the steps in quake-actions are applied to that earthquake.
Finally, the meaning of the temperature example is that each temperature is selected one at a time and the steps in the temperature-actions are carried out for that temperature until all of the temperatures have been processed. On each step, “temp” refers to the currently selected temperature. The statements in temperature-actions are performed on the temperature refered to by “temp”.
For Each in NetLogo¶
Different programming languages have different syntaxes for describing the “for each” form of iteration. Because NetLogo is designed to focus on set of agents (i.e., turtles and patches), the NetLogo programming language has a compact way of expressing the “for each” iteration over sets of agents. In NetLogo, this is written as:
ask turtles [ turtle-actions ]
ask patches [ patches-actions ]
The sense of this terminology is that the program is “asking” the turtles (one at a time) to perform the specified set of actions. In other words, the meaning of “ask turtles” is the same as “foreach t in turtles”. This applies in a similar way for the meaning of “ask patches”. A specific example of iteration in our simple ecological model is this code:
ask turtles [ ; apply this rule to all turtles one at a time
right random 360
forward 1
set energy energy - 1
set age age + 1
]
In this code, each turtle, one at a time, is asked to perform the four actions that move the turtle to a new location. The first two actions pick a random direction and move the turtle one step in that direction. The next two steps update the properties of the turtle by decreasing its energy and increasing its age.
Here is another example, this time iterating through the patches:
ask patches [ ; apply this rule to all patches one at a time
if plant-energy = 0 [
set regrow-time regrow-time - 1
if regrow-time = 0 [
set pcolor green
set number-green-patches number-green-patches + 1
set plant-energy energy-from-grass
]
]
]
In this code, each patch is selected one at a time. The currently selected patch checks to see if it has no grass (i.e., its plant-energy is 0). If there is no grass, then the time to regrow the grass is decreased and a test is made to see if it now time to regrow the grass (i.e., the regrow-time is 0). If it is time to regrow the grass, then the patch changes its color property to green, increases the global property number-green patches, and changes its own plant-energy property to the current setting of the energy-from-grass slider.
Other Forms of Iteration¶
There are other ways of expressing iteration. These are briefly described here. Some of these forms may be encountered later as we see different programming languages in this course, and you may see these forms if you learn other programming languages outside of this course. One alternative form of iteration is “counting” iteration. This means that the iteration is repeated a fixed number of times over some range of numbers. Counting iteration might specify, for example, that the steps are repeated over the range 1 to 10 or 0 to 99. A second alternative for iteration is “condition” iteration. This means that the iteration is repeated until a given condition is false. For example, in our simple ecological model we might want to stop the simulation if all of the turtles die. Expressing this as condition iteration yields something like “while number-turtles > 0 [ simulation-steps ]”. This means that as long as the value of the global property number-turtles is greater than 0, the simulation steps will continue to be executed. Each of these forms of iteration is useful in different contexts depending on the nature of the iteration.
Decisions and Iteration¶
We have seen how decisions and iteration can be used separately and we will now see an example of how they can be used together. The combination of these two computational structures allows us to build programs that have the selectivity provided by decisions and the power provided by iteration.
A simple example of combining decisions and iteration is the task of finding the maximum temperature in a data set of temperatures. The algorithm clearly needs iteration because we need to examine each temperature in the data set. If there was even one temperature that was not examined, it might be the case that that unexamined temperature is the actual maximum temperature. The algorithm also needs to make a decision about whether the current temperature being examined is larger than any temperature seen so far in the iteration. The flow chart representation of the algorithm that combines decision and iteration is shown next.
Combining Decision and Iteration
The logic of this algorithm is that at each iteration step, the property current-max is the largest value seen thus far. If temp, the current temperature value on this iteration, is larger than the largest value seen thus far, then current-max is set to temp. Notice that current-max is initialized to zero before the iteration begins. This initialization is necessary in order for the test made in the decision to be possible on the first iteration (assuming all of the temperatures are greater than zero to begin with).