Book - 4.2) Data Structures - Types, Lists, Dictionaries
Data Structures¶
We have already encountered two of the three forms of data structures: basic quantities (like numbers and character strings) and lists. In this section some terminology is introduced that you will encounter in programming for these already familiar forms. A new tool for structuring data, a dictionary, is also introduced.
The three data structures that we will see are sufficient for dealing with a wide variety of even very complicated abstractions. There are other data structures that we will not study here. These other data structures are often needed to perform math-intensive computations, achieve greater efficiency, or require less computer memory. While these are important issues for more advanced study of computing they are, for us, not critical objectives.
Types¶
The value of single property of an abstraction can be represented in a computer by numbers or characters. The numbers may be positive or negative and may be either whole numbers (like 10, 0, -250, 2750) or numbers with a decimal point (like 2.5, -10.333, or 1245.678). Numbers typically represent quantities such as a temperature, number of crimes, the crime rate, or the magnitude of an earthquake. Individual characters are often used to form character strings that represent properties such as the title of a book, a person’s first name, the name of a state, or the city nearest to an earthquake.
The kind of information used to represent the value of a property is referred to as a “type”. We have seen already the basic types that we will work with. These types and their technical names are shown in the following table. The type name int is a contraction of the word integer (or whole number). The type name float is a shortened form of floating point number and is a number with a decimal point. The term floating point refers to the fact that the decimal point in different numbers can be in different positions (e.g., 1.23 vs. 12.3). Consecutive characters meant to be taken as a unit is termed a string (or character string). A character string is put in quote marks to be clear about which symbols are part of the character string (e.g., the title of a Shakespearean play would be written as “Macbeth” or ‘Macbeth’). While most characters and strings will be alphabetic characters any printable symbol can be a character or included in a character string. While some programming languages use only double quote marks (“”) or single quote marks (‘’) to delimit characters or strings. We will use either as long as each character or string begins and ends with the same kind of quote mark. The final type has the logical values of True and False. This type is a Boolean type named in honor of the 19th century mathematician George Boole who is noted for his work on symbolic logic and Boolean algebra.
Basic Types
These types (int, float, string, Boolean) are fundamental building blocks because these are the only kinds of data that computers are constructed to directly manipulate. All the data that we have must ultimately be expressed in terms of these types. In addition, computers are able to perform only certain operations on certain types. For example, a computer can perform mathematical calculations on numbers but it cannot perform these operations on strings. While a person could be asked to add “16” and “30” and answer “46”, a computer would not understand what this would mean. For a computer to do something similar its programming would have to say explicitly to convert the string “16” to the number 16, convert the string “30” to the number 30, and then compute 16 + 30. This is, again, a reminder that people are able to deal with ambiguity and context while computers cannot.
Lists¶
We have already seen the idea of a list in dealing with big data streams. The important characteristics of a list are:
- content: all of the list elements are of the same type and are conceptually the same
- organization: an ordered sequence of elements
- access: one at a time using the for-each iteration or by position
- construction: appending to the end
The content aspect of a list means that all the elements of the list are of identical structure (type) and meaning. For example, all the elements in the list of forecasted temperatures are of the int type and are all temperatures in the Fahrenheit scale. All of the magnitudes in the earthquakes data stream are of the float type and are all measurements on the Richter scale. For this reason, a list is also called a homogeneous data structure.
The organization aspect of a list means that the elements are arranged in a sequential order (there is a specific first element of the list, second element of the list, etc.). A list is a particularly useful data structure when the sequence of values has some significance. For example, in a five-day weather forecast the first element of the list represents the forecast for tomorrow, the second element of the list represents the forecast for the day after tomorrow, and so on. In this case, the sequence of elements in the list corresponds to successive days in the future. In the crime data we may want to represent the property crime rate in a given state over a period of years. A list is a natural way to organize the data because each successive list element represents the data for the next year in the period. In these two examples the ordering of the list corresponds to some order in time (successive days or successive years). Other common orderings are alphabetical, height, length, weight, cost, or any other factor where the data has a linear ordering.
The access aspect means that we access each successive element of the list one at a time using the for-each form of iteration. Using this form of iteration each element of the list is made available in turn as the element to be used in the current iteration. Occasionally it is necessary to access a particular element of a list. For example, the first element in the list might be needed. Because the list is ordered each element in the list can be identified by its unique position in the list. As in many programming languages, the first element of this list is at position 0, the second element of the list is at position 1, etc.. If the list has n elements the last element is at position n-1.
A list can be represent graphically as shown in the following figure. The list depicted in this figure has seven elements each of which is a simple whole number (an int type). The first value in the list is the number 45. This value is at position 0. The second number in the list, at position 1, is the number 33. Notice that values may be repeated in a list though they will be at different positions in the list. For example, the value 11 appears both at position 3 and at position 5.
We have seen examples of processing all of the elements of a list using iteration. This is a commonly needed means of accessing the elements of a list. In some cases, however, it may be useful to access a particular element of the list by its position. That is, we might know that the values in the list are in ascending order so that the smallest element in the list is in the first position.
The following figure illustrates the notion of accessing a list element by position. In this figure the value at position three is accessed. Keep in mind that the value at position three is the fourth element in the list because the positions are numbered from zero (0). In this case the value accessed is 11. Similarly, the number 33 would be accessed if we accessed the value at position 1 (the second value in the list).
Accessing a list element by position
Dictionaries¶
A dictionary is another structure for organizing the values of an abstraction. The important characteristics of a dictionary are:
- content: the elements need not be of the same type or are conceptually different
- organization: an unordered collection of key-value pairs
- access: return the value associated with a given key
- construction: adding a new key-value pair
The content aspect means that the elements can be different from each other in their structure (type) or in their meaning. For example, the author and the title of the book are both represented by a value of the string type (a sequence of characters) but these strings mean very different things. On the other hand, the price of the book can be represent as a value of float type (a number with a decimal point) while the number of pages in the book can be represented by a value of the int type (a whole number). In this case, both the structure (type) and the meaning of the values of the book’s properties are different.
The organization aspect means that a dictionary is structured so that a value is associated with a key. This data structure is called a dictionary because it takes after a language dictionary where each “word” (the key) has a “definition” (the value). The “word” and its “definition” are tied together. In our case, the key will usually be a character string (though it can be other things). The value part can be any data structure.
The organization aspect also says that dictionaries are unordered. That is, it does not matter in which order the key/value pairs are given. The idea of dictionaries being unordered seems strange when we think about a language dictionary where the words (the keys) appear in sorted order. But this ordering is only for convenience of the human user to help the search for a word. For example, the definition of “zebra” would be found closer to the end of the dictionary while the definition of the word “middle” would be nearer the middle of the dictionary. However, a language dictionary could be organized in other ways. For example, a crossword puzzle player might like the dictionary organized by length of word (all the one letter words coming first, all the two letter words coming next, etc.). The important thing is that the definition of any word is the same in the two cases.
A dictionary can be represented graphically by a table as shown in the following figure. This figure represents a dictionary holding five properties of a book: the title, the author, the price, the number of pages, and whether the book is available in paperback. At the top of each column is a key and below it is the value corresponding to that key. Each key of course must be unique in the dictionary and each key has exactly one value.
A Graphical Representation of a Dictionary
In this example of a book, the int value 320 is associated with the key pages to express the fact that a given book has 320 pages. The key price and its value 7.48 expresses the cost of the book. In this case the value is a float, that is, a number with a decimal point. The ‘author key is associated with a value of a string type, a sequence of characters symbols, to denote the book’s author.
As noted earlier, a dictionary is unordered because the key and value pairs can be written in any order without changing the meaning of what is represented. For example, the book described above could be pictured as follows:
Another Graphical Representation of a Dictionary
The unordered nature of a dictionary is also reflected in the fact that these two sentences are the same
- “The book Harry Potter #1 was written by J.K. Rowlings.”
- “J.K. Rowlings wrote the book Harry Potter #1.”
The meaning of these sentences is the same because the order in which we state the properties of the book does not change the description of the book.
The dictionary’s access property means that a value is retrieved from the dictionary by specifying the key associated with that value. To find the cost of the book in the above example the key price would be used to access the value 7.48. In the graphical form this means finding the column with the key price at the top and then using the value below this key in the same column. This is shown in the following figure.
Access Operation in a Dictionary
The construction property of a dictionary means that we add a new element to the dictionary simply by specifying a new key and value pair for the dictionary. More specifically, the key for the new element must be different from any other key currently in the dictionary. In the graphical representation this would mean adding a new column. This is shown in the figure below. In this example, the key weight with the value 2.7 is added to the book dictionary.
Update Operation in a Dictionary
This figure shows that the new key and value pair were added visually at the right end of the figure. The new key and value pair could have been inserted as a new column anywhere in the figure because the ordering of the columns does not matter to the meaning of the dictionary.
Summary of Lists and Dictionaries¶
The two fundamental structures for organizing some number of values are lists and dictionaries. The comparison between these two structures is shown in the following figure. Designing a good organization of values involves selecting the appropriate structure based on the nature of the values. In realistic situations both lists and dictionaries will be used together. Examples of using lists and dictionaries together will be seen below.
Comparing Lists and Dictionaries