Book - 4.1) Data Structures - Introduction
Abstractions and Data Structures¶
In this section we will explore how abstractions can be represented using data structures. By “represented” we mean putting the abstraction in a form so that we can write algorithms to manipulate the abstraction using a computer. We will see three data structures (two of which we have already encountered informally). We will also develop a simple way of taking even the most complicated abstraction and representing it using these three data structures.
We have seen that abstraction is the process of identifying the information properties of an entity relevant to a given stakeholder. The entities that can be modelled by abstraction may be an artifact (a person, a turtle, a book, a movie or any other concrete thing), an event (a crime, an election, an earthquake or any other occurrence), or a process (a trial, a game, a play or any other activity that takes place over time). Abstraction simplifies the real world entities that we are studying because only the relevant (to the stakeholder) properties become part of our abstraction. All of the other properties are ignored and are not part of our model.
Despite the fact that abstraction simplifies the real world, an abstraction may still be complicated. For example, consider the abstraction of a person where the stakeholder is a medical doctor. This “patient” abstraction ignores many properties about the person: what genre of music they like, the person;s car insurance company, the newspaper to which the person subscribes, the person’s latest Facebook or blog posts, etc.). While these properties might be relevant for other stakeholders they are not relevant to the medical doctor. However, the properties that we need to maintain for a “patient” are complicated. These properties might include a family history, recent lab tests, medications being taken, personal information (height, weight, age, gender), DNA results, lists of symptoms, health insurance plans, exposure to toxic materials, physicians notes, etc.
So far we have been working with very simple abstractions. In some cases the abstraction can be represented by a single number (a temperature or a stock value) while in other cases the abstraction can be represented by a list (e.g., a list of forecasted temperatures, a list of the magnitudes of earthquakes, a list of the crime rates in a given state over time). We now want to equip ourselves with the knowledge and tools to cope with more complicated abstractions.
The term data structures is used to denote the means for organizing a representation of an abstraction. We will see that a small number of data structures - three, in fact - are enough to represented very complicated abstractions. In studying algorithms we saw that complex algorithms can be created by combining together the simple building blocks of calculation, sequence, decision, and iteration. Similarly, a small number of data structure building blocks can be combined together to form as powerful a structure as is needed. We have also seen that there are often different ways of building an algorithm to manipulate an abstraction in a given way. So also, it is possible to combine the data structure building blocks in different ways to represent the same abstraction. These choices of algorithm design and data structure design are part of the creativity of computing.
In forming data structures we will take advantage of two things:
- People have an inherent ability to deal with complicated situations by organizing them hierarchically. For example, the United States as a government is divided into three branches. The legislative branch is divided into two chambers - the House and Senate. The House consists of representatives from each congressional district. The judicial branch is composed of the Supreme Court and Federal District courts. The executive branch consists of a number cabinet departments and other agencies, In this description the top level entity (the US Government) is divided into three parts. Each of those parts can be expressed as several component parts. Each of those component parts can be further divided. In this was all of the complicated properties of the entity have been represented - but not all at the same time or at the same level.
- The data itself often gives us a clue on how to organize it. For example, if we have a collection of properties that are all of similar kind (e.g., all temperature, or all earthquake magnitudes, or all book titles) then we can organize them into a single list. Other forms of data suggest other organizations.
The next step is to develop our toolkit of data structures.
Overview
The study of how to represent abstractions using different forms of data structures will proceed from simple examples to more complex examples. A road map to the progression of learning is given in the following table.
Summary of Data Structures and Abstraction
We have already seen how lists can be used to represent multiple instances of an abstraction that has one property. For example, the prices of a collection of books can be represented by a list. Each element of the list is the price of one of the books in the collection.
A dictionary data structure will be used to represent the many properties of a single instance of the abstraction. For example, a dictionary will contain the price, author, genre, etc.of a single book in the collection.
By combining lists and dictionaries it will be possible to represent many instances each of which has many properties. For example, using this combination of data structures the collection of books can be represented where each book is described by all of its many properties (price, author, genre, etc.).
Finally, complex abstractions often contain such a large number of properties that it is unreasonable to represent the abstraction by a single table. In this case, multiple tables are organized in a hierarchy to represent the abstraction. This organization is called layers of abstraction. Each table describes some logically related aspect of the abstraction. The hierarchical organization describes how the tables are related to each other. Using layers of abstraction is more intellectually manageable because it is never necessary to think about all of the details of the abstraction at once. Instead, focus can be given to one or a small number of tables to deal with some aspect of the abstraction.