The decimal number system we normally use for representing numbers is a positional number system in which any natural number may be uniquely represented by use of the ten symbols 0, 1, 2, ..., 9. In this system these ten symbols, also referred to as decimal digits, represent the numbers zero, one, two, . . ., nine, respectively. A unique representation of any natural number m can be given in the form
$d_n d_{n-1} d_{n-2} \ldots d_1 d_0$
where $n \ge 0$, each $d_i$ is one of the digits 0, 1, ..., 9, and $d_n \neq 0$ unless $m = 0$. The same natural number m can also be represented in the form
$d_n \times 10^n + d_{n-1} \times 10^{n-1} + d_{n-2} \times 10^{n-2} + \cdots + d_1 \times 10^1 + d_0 \times 10^0$
For example, the number one hundred twenty-three may be represented by
123
or by
$1 \times 10^2 + 2 \times 10^1 + 3 \times 10^0$
The decimal number system is also called the base ten system since ten digits are utilized in the number representations in this system. There is, however, nothing sacred about the base ten since the notion of a positional number system can easily be generalized to any given base b where b is a natural number greater than or equal to two.
For example, we can also represent the number one hundred twenty-three in the base two or binary number system. The symbols 0 and 1 are chosen to represent zero and one, just as ten symbols were selected in the base ten system to represent zero, one, two, ..., nine. Then, since
$123 = 1 \times 2^6 + 1 \times 2^5 + 1 \times 2^4 + 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0$
the number one hundred twenty-three would be represented in the binary number system by
1111011
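The conversion above can also be done mechanically: divide by the base repeatedly, and the remainders are the digits, least significant first. The sketch below uses a hypothetical helper named toBase and handles only bases 2 through 10 (larger bases would need extra digit symbols).

```cpp
#include <string>

// Hypothetical helper: returns the base-`base` representation of m, for 2 <= base <= 10.
// Repeatedly dividing by the base produces the digits from least to most significant,
// so each new digit is placed at the front of the string.
std::string toBase(unsigned long m, unsigned base)
{
    if (m == 0)
        return "0";

    std::string digits;
    while (m > 0)
    {
        digits.insert(digits.begin(), static_cast<char>('0' + m % base));
        m /= base;
    }
    return digits;
}
```

For example, toBase(123, 2) returns "1111011" and toBase(123, 10) returns "123".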
So, any natural number that we can represent in base ten can also be represented in base two. It's also much easier to represent two distinct states in a physical system like a computer than it is to represent ten distinct states. You can use the two stable states of a flip-flop, two positions of an electrical switch, two distinct voltage or current levels allowed by a circuit, two distinct levels of light intensity, two directions of magnetization or polarization, etc.
The fact that it's easy to represent binary numbers with hardware has made base two the number system of choice in digital devices such as computers.
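Computer memory (often called RAM, "random access memory") is measured in different units, much as length is measured in inches, feet, and yards. Most commonly, we use the bit (a single binary digit, 0 or 1) and the byte (a group of eight bits).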
We can represent many different types of data using just groups of bits.
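As a small illustration, the same eight bits can be read either as the number 65 or as the character 'A'. The helpers bitsOf and codeOf below are hypothetical, and the sketch assumes 8-bit bytes and ASCII character codes, which is what you will find on most systems.

```cpp
#include <bitset>
#include <string>

// Hypothetical helper: returns the eight bits of a char as a string of '0' and '1' characters.
// Assumes 8-bit bytes; the cast to unsigned char avoids sign extension for negative chars.
std::string bitsOf(char c)
{
    return std::bitset<8>(static_cast<unsigned char>(c)).to_string();
}

// Hypothetical helper: interprets the same bits as a number instead of a character.
int codeOf(char c)
{
    return static_cast<unsigned char>(c);
}
```

With ASCII, bitsOf('A') gives "01000001" and codeOf('A') gives 65: one bit pattern, two interpretations.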
The variables you define in a C++ program and the code for the functions you write all occupy space in the computer's memory when the program is run. That means they occupy some number of bytes. For a variable, the number of bytes occupied depends on the variable's data type and the system the program is compiled and run on. Here are the sizes of some different data types on our Unix system:
Data type | Size | Range and precision |
---|---|---|
char | 1 byte | -128 to 127 if char is signed, 0 to 255 if it is unsigned (this varies by system) |
bool | 1 byte | false or true |
short int | 2 bytes | -32,768 to 32,767 |
unsigned short int | 2 bytes | 0 to 65,535 |
int | 4 bytes | -2,147,483,648 to 2,147,483,647 |
unsigned int | 4 bytes | 0 to 4,294,967,295 |
long int | 8 bytes | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
unsigned long int | 8 bytes | 0 to 18,446,744,073,709,551,615 |
float | 4 bytes | about $\pm 10^{38}$, limited to ~6 significant digits |
double | 8 bytes | about $\pm 10^{308}$, limited to ~15 significant digits |
long double | 16 bytes | at least the range and precision of double (on x86-64 Linux, about $\pm 10^{4932}$ with ~18 significant digits) |

Note that these sizes may be different on a different computer. To find the size of a data type or a variable if you don't know it, use the sizeof operator:
sizeof(int) // An expression that evaluates to 4 on our Unix system.
sizeof(x) // Evaluates to the amount of memory that x occupies.
(Note: sizeof looks a bit like a function, but it's not, really. It's an operator built into the C++ language.)
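A short sketch of sizeof in use. It also uses std::numeric_limits from <limits> to query ranges like those in the table, and CHAR_BIT from <climits> for the number of bits per byte; the names bitsIn, sample, sampleBytes, smallestInt, and largestInt are made up for this example.

```cpp
#include <climits>   // CHAR_BIT: bits per byte
#include <cstddef>   // std::size_t
#include <limits>    // std::numeric_limits

// Number of bits occupied by type T: sizeof gives bytes, CHAR_BIT converts to bits.
template <typename T>
constexpr std::size_t bitsIn()
{
    return sizeof(T) * CHAR_BIT;
}

// sizeof also accepts an expression, such as a variable name.
double sample = 3.14;
constexpr std::size_t sampleBytes = sizeof(sample);   // same as sizeof(double)

// The range of a type can be queried instead of memorized.
constexpr int smallestInt = std::numeric_limits<int>::min();
constexpr int largestInt = std::numeric_limits<int>::max();
```

On a system with 4-byte ints like ours, largestInt is 2,147,483,647 and bitsIn<int>() is 32; on another system these values may differ.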
The uncertainty about the sizes of the various data types in C and C++ is a problem. The exact sizes were never defined as part of either language (the standards specify only minimum ranges), and it's too late to change that now. More modern languages such as Java standardize the size (and therefore the range) of numeric data types so that there is no uncertainty.
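Although the exact sizes are left to each compiler, the C++ standard does guarantee a few relationships. The sketch below states them as static_assert checks, which the compiler verifies at compile time; on a conforming compiler all of them pass.

```cpp
#include <climits>   // CHAR_BIT: the number of bits in a byte

// Guarantees the C++ standard makes (exact sizes are still implementation-defined).
static_assert(sizeof(char) == 1, "char is always exactly one byte");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");
static_assert(sizeof(short) <= sizeof(int), "int is at least as large as short");
static_assert(sizeof(int) <= sizeof(long), "long is at least as large as int");
static_assert(sizeof(long) <= sizeof(long long), "long long is at least as large as long");
static_assert(sizeof(int) * CHAR_BIT >= 16, "int has at least 16 bits");
static_assert(sizeof(long) * CHAR_BIT >= 32, "long has at least 32 bits");
static_assert(sizeof(long long) * CHAR_BIT >= 64, "long long has at least 64 bits");
```

Anything beyond these minimums, such as int being 4 bytes, is a property of a particular system rather than of the language.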
Bytes in the computer's memory are assigned consecutive increasing numbers starting with the number 0. Thus, storage may be pictorially represented as
Contents | Address |
---|---|
bbbbbbbb | byte 0 |
bbbbbbbb | byte 1 |
bbbbbbbb | byte 2 |
... | ... |
where each b represents one bit, and the number assigned to a given byte is called the address of that byte. Addresses range from 0 up to the highest address the computer supports. Addresses are binary numbers, but they are often printed in hexadecimal (base 16) to save space, since each hexadecimal digit stands for exactly four bits. You can print an address in decimal by type casting it to an integer type large enough to hold it.
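To see why hexadecimal is a compact way to write binary, compare the two forms of the same number: every hexadecimal digit corresponds to exactly four bits. The helpers toHex and toBinary16 are hypothetical names; the sketch relies on the standard std::hex stream manipulator and std::bitset.

```cpp
#include <bitset>
#include <sstream>
#include <string>

// Hypothetical helper: formats a number in base 16 using the std::hex manipulator.
std::string toHex(unsigned long long value)
{
    std::ostringstream out;
    out << std::hex << value;
    return out.str();
}

// Hypothetical helper: formats the low 16 bits of a number in base 2.
std::string toBinary16(unsigned long long value)
{
    return std::bitset<16>(value).to_string();
}
```

toBinary16(123) returns "0000000001111011"; split into groups of four (0000 0000 0111 1011), those are the hexadecimal digits 0, 0, 7, and b, matching toHex(123), which returns "7b". Likewise, toHex(140728634045644) returns "7ffdf03df4cc", the address of num from the sample run later in this section.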
The address of a variable is the address of the first byte of storage that it occupies. Similarly, the address of a function is the address of the first byte of storage that the function's code occupies. We rarely have to know the actual address of a variable or function, but we do need to understand the idea of addresses and the fact that variables take up a certain amount of space in memory.
To obtain the address of a non-array variable, we can use the & operator.
This is not the same operator as the & used when declaring a reference variable. It also has nothing to do with the && operator used in compound conditions. This is confusing, but you just have to keep in mind the context in which you're using the &.
Table 1: Five uses of the & symbol in C++
Symbol | Context | Means | Example |
---|---|---|---|
& | In the declaration of a data type (variable declaration, function return data type, function parameter) | This data type is a reference type | int& ref = num; |
& | As a unary operator (variable or function name to the right of the operator, usually written with no whitespace in between), usually in an assignment statement or function call | "Address of" operator | int* p = &num; |
& | As a binary operator (variable or literal on both sides of the operator), usually in an assignment statement | Bitwise AND operator | int low = num & 0xFF; |
&& | As a binary operator, usually in a decision or loop condition | Logical AND operator | if (num > 0 && num < 10) |
&& | In the declaration of a data type (variable declaration, function return data type, function parameter) | This data type is an "r-value reference" (an advanced data type used in C++ "move semantics") | string&& other |
We can use the & operator to obtain the address of a variable and print it (in either hexadecimal or decimal) in a program:
#include <iostream>
using std::cout;
using std::endl;
int main()
{
int num = 5;
cout << "Value of num is " << num << endl; // Prints 5, the value of num.
cout << "Address of num (hexadecimal) is " << &num << endl; // Prints the address of num in base 16.
cout << "Address of num (decimal) is " << (long int) &num << endl; // Prints the address of num in base 10.
return 0;
}
Running this program on turing produced the following output:
Value of num is 5
Address of num (hexadecimal) is 0x7ffdf03df4cc
Address of num (decimal) is 140728634045644
You can try running this code yourself. Since the actual address of num is determined by your computer when you run the program, you may (or may not) get different numbers for the addresses.
Figure 1 shows the relationship between the variable num's name, value, and address.
Figure 1: Relationship between a variable's name, value, and address
Since num is an int variable, on our system it actually occupies four contiguous bytes, with the addresses 140728634045644 through 140728634045647 (0x7ffdf03df4cc through 0x7ffdf03df4cf in hexadecimal). The only address we usually care about, though, is the address of the first byte, which is also the address of the variable.
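The range of byte addresses can be computed from the address of the first byte and sizeof. This sketch converts addresses to std::uintptr_t from <cstdint>, an unsigned integer type meant to hold an address (technically optional in the standard, but provided by mainstream compilers); the program above casts to long int instead, which works on our system. The function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: address of the first byte of an int, as an integer.
std::uintptr_t firstByteAddress(const int& var)
{
    return reinterpret_cast<std::uintptr_t>(&var);
}

// Hypothetical helper: address of the last byte; the int occupies sizeof(var) consecutive bytes.
std::uintptr_t lastByteAddress(const int& var)
{
    return firstByteAddress(var) + sizeof(var) - 1;
}
```

For the sample run above, these would give 140728634045644 and 140728634045647.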
A pointer variable is a special type of C++ variable that can hold the address of another variable (or as we'll see later, the address of a function).
For every non-array data type in C++ (char, int, long int, float, double, etc.), including programmer-defined types such as Date or CreditAccount, you can create a pointer variable that holds the address of another variable of that data type.
The general syntax to declare a pointer variable is
data-type-to-point-to* variable-name
For example:
int* p; // p is a pointer variable that can hold the address of an int variable.
Note this syntax carefully. The int* denotes a new data type, one that can hold the address of an int.
We read this right to left as "p is a pointer to an int" or "p holds the address of an int".
There are several equally valid ways to write this declaration in C++:
int* p; // Valid
int * p; // Valid
int *p; // Valid
int*p; // Valid, but not recommended
We can declare a pointer variable to any non-array data type. C++ considers all of these pointer types to be different data types; a pointer declared as int* and a pointer declared as char* are not the same data type.
float* floatPtr; // floatPtr is a pointer to a float.
char* first; // first is a pointer to a char.
Date* datePtr; // datePtr is a pointer to a Date object.
double* x, * y; // x and y are both pointers to double.
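The last declaration repeats the * for y on purpose: in a declaration, the * applies to each variable name separately, not to the type as a whole. This sketch uses std::is_same from <type_traits> to have the compiler confirm the resulting types; the variable names are arbitrary.

```cpp
#include <type_traits>

double* x, * y;   // x and y are both pointers to double
double* a, b;     // a is a pointer to double, but b is a plain double

// Compile-time checks of the declared types.
static_assert(std::is_same<decltype(x), double*>::value, "x is a double*");
static_assert(std::is_same<decltype(y), double*>::value, "y is a double*");
static_assert(std::is_same<decltype(a), double*>::value, "a is a double*");
static_assert(std::is_same<decltype(b), double>::value, "b is NOT a pointer");
```

This is one reason many programmers declare only one pointer per line.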
We can use the & operator to put the address of a variable into the appropriate type of pointer variable.
int num = 5;
int* p = &num;
Now we can say "p contains the address of num" or "p points to num". Figure 2 illustrates the relationship we've established with these two lines of code.
Figure 2: Pointer to an int
Note that since p is itself a variable, it occupies some number of bytes in the computer's memory and has its own address (14072863404556 in this example).
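A short sketch of the point above: p stores num's address, while &p is the address of p itself, somewhere else in memory. Converting both to const void* lets them be compared directly; the names pointerBytes, addressInP, and addressOfP are made up for the example.

```cpp
#include <cstddef>

int num = 5;
int* p = &num;                          // p holds the address of num

// p is a variable too, so it has its own size and its own address.
std::size_t pointerBytes = sizeof(p);   // 8 on a typical 64-bit system
const void* addressInP = p;             // the value stored in p (num's address)
const void* addressOfP = &p;            // the address where p itself is stored
```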
What good does this do? Some of the most important uses will come later, but for right now, we can use this together with one more new idea to create a new way to access the value in a variable.
We know we can access the value in num by using num itself; for example:
cout << num << endl;
We can also use the pointer variable to get to num's value (assuming, as above, that p points to num), which will prove very useful soon. To do that, we need to know how to access the value stored in num by using the pointer.
The dereference operator is the *. Write the * before the pointer and you have an expression that refers to the "value pointed to" by the pointer.
So given the declarations and assignments above, we can code:
cout << num << endl; // Prints 5.
cout << *p << endl; // Also prints 5.
The first statement prints the value in num. The second prints the value of the int variable pointed to by p (which is the value in num). They are the same thing.
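Because *p and num refer to the same int, the connection works in both directions: storing a value through *p changes num. A minimal sketch, assuming the same num and p as above; the two helper functions are hypothetical.

```cpp
int num = 5;
int* p = &num;

// Hypothetical helper: reads num's value through the pointer.
int readThroughPointer()
{
    return *p;          // same as returning num
}

// Hypothetical helper: stores a new value into num through the pointer.
void writeThroughPointer(int newValue)
{
    *p = newValue;      // num now holds newValue
}
```

After writeThroughPointer(10), printing num shows 10; after num = 7, printing *p shows 7.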
Notice another possible source of confusion:
In a declaration, you write:
int* p;
This declares p to be a variable that can hold the address of an integer variable. The data type of p is int* (pointer to an integer).
In contrast, in an executable statement you might write:
x = *p;
Here, *p refers to "the value in the variable whose address is stored in the pointer variable p" or, more briefly, "the value pointed to by p".
In all, there are three different contexts in which you might use the character * in C++.
Table 2: Three uses of the * symbol in C++
Symbol | Context | Means | Example |
---|---|---|---|
* | In the declaration of a data type (variable declaration, function return data type, function parameter) | This data type is a pointer type | int* p; |
* | As a unary operator, with a pointer variable name or pointer arithmetic expression to the right of the operator | Dereference operator | cout << *p; |
* | As a binary operator | Multiplication operator | num = num * 5; |
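All three uses can even appear in a single line of code; the compiler tells them apart by context, as the table describes. A minimal sketch with made-up variable names:

```cpp
int num = 5;
int* p = &num;          // declaration: p has type int* (pointer to int)
int product = *p * 5;   // unary * dereferences p; binary * multiplies by 5
```

Here product ends up as 25, because *p is the value in num.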
We can create pointers to objects in the same fashion as pointers to built-in types like int and double. Pointers to objects have some additional syntax associated with them when it comes to accessing members of the object:
Expression | Meaning |
---|---|
pointer-name | Address of the object pointed to by pointer-name |
*pointer-name | Value of the object pointed to by pointer-name |
(*pointer-name).member-name | Syntax to access the data member member-name of the object pointed to by pointer-name |
pointer-name->member-name | Alternative syntax to access the data member member-name of the object pointed to by pointer-name |
(*pointer-name).member-function-name(arguments) | Syntax to call the member function member-function-name() of the object pointed to by pointer-name |
pointer-name->member-function-name(arguments) | Alternative syntax to call the member function member-function-name() of the object pointed to by pointer-name |
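Here is a minimal sketch of both notations. The Date struct below is a hypothetical stand-in with public data members (not necessarily the Date class mentioned earlier), kept small so that both data-member and member-function access can be shown.

```cpp
#include <string>

// Hypothetical Date type, for illustration only.
struct Date
{
    int month;
    int day;
    int year;

    // Formats the date as month/day/year.
    std::string toString() const
    {
        return std::to_string(month) + "/" + std::to_string(day) + "/" + std::to_string(year);
    }
};

Date today = {9, 15, 2023};
Date* datePtr = &today;

// Each pair below is two spellings of the same access.
int year1 = (*datePtr).year;
int year2 = datePtr->year;
std::string text1 = (*datePtr).toString();
std::string text2 = datePtr->toString();
```

The parentheses in (*datePtr).year are required: the . operator has higher precedence than unary *, so *datePtr.year would mean *(datePtr.year), which does not compile. That awkwardness is one reason the -> form is more common.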