Header Files and Linkage

This article describes the need for header files in a C++ program and explains what a header file should contain. We also explain how to put a function's declaration and definition in two separate files, and how to use that function in a third file such as the main .cpp file of a project. Compiler and linker errors are also explained, and lastly, we discuss the extern keyword and the question of internal and external linkage in C++.

Last Reviewed and Updated on February 7, 2020
Posted by Parveen(Hoven),
Aptitude Trainer and Software Developer

My C/C++ Videos on Youtube

Here is the complete playlist of video lectures and tutorials for absolute beginners. The language has been kept simple so that anybody can easily understand them. I have avoided complex jargon in these videos.

Before you can understand one of the most intricate concepts of the language - the static keyword - I'll take you back to the point where we learnt to use our first functions. This time, however, you will use your functions through your own header files and multiple translation units. A translation unit is a .cpp source file together with everything it pulls in through #include directives; for practical purposes, each .cpp file of a project gives rise to one translation unit.

A larger project consists of many translation units so that the code stays organized and is easy to read and maintain. You can certainly write your entire program in a single .cpp file, but the file becomes bulky: functions and classes are hard to locate, and some editors start to lag when you scroll through a very long file. Splitting the code into dedicated translation units avoids these problems, much like organizing files and photos into folders on a computer.

What is a Header File

A header file usually has the extension .h, but it can have any extension, or none at all. It holds declarations of functions and definitions of classes and structs. A function declaration occupies no memory - memory is needed for a function definition, not for its declaration. Similarly, a class or struct definition occupies no memory - memory is needed only when objects are created. So, as a rule of thumb: header files contain nothing that would, by itself, occupy memory in the program. That's why it's rare to find definitions of global objects and variables inside header files.

A typical header file looks like this:

#ifndef MYFILE_H
#define MYFILE_H
 
// declaration
void fx();
 
#endif

The two lines at the top are preprocessor directives that use a symbol called MYFILE_H. You are free to use any valid identifier for this symbol, but it is customary to use the name of the containing header file in uppercase. For example, the symbol could be named MYFILE_X inside a header file called "myfile.x". As the directives suggest, the declaration of fx is processed only if the symbol MYFILE_H hasn't been defined yet. On the first inclusion the symbol is not defined, so the function is declared and MYFILE_H is defined at the same time; any later inclusion within the same translation unit is then skipped. The whole point is to prevent the contents of the header from appearing more than once. That can happen if an "un-guarded" header file gets included in a .cpp file more than once through different chains of inclusion - for example, when two header files included by the .cpp file both include the same third header.

Repeated declarations of a function are allowed in C++, just as they have always been in C, so the #ifndef guards aren't strictly needed on that account. But multiple definitions of a class, struct or union within one translation unit are not allowed in C++. This preprocessor technique is required primarily to prevent multiple definitions of classes and structs.

Placing Functions in Separate Header Files and Translation Units

Before you proceed further, I advise you to go through this video on how to add two or more files to a C++ project. I am using the Dev-C++ IDE, but any IDE could be used. The method remains the same.

Step 1 - Create a header file called myfile.h

This header file contains only one declaration: that of the function fx. Place this file in the same folder as your main.cpp. This is not a requirement of C++, but it keeps things simple. When you write a directive like #include "myfile.h", the compiler typically looks for the header file first in the folder of the file that contains the directive. Otherwise you would have to give an absolute path like #include "C:\..." or add the folder to your compiler's include search paths. Please note that nobody uses absolute paths in practice, because they break when the project is moved to another PC. Developers usually use relative paths like "..\..\abc\myheader.h", which says "go two directories up, and then move into a folder called abc". For now, we shall place all our files side by side in the same folder.

#ifndef MYFILE_H
#define MYFILE_H
 
// declaration
void fx();
 
#endif

Step 2 - Create a C++ file called myfile.cpp

This translation unit contains the definition of the function fx. The declaration is brought in through the #include directive at the top of the file. Place this file next to your main.cpp as well, so that all three files are in the same folder.

#include "myfile.h"
 
#include <iostream>
 
using namespace std;
 
// definition
void fx()
{
    cout << "Hello fx()" << endl;
}

Step 3 - Use the Function

The function fx can now be used inside main. The #include directive causes the function declaration to be made available to the compiler.

#include <iostream>
 
#include "myfile.h"
 
using namespace std;
 
int main() 
{
    fx();
 
}

Linking and Compilation

Compilation begins with a check of your code. The compiler checks whether you have used each and every token in the correct way, and at the correct place, and only then translates the file into object code. The final memory locations of functions and global variables are not decided by the compiler. For example, if you write a statement like z = 99; without first introducing "z", your compiler will flag an error. Following is an example of a compiler error. The identifier "ffx" has been used without introduction.

[Image: a compiler error in a C++ project. This error is encountered when a name is misspelt.]

It is very important for you to understand that the compiler cannot "see" beyond the file that it is currently compiling. It cannot search for identifiers declared elsewhere in your project. When we include a header file, we are introducing declarations to the compiler. If we comment out the #include "myfile.h" directive in our main.cpp above, we'll get a compiler error because the compiler won't be able to figure out what "fx" is.

After compilation succeeds, linking begins. A linker is a program, just like a compiler is. Final memory locations are assigned during linking, both for variables and for functions. So, if you declare and call a function but do not provide its definition, things go wrong during the linking phase. When the linker cannot find the code body of a function, it cannot assign any memory location to that function. A linker error is reported at that time, as you can see in the video above.

The linker is a very capable program. It can search for definitions across all the translation units of a project, working on the object files that the compiler produced from the .cpp files. That's why there are no include directives for .cpp files; there is no need to "include" them. If the linker can't "link" a declaration that is actually used to a definition, it reports an error.

An Example with External Linkage

Let's try a small exercise. Create a C++ project, and run it to ensure that everything works to begin with. Next, add another .cpp file, and inside that file create a global variable of type int with an initial value of 9.

int i = 9;

Next come back to your main and try to display the value of that variable.

#include <iostream>
using namespace std;
 
int main()
{
    cout << i << endl;
}

Your compiler will complain because it doesn't know what this "i" is. The compiler is right to do so, because it can't go outside this file to find out about i. The linker could certainly find and link to "i" later on, but that happens at a later stage, and we haven't got past compilation yet. The question right now is: how do we tell the compiler that "there is an i inside another file of the same project"?

You might be tempted to move the definition of i into a header file and include that file in the main cpp file. That would invite bigger trouble - from the linker this time. If that header file is included by two cpp files of your project, then the linker would find two definitions of i, leading to a name conflict and a linker error. Moving the definition into a header file is not the solution.

C++ provides a keyword called "extern" for such situations. extern is used to declare that an object exists elsewhere in your project. The compiler takes your word for it and allows the compilation to complete. The task of finding the object is left to the linker. If the linker can find it, things are fine; otherwise a linker error is reported. This is how the extern declaration is made.

#include <iostream>
using namespace std;

// extern declaration 
extern int i;

int main()
{
    cout << i << endl;
}

Internal Linkage

A non-const global variable defined in a translation unit has external linkage by default: it is discoverable and available to the other units of a project. If it has to be used in another translation unit, the extern keyword can be used there to introduce it.

A global variable can be prevented from being used in other cpp files of your project. It can be hidden from other translation units by restricting its visibility to just the file where it is originally defined. For this you should define a variable with the static keyword. A function can also be defined with the static keyword. A static global variable or function definition is available just to the file in which it is originally defined, and it is said to have internal linkage.

Two or more cpp files can contain static global variables of the same name without leading to linker errors.



Creative Commons License
This Blog Post/Article "Header Files and Linkage" by Parveen (Hoven) is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Updated on 2020-02-07. Published on: 2015-11-24