CScientists: CPP

What is The Rule of Three?

Q:What is The Rule of Three?

What does copying an object mean?
What are the copy constructor and the copy assignment operator?
When do I need to declare them myself?
How can I prevent my objects from being copied?

Best Answer:

Can a local variable's memory be accessed outside its scope?

Question:
I have the following code:

#include <iostream>

int * foo()
{
int a = 5;
return &a;
}

int main()
{
int* p = foo();
std::cout << *p;
*p = 8;
std::cout << *p;
}

And the code is just running with no runtime exceptions!

The output was 58

How can it be? Isn't the memory of a local variable inaccessible outside its function?

Best Answer:

You rent a hotel room. You put a book in the top drawer of the bedside table and go to sleep. You check out the next morning, but "forget" to give back your key. You steal the key!

A week later, you return to the hotel, do not check in, sneak into your old room with your stolen key, and look in the drawer. Your book is still there. Astonishing!

How can that be? Aren't the contents of a hotel room drawer inaccessible if you haven't rented the room?

Well, obviously that scenario can happen in the real world no problem. There is no mysterious force that causes your book to disappear when you are no longer authorized to be in the room. Nor is there a mysterious force that prevents you from entering a room with a stolen key.

The hotel management is not required to remove your book. You didn't make a contract with them that said that if you leave stuff behind, they'll shred it for you. If you illegally re-enter your room with a stolen key to get it back, the hotel security staff is not required to catch you sneaking in. You didn't make a contract with them that said "if I try to sneak back into my room later, you are required to stop me." Rather, you signed a contract with them that said "I promise not to sneak back into my room later", a contract which you broke.

In this situation anything can happen. The book can be there -- you got lucky. Someone else's book can be there and yours could be in the hotel's furnace. Someone could be there right when you come in, tearing your book to pieces. The hotel could have removed the table and book entirely and replaced it with a wardrobe. The entire hotel could be just about to be torn down and replaced with a football stadium, and you are going to die in an explosion while you are sneaking around.

You don't know what is going to happen; when you checked out of the hotel and stole a key to illegally use later, you gave up the right to live in a predictable, safe world because you chose to break the rules of the system.

C++ is not a safe language. It will cheerfully allow you to break the rules of the system. If you try to do something illegal and foolish like going back into a room you're not authorized to be in and rummaging through a desk that might not even be there anymore, C++ is not going to stop you. Safer languages than C++ solve this problem by restricting your power -- by having much stricter control over keys, for example.

UPDATE

Holy goodness, this answer is getting a lot of attention. (I'm not sure why -- I considered it to be just a "fun" little analogy, but whatever.)

I thought it might be germane to update this a bit with a few more technical thoughts.

Compilers are in the business of generating code which manages the storage of the data manipulated by that program. There are lots of different ways of generating code to manage memory, but over time two basic techniques have become entrenched.

The first is to have some sort of "long lived" storage area where the "lifetime" of each byte in the storage -- that is, the period of time when it is validly associated with some program variable -- cannot be easily predicted ahead of time. The compiler generates calls into a "heap manager" that knows how to dynamically allocate storage when it is needed and reclaim it when it is no longer needed.

The second method is to have a “short-lived” storage area where the lifetime of each byte is well known. Here, the lifetimes follow a “nesting” pattern. The longest-lived of these short-lived variables will be allocated before any other short-lived variables, and will be freed last. Shorter-lived variables will be allocated after the longest-lived ones, and will be freed before them. The lifetime of these shorter-lived variables is “nested” within the lifetime of longer-lived ones.

Local variables follow the latter pattern; when a method is entered, its local variables come alive. When that method calls another method, the new method's local variables come alive. They'll be dead before the first method's local variables are dead. The relative order of the beginnings and endings of lifetimes of storages associated with local variables can be worked out ahead of time.

For this reason, local variables are usually generated as storage on a "stack" data structure, because a stack has the property that the first thing pushed on it is going to be the last thing popped off.

It's like the hotel decides to only rent out rooms sequentially, and you can't check out until everyone with a room number higher than you has checked out.

So let's think about the stack. In many operating systems you get one stack per thread and the stack is allocated to be a certain fixed size. When you call a method, stuff is pushed onto the stack. If you then pass a pointer to the stack back out of your method, as the original poster does here, that's just a pointer to the middle of some entirely valid million-byte memory block. In our analogy, you check out of the hotel; when you do, you just checked out of the highest-numbered occupied room. If no one else checks in after you, and you go back to your room illegally, all your stuff is guaranteed to still be there in this particular hotel.

We use stacks for temporary stores because they are really cheap and easy. An implementation of C++ is not required to use a stack for storage of locals; it could use the heap. It doesn't, because that would make the program slower.

An implementation of C++ is not required to leave the garbage you left on the stack untouched so that you can come back for it later illegally; it is perfectly legal for the compiler to generate code that turns back to zero everything in the "room" that you just vacated. It doesn't because again, that would be expensive.

An implementation of C++ is not required to ensure that when the stack logically shrinks, the addresses that used to be valid are still mapped into memory. The implementation is allowed to tell the operating system "we're done using this page of stack now. Until I say otherwise, issue an exception that destroys the process if anyone touches the previously-valid stack page". Again, implementations do not actually do that because it is slow and unnecessary.

Instead, implementations let you make mistakes and get away with it. Most of the time. Until one day something truly awful goes wrong and the process explodes.

This is problematic. There are a lot of rules and it is very easy to break them accidentally. I certainly have many times. And worse, the problem often only surfaces when memory is detected to be corrupt billions of nanoseconds after the corruption happened, when it is very hard to figure out who messed it up.

More memory-safe languages solve this problem by restricting your power. In "normal" C# there simply is no way to take the address of a local and return it or store it for later. You can take the address of a local, but the language is cleverly designed so that it is impossible to use it after the lifetime of the local ends. In order to take the address of a local and pass it back, you have to put the compiler in a special "unsafe" mode, and put the word "unsafe" in your program, to call attention to the fact that you are probably doing something dangerous that could be breaking the rules.

Why is “using namespace std;” considered bad practice?

Question:
I've been told by others that writing using namespace std; in code is wrong, and that I should use std::cout and std::cin directly instead.

Why is using namespace std; considered a bad practice? Is it inefficient or does it risk declaring ambiguous variables (variables that share the same name as a function in std namespace)? Does it impact performance?

Best Answer:

This is not related to performance at all. But consider this: you are using two libraries called Foo and Bar:

using namespace foo;
using namespace bar;

Everything works fine, you can call Blah() from Foo and Quux() from Bar without problems. But one day you upgrade to a new version of Foo 2.0, which now offers a function called Quux(). Now you've got a conflict: Both Foo 2.0 and Bar import Quux() into your global namespace. This is going to take some effort to fix, especially if the function parameters happen to match.

If you had used foo::Blah() and bar::Quux(), then the introduction of foo::Quux() would have been a non-event.

Why is iostream::eof inside a loop condition (i.e. `while (!stream.eof())`) considered wrong?

Question: I just found a comment in this answer saying that using iostream::eof in a loop condition is "almost certainly wrong". I generally use something like while(cin>>n) - which I guess implicitly checks for EOF.

Why is checking for eof explicitly using while (!cin.eof()) wrong?

How is it different from using scanf("...",...)!=EOF in C (which I often use with no problems)?

Best Answer:

Because iostream::eof will only return true after reading the end of the stream. It does notindicate, that the next read will be the end of the stream.

Consider this (and assume then next read will be at the end of the stream):

while(!inStream.eof()){
int data;
// yay, not end of stream yet, now read ...
inStream >> data;
// oh crap, now we read the end and *only* now the eof bit will be set (as well as the fail bit)
// do stuff with (now uninitialized) data
}

Against this:

int data;
while(inStream >> data){
// when we land here, we can be sure that the read was successful.
// if it wasn't, the returned stream from operator>> would be converted to false
// and the loop wouldn't even be entered
// do stuff with correctly initialized data (hopefully)
}

And on your second question: Because

if(scanf("...",...)!=EOF)

is the same as

if(!(inStream >> data).eof())

and not the same as

if(!inStream.eof())
inFile >> data

Why can templates only be implemented in the header file?

The only portable way of using templates at the moment is to implement them in header files by using inline functions.

Q: Why is this?

(Clarification: header files are not the only portable solution. But they are the most convenient portable solution.)

Best Answer:

It is not necessary to put the implementation in the header file, see the alternative solution at the end of this answer.

Anyway, the reason your code is failing is that, when instantiating a template, the compiler creates a new class with the given template argument. For example:

template<typename T>
struct Foo
{
T bar;
void doSomething(T param) {/* do stuff using T */}
};

// somewhere in a .cpp
Foo<int> f;

When reading this line, the compiler will create a new class (let's call it FooInt), which is equivalent to the following:

struct FooInt
{
int bar;
void doSomething(int param) {/* do stuff using int */}
}

Consequently, the compiler needs to have access to the implementation of the methods, to instantiate them with the template argument (in this case int). If these implementations were not in the header, they wouldn't be accessible, and therefore the compiler wouldn't be able to instantiate the template.

A common solution to this is to write the template declaration in a header file, then implement the class in an implementation file (for example .tpp), and include this implementation file at the end of the header.

// Foo.h
template <typename T>
struct Foo
{
void doSomething(T param);
};

#include "Foo.tpp"

// Foo.tpp
template <typename T>
void Foo<T>::doSomething(T param)
{
//implementation
}

This way, implementation is still separated from declaration, but is accessible to the compiler.

Another solution is to keep the implementation separated, and explicitly instantiate all the template instances you'll need:

// Foo.h

// no implementation
template <typename T> struct Foo { ... };

//----------------------------------------
// Foo.cpp

// implementation of Foo's methods

// explicit instantiations
template class Foo<int>;
template class Foo<float>;
// You will only be able to use Foo with int or float

What is an undefined reference/unresolved..?

Question: What are undefined reference/unresolved external symbol errors? What are common causes and how to fix/prevent them?

Best Answer:

The precedence among the syntax rules of translation is specified by the following phases.

Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary. [SNIP]
Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines. [SNIP]
The source file is decomposed into preprocessing tokens (2.5) and sequences of white-space characters (including comments). [SNIP]
Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed. [SNIP]
Each source character set member in a character literal or a string literal, as well as each escape sequence and universal-character-name in a character literal or a non-raw string literal, is converted to the corresponding member of the execution character set; [SNIP]
Adjacent string literal tokens are concatenated.
White-space characters separating tokens are no longer significant. Each preprocessing token is converted into a token. (2.7). The resulting tokens are syntactically and semantically analyzed and translated as a translation unit. [SNIP]
Translated translation units and instantiation units are combined as follows: [SNIP]
All external entity references are resolved. Library components are linked to satisfy external references to entities not defined in the current translation. All such translator output is collected into a program image which contains information needed for execution in its execution environment. (emphasis mine)

Implementations must behave as if these separate phases occur, although in practice different phases might be folded together.

The specified errors occur during this last stage of compilation, most commonly referred to as linking. It basically means that you compiled a bunch of implementation files into object files or libraries and now you want to get them to work together.

Say you defined symbol a in a.cpp. Now, b.cpp declared that symbol and used it. Before linking, it simply assumes that that symbol was defined somewhere, but it doesn't yet care where. The linking phase is responsible for finding the symbol and correctly linking it to b.cpp (well, actually to the object or library that uses it).

If you're using Microsoft Visual Studio, you'll see that projects generate .lib files. These contain a table of exported symbols, and a table of imported symbols. The imported symbols are resolved against the libraries you link against, and the exported symbols are provided for the libraries that use that .lib (if any).

What is The Rule of Three?

Can a local variable's memory be accessed outside its scope?

UPDATE

Why is “using namespace std;” considered bad practice?

Why is iostream::eof inside a loop condition (i.e. `while (!stream.eof())`) considered wrong?

Why can templates only be implemented in the header file?

What is an undefined reference/unresolved..?

Categories

Popular Posts

Copyright