Saturday, January 8, 2011

Analysis



I'll talk about the function MakeStringAlternate() first, as well as the variables local to main() called MakeString_Returned and mainW.

I have a variable called mainW, which is a pointer to a character, and I've assigned the string literal "Variable in Main\0" to it. I might discuss later why I didn't go with the declaration char mainW[100] or similar.

I have a function called MakeStringAlternate that returns a pointer to a character. It takes one parameter, localW, which is a pointer to a char pointer. That is to say, it accepts as an argument the memory address of a char pointer. Inside this function I assign a different string to the dereferenced localW. Then I return the dereferenced localW.

So, what have I really done, then?

My goal was to alter a character string in one function (main) from another function (MakeStringAlternate). I can compile the program and it works, but why?

In main() I have the variable mainW that is a pointer to a character, and I have assigned a string to this. Specifically I think I assigned a string literal, if my reading is correct.

I want to alter mainW in the function MakeStringAlternate. To alter a variable local to one function from another, I need to give the other function the memory address of the variable. Since my variable is already a pointer I will be passing a pointer to a pointer - this is why the argument taken by MakeStringAlternate is (char **localW).

Ok, so I've passed the memory address of my variable local to main to the external function MakeStringAlternate. Inside MakeStringAlternate I make the following assignment:

*localW = "I have altered the variable in main\0";

This states that I am dereferencing localW - meaning I don't want to alter the memory address stored by localW, I want to alter what's AT the memory address contained in localW. mainW from main() lives at this memory address, and I assign a new string literal to it.

What I'm most confused about is my return statement. If I'm returning a dereferenced pointer back to the variable I just altered, am I not saying:

address of mainW = address of mainW?

Is that return statement necessary? What if the function was type void and I just did the operation? Further investigation is needed. I'm not sure why I even made it like that now...

So, the BIG QUESTIONS.

1) I thought that string literals had to be constant and couldn't be modified. This doesn't appear to be the case, but why?

2) In the program I make a variable that just points to some memory with some text in it of a specific length. Then I say "you know what, replace that text with something longer!" and everything goes ok?! How did the compiler know not to put anything important in the stack right after my original declaration of the variable? Did I do a general no-no and get away with it?

3) I originally had the variables declared as "char mainW[100]" but abandoned it when the terminology got confusing. At least THIS would have made sense since I was allocating specific amounts of space! Would the compiler warn me if I was going over then?

Hopefully tomorrow I'll post about the other function MakeString, and why I don't understand why it works at all! Matt threw me some knowledge, but I don't understand the bits enough to really internalize things. Hopefully I can keep making these self-explaining programs as a reference.

Comments:

  1. When you assign a string literal using '=' the compiler does a few things:

    a) statically allocates room for the string literal somewhere in the output binary.

    b) copies the characters of your string literal into that space.

    c) initializes your pointer ('mainW') with the address of the first character of the string literal.

    The critical piece of information here is that when we manipulate 'mainW' we are not manipulating the string literal. We are manipulating 'mainW's data - which is an ADDRESS to the string literal in your example.

    It's probably useful to step back and review the fact that C has no inherent way of dealing with strings. We instead have to manipulate them as we would any other kind of array - in terms of pointers. When you "assign" a string to a variable (which must have pointer type), the only data that gets assigned is the address of the string.

    So on to your questions:

    1) As you can see now from the above explanation the string literals have not been changed. You have merely changed the address contained by 'mainW'. The original string literal you assigned as an initial value is alive and well (albeit quite lonely now that nobody points to him anymore).

    2) By the same reasoning as #1, you have not actually replaced any data in the string literals. The compiler went through your code at compile time, found all of your string literals, tucked them away, and then used their addresses to refer to them. In other words, they all exist in memory simultaneously and you are just changing which one 'mainW' points to.

    3) See above. The one difference here is that 100 characters are allocated even if the string literal is shorter. This syntax basically announces your intent to write some more data (up to 100 chars) into that space and forces the compiler to reserve it.

  2. Let me attempt to describe 'mainW's journey in your example above.

    -- Line 20: --

    You wish to create 'mainW', a pointer to a char. The compiler has decided that 'mainW' should reside somewhere in the Misty Mountains, at an address called 'address 0xMMMMMMMM'.

    Moving to the RHS of the assignment, the compiler notices you gave it a string literal. It bundles up the characters "Variable in Main\0" and stores them wherever it damn well pleases. Let's call this spot 'address 0xAAAAAAAA'.

    Now for the assignment: The compiler dutifully bestows 'mainW' with 'address 0xAAAAAAAA' - the address of the first character of your string literal. In other words, the data at location '0xMMMMMMMM' is now '0xAAAAAAAA'

    -- Line 23: --

    We rejoin our hero 'mainW' as he passes into the jaws of 'MakeStringAlternate(char**)'. The compiler notes that you wish to expose the hero to mortal danger by passing the function the exact location of 'mainW' (&mainW == 0xMMMMMMMM, passing by reference) instead of copying 'mainW' and passing a clone (mainW == 0xAAAAAAAA, passing by value).

    -- Line 10: --

    The humble courier entrusted with the communication of 'mainW's address is the local variable (villager?) 'localW'. He is a pointer to a char pointer.

    -- Line 12: --

    'MakeStringAlternate' uses the dereference operator on 'localW' (how barbaric!), which enables him to use the address carried by 'localW' to mount a direct raid on '0xMMMMMMMM' - 'mainW's address.

    Long ago, the compiler (an omnipotent and impartial creator) noticed 'MakeStringAlternate' required a new beast to aid him in his fell deed. Thus, at compile time the compiler nestled the foul string "I have altered the variable in main\0" away at 'address 0xBBBBBBBB' so that it might lie in wait for the siege.

    Back in the present, the trap is sprung... but the cunning 'MakeStringAlternate' chooses not to destroy the hero. Instead, he cleverly supplants him with the beast of his own design. By directing the peasant pawn 'localW' to lead him to 'mainW's lair, he is able to replace the address held by 'mainW' ('0xAAAAAAAA') with the address of his minion, '0xBBBBBBBB'. The address 0xMMMMMMMM ('mainW') is now possessed by 0xBBBBBBBB.

    -- Line 13: --

    Even though it is totally redundant, 'MakeStringAlternate' decides to ensure his beast is respected. He leaves a copy of the possessed hero on the stack before he exits.

    -- Line 23: --

    Back in main(), 'mainW' already holds the value 0xBBBBBBBB (because of all the unpleasant dereferencing that happened). However, a final (redundant) blow is dealt when the returned copy of 'mainW' is read off the stack and forcibly assigned into 'mainW' again (rectally, to be certain).

    -- Line 27: --

    Some time later, 'mainW' makes his first public appearance since the incident. It is then clear that his soul contains only '0xBBBBBBBB' - he is completely possessed by the beast.

    The printf() reporter discovers this fact and visits the address of the beast ('0xBBBBBBBB') to get his story. There he finds the words "I have altered the variable in main\0" and defaces the console with it so the world may know the beast's terrible secret.

    The sad consequence is that no one remembers the noble words of the ancients ("Variable in Main\0"). They persist for eternity at 'address 0xAAAAAAAA', waiting for a new hero (or wayward traveler) to discover them again...
