Chris Learns C: January 2012

Saturday, January 28, 2012

clever

I've been experimenting with generating HTML for reports. It's pretty easy to do - writing a text file is more or less trivial in Python. Something I had struggled with is having large amounts of HTML text with placeholder variables. It gets unwieldy if you want to make little edits. I ran across this in the Python online documentation:

http://docs.python.org/howto/webservers.html#templates

I had been doing this in my code:

myvariable = "here"
htmltext = "imagine this is long html %s" % myvariable

which would make htmltext be "imagine this is long html here".

The issue is that my HTML is often super long (tables) and making small edits and making sure my variables are in line is annoying. The way they did this in the link is clever.

myvariable = "here"
htmltext = "imagine this is long html %s"
result = htmltext % myvariable

Now I can separate things out for readability. I should read the documentation more often!
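Taking the same idea one step further, here is a minimal sketch (Python 3) that keeps a long template in its own multi-line string and fills it with named placeholders, so the order of the variables stops mattering. The names REPORT_TEMPLATE, render_report, title, and open_cases are made up for this example and are not from the linked howto.

```python
# Hypothetical report template; the placeholder names are invented for this sketch.
REPORT_TEMPLATE = """<html>
  <body>
    <h1>%(title)s</h1>
    <p>Open cases: %(open_cases)d</p>
  </body>
</html>"""


def render_report(title, open_cases):
    """Fill the template's named placeholders from a dict."""
    return REPORT_TEMPLATE % {"title": title, "open_cases": open_cases}


if __name__ == "__main__":
    print(render_report("January report", 12))
```

With named placeholders the HTML can be edited freely, and each value is matched by name rather than by position.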

Sunday, January 22, 2012

Other activities

I'm about 50 pages into this statistics book. So far I haven't hit anything I haven't seen before. This book does a fairly good job of describing why certain metrics are used, as well as describing ways they can be meaningless. That being said, it's not nearly as mathematically rigorous (so far) as I had expected. It's pages and pages of text with one or two punchline formulas defining the previously described behavior. I'm not going to knock it too hard yet - I guess I expected more math and I'm curious as to why it's not there. I flipped through the later chapters and found mostly the same pattern.

The one statistics course I took in college wasn't a "stats for engineers" course, but it still had a good deal of math involved. For some reason I was really good at it, and I think the reason is kind of silly. A typical formula for basic statistics (like the one for standard deviation) is kind of "pretty". It has a sigma and summation symbol (upper case sigma?) and I would mindlessly doodle it over and over again in various ways, so I wound up memorizing the formulas accidentally. Over time I've forgotten them, naturally. Math retention has never been my strong suit - quite the contrary to the programming retention I seem to have. Maybe it's a function of practice, not innate ability.

I was figuring that statistics would be mathematically rigorous enough that it would prevent me from tackling anything else while researching it. That doesn't seem to be the case so I'm tempted to bring in something else to do. I haven't stopped programming. I've actually done more programming this new year than usual, but it's in Python and it's work related. It scratches the programming itch just as well, so I'm not jumping at tackling the next chapter in the Obj-C book. Hopefully that doesn't hurt me in the long run.

Saturday, January 21, 2012

C String Copying

http://byuu.org/articles/programming/strcpycat

Something on /r/programming I want to remember.

Wednesday, January 18, 2012

Good call, Zed.

I got about halfway through Learn Python the Hard Way before I got distracted. I intend to get back into it shortly - particularly for the web programming parts.

There is a chapter called "Advice From An Old Programmer" that I hadn't read until today. One excerpt from this is "Programming as a profession is only moderately interesting. It can be a good job, but you could make about the same money and be happier running a fast food joint. You're much better off using code as your secret weapon in another profession." I put in bold a sentence that really hit me kind of hard.

I've treated programming as a hobby I might one day (five or ten years from now) turn into a career. Now that I think about it I realize it was a handy tool to have in college, in the lab I used to work in, and in the job I have now. Maybe I can leverage it as my "edge". Maybe everyone should. I've been perpetually bummed that I'm not a programmer or engineer - if I take Zed's advice to heart then maybe I won't have to be anymore.

Tuesday, January 17, 2012

The current Python project

So my goal with Python here is to take a spreadsheet I use at work to document support cases and output it into something more readable. The secondary goal is to extract some useful statistics out of it. The second goal was last year's primary goal which got me into Python in the first place, but I've done a lot of the bean counting in Excel itself (an altogether useful lesson but not journal-worthy).

A good start will be a monthly report. The essential exercise is acting only on the dicts in the list that have a certain value bound to a key - like what if I only wanted to view cases that are X days old, or are in a particular region? I think I've solved that problem. The next big deal is to take the data in those dicts and format it into something presentable. I'm thinking initially HTML since I know how to do tables in that and in my security camera project I intermixed variables with static text. If I'm brave I might try converting the dicts to XML for the sake of learning XML, but we'll see. I'm dancing the fine line between hobby project and work.
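A rough sketch of those two steps in Python 3: pick out the dicts that have a given value under a key, then turn them into an HTML table. The field names ("id", "region", "days_open") and the helper names are assumptions for illustration, not the real spreadsheet columns.

```python
from html import escape


def select_cases(cases, key, value):
    """Return only the dicts whose `key` is bound to `value`."""
    return [case for case in cases if case.get(key) == value]


def to_html_table(rows, columns):
    """Render a list of dicts as a plain HTML table, one column per key."""
    header = "".join("<th>%s</th>" % escape(col) for col in columns)
    body = []
    for row in rows:
        cells = "".join(
            "<td>%s</td>" % escape(str(row.get(col, ""))) for col in columns
        )
        body.append("<tr>%s</tr>" % cells)
    return "<table>\n<tr>%s</tr>\n%s\n</table>" % (header, "\n".join(body))


if __name__ == "__main__":
    # Made-up sample data standing in for the support-case spreadsheet.
    cases = [
        {"id": "101", "region": "East", "days_open": "3"},
        {"id": "102", "region": "West", "days_open": "10"},
        {"id": "103", "region": "East", "days_open": "7"},
    ]
    east = select_cases(cases, "region", "East")
    print(to_html_table(east, ["id", "region", "days_open"]))
```

Escaping each cell keeps stray characters like "<" or "&" in the notes from breaking the table markup.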

Python types are killing me

I spent probably two hours to get to this point:

All I wanted to do was import a .csv file as a dict type (values accessible by keys) and then be able to selectively choose which ones to do work on.

That didn't make sense, let's see if I can better explain it.

My data is a spreadsheet exported to .csv. Each row is a set of data with different things I need record of, like a date, name, notes, etc. Each column has a title, and DictReader sees this and makes those titles the keys.

So, if my .csv file looks like this

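name,age
chris,29
bob,42

the dicts look like:

{'name': 'chris', 'age': '29'}
{'name': 'bob', 'age': '42'}

(DictReader reads every value as a string, so the ages come back as '29' and '42' rather than as numbers.)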

So let's say I have a lot of these records and only want to do something with the people who are 29. The above code will do that for me (just replace "print row" with whatever I really want to do).
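For reference, a filter of that kind might look like the following sketch. It is written for Python 3 and is not the code from the original post. SAMPLE_CSV and rows_with_age are names invented here, and io.StringIO stands in for the exported file.

```python
import csv
import io

# Stand-in for the exported spreadsheet; a real script would open() the .csv file.
SAMPLE_CSV = "name,age\nchris,29\nbob,42\n"


def rows_with_age(csv_file, age):
    """Yield each row dict whose 'age' column equals the given age."""
    reader = csv.DictReader(csv_file)
    for row in reader:
        # DictReader returns every value as a string, so compare as text.
        if row["age"] == str(age):
            yield row


if __name__ == "__main__":
    for row in rows_with_age(io.StringIO(SAMPLE_CSV), 29):
        print(row)
```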

It took me forever to get to this point because I'm still not used to handling data in Python. I thought that "reader" was a 2d array of dicts, but it's not. I have to be "Pythony" about things.

Dicts, lists, and tuples. I'm having a hard time getting my head around when to use each. Let's say I had a whole lot of data with lots of ages and I wanted to shuttle all the people of a certain age into their own variable for handling. What type is this variable? Is it a list of dicts? Is it something more akin to what "reader" is? Is the "reader" that csv.DictReader handed back a list of dicts? The Python console says no - it's an instance of something. If it's not a data type then how am I able to iterate through it with "for row in reader"?
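As far as I can tell, the short answer is that "reader" is an iterator object: "for row in reader" works because the object supports iteration, not because it is a list. A list of dicts only exists once the rows are collected. A small Python 3 sketch of that, where load_rows and group_by_age are hypothetical helper names and the sample data is made up:

```python
import csv
import io

# Made-up sample data; a real script would open() the exported .csv file.
SAMPLE_CSV = "name,age\nchris,29\nbob,42\nalice,29\n"


def load_rows(csv_file):
    """Read every row up front so the result is an ordinary list of dicts."""
    return [dict(row) for row in csv.DictReader(csv_file)]


def group_by_age(rows):
    """Map each age (still a string) to the list of row dicts with that age."""
    groups = {}
    for row in rows:
        groups.setdefault(row["age"], []).append(row)
    return groups


if __name__ == "__main__":
    reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
    print(type(reader))          # a DictReader instance, not a list
    rows = load_rows(io.StringIO(SAMPLE_CSV))
    print(type(rows))            # a plain list
    for person in group_by_age(rows)["29"]:
        print(person["name"])
```

So the variable holding "everyone aged 29" can simply be a list of dicts, and a dict of such lists keeps every age group in one place.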

Every time I jump into Python I wind up coming out with more questions than answers. I have trouble with Python that I never have with C. I will concede that when I do realize how to do something, it's fairly straightforward and only a few lines of code. That's nice. I might need to just surrender the need to know exactly what's going on and hope that this ignorance doesn't cause a massive bug that I can't track down due to not knowing the internals well enough.

Thursday, January 12, 2012

Tome de Python

I went ahead and ordered that 1100+ page Python book that O'Reilly publishes. There are some Things™ I want to do and I need to just dive in.

The internet is great, but nothing beats a book. Maybe one day I'll be comfortable enough with e-reader tech to not need the dead tree version, but that day hasn't come yet. It wasn't until recently I purchased a laptop - the tech just wasn't at a point I considered worthy of purchasing up until now. Maybe by the time the iPad 5 or Kindle Plasma Storm comes out I'll be ready.

Wednesday, January 11, 2012

Python CSV module DictReader

I'm tossing this link here as a reminder to take a look at it later.

http://www.doughellmann.com/PyMOTW/csv/

One of the reasons I stopped casually learning Python is because the documentation is incomprehensible to me. I wrote about this once before.

I'm hitting a problem at work that plagued me last year, and being able to pick stuff out of a .csv file (or an .xlsx file) would be really nice.

Tuesday, January 10, 2012

Statistics

A coworker let me borrow "Statistical Analysis: An Interdisciplinary Introduction to Univariate & Multivariate Methods" by Sam Kash Kachigan.

I'm going to be picking up some self-learning material for the office. If I like what I see with this book then maybe I won't need to get a larger stats text. We'll see. The other book I want is a good linear algebra textbook. I still have my calculus book from college.

So yeah, hopefully I'll be able to round out my self-learning with some math. We'll see if it sticks as well as programming has.

Tuesday, January 3, 2012

Better linked list plan

I'm going to implement a singly linked list with a void pointer to some arbitrary data. The arbitrary data I want to try is a structure with a pointer to a character array and an integer to hold a count. I want to scan through a text file and count every unique word.

1) Take in a word.

2) See if it's already a known word in the list

2a) If it's not in the list, create a new node and set the counter to 1

2b) If it's in the list then increment that node's counter by 1

3) Print out the results
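The steps above can be sketched as follows. This is in Python 3 rather than C, so the word and count live directly on the node instead of behind a void pointer, and the names WordNode, count_words, and as_pairs are invented for the sketch.

```python
class WordNode:
    """One node of a singly linked list: a word, its count, and a next link."""

    def __init__(self, word):
        self.word = word
        self.count = 1
        self.next = None


def count_words(text):
    """Build an unordered linked list of (word, count) nodes from text."""
    head = None
    tail = None
    for word in text.lower().split():
        # Step 2: scan the list for a node that already holds this word.
        node = head
        while node is not None and node.word != word:
            node = node.next
        if node is not None:
            node.count += 1              # Step 2b: known word
        else:
            new_node = WordNode(word)    # Step 2a: new word, counter starts at 1
            if head is None:
                head = new_node
            else:
                tail.next = new_node
            tail = new_node
    return head


def as_pairs(head):
    """Walk the list and return its contents as (word, count) tuples."""
    pairs = []
    node = head
    while node is not None:
        pairs.append((node.word, node.count))
        node = node.next
    return pairs


if __name__ == "__main__":
    # Step 3: print the results.
    for word, count in as_pairs(count_words("the cat and the hat")):
        print(word, count)
```

Every lookup walks the list from the head, which is the slowness the unordered version is expected to have.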

An advanced version of this would do something fancy about what order the words are stored in - the first implementation will no doubt be unordered and thus slow as the number of words grows. I'm not sure yet of the best way to store words such that scanning to find a word is fast. I somewhat recall a way of doing this using a tree structure, but that's after I've gotten more list experience under my belt.
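One tree structure that fits that description is a binary search tree keyed on the word. This is my guess at what the half-remembered approach was, not something confirmed anywhere above. A minimal Python 3 sketch, with TreeNode, add_word, and in_order as made-up names:

```python
class TreeNode:
    """A binary search tree node holding a word and how often it was seen."""

    def __init__(self, word):
        self.word = word
        self.count = 1
        self.left = None
        self.right = None


def add_word(root, word):
    """Insert word into the tree (or bump its count) and return the root."""
    if root is None:
        return TreeNode(word)
    node = root
    while True:
        if word == node.word:
            node.count += 1
            return root
        # Smaller words go left, larger words go right.
        branch = "left" if word < node.word else "right"
        child = getattr(node, branch)
        if child is None:
            setattr(node, branch, TreeNode(word))
            return root
        node = child


def in_order(node):
    """Return (word, count) pairs in alphabetical order."""
    if node is None:
        return []
    return in_order(node.left) + [(node.word, node.count)] + in_order(node.right)


if __name__ == "__main__":
    root = None
    for word in "the cat and the hat".split():
        root = add_word(root, word)
    print(in_order(root))
```

Each comparison discards one side of the tree, so lookups are fast when the tree stays reasonably balanced. Words arriving in sorted order would degrade it back into a linked list.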

Monday, January 2, 2012

Linked List

I have been trying to stick to this rule where I'll never use a pre-made implementation of an advanced programming concept without first creating my own version of it. The Objective-C stuff I've been working on has me frequently using Foundation's ordered collections (NSArray and NSMutableArray), which do the job I'd otherwise reach for a linked list to do. I've never created my own linked list implementation so this morning I sat down and hashed out the creation, adding of things, and deletion of a doubly linked list of character pointers. To make it complete I'll have to include insertion and deletion of nodes at any arbitrary point, but that shouldn't be too hard.

Here is what I came up with. My notes of what went right and wrong and some questions to address later are below.

DLinkedList.h

DLinkedList.c

main.c

What went right is that it seems to work. I'd like to add a list traversal function that prints out all the data in each node (node address, head pointer, tail pointer, and data) so that I can verify it's doing exactly what I think it's doing.

What went wrong is that I'm not sure if my AddWord() function is following the best practice for shuffling around variables. I'm always worried that I will treat pointers as special and not do the same thing I'd do if it was an integer. A new node being created in that function is always addressed "through" the node that preceded it. I should have created a temporary node pointer to hold the address of the newly created node and then assigned the next/previous/data through that. The weirdness came from literally translating my hand-written notes to code.
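The temporary-pointer idea is easier to see in a sketch. This one is in Python 3, so a plain local variable plays the role of the temporary node pointer. The class and method names are invented here and are not taken from DLinkedList.c.

```python
class Node:
    """A doubly linked list node holding one piece of data."""

    def __init__(self, data):
        self.data = data
        self.prev = None
        self.next = None


class DLinkedList:
    """Tracks the head and tail so appending never has to walk the list."""

    def __init__(self):
        self.head = None
        self.tail = None

    def add_word(self, word):
        # Hold the new node in its own variable and wire it up from there,
        # instead of reaching it "through" the previous node's next link.
        new_node = Node(word)
        new_node.prev = self.tail
        if self.tail is None:
            self.head = new_node
        else:
            self.tail.next = new_node
        self.tail = new_node
        return new_node

    def forward(self):
        """Collect the data by following next links from the head."""
        items, node = [], self.head
        while node is not None:
            items.append(node.data)
            node = node.next
        return items

    def backward(self):
        """Collect the data by following prev links from the tail."""
        items, node = [], self.tail
        while node is not None:
            items.append(node.data)
            node = node.prev
        return items


if __name__ == "__main__":
    words = DLinkedList()
    for word in ["alpha", "beta", "gamma"]:
        words.add_word(word)
    print(words.forward())
    print(words.backward())
```

Walking the list in both directions and comparing the results is a cheap way to check that the next and previous links agree.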

What's left is to do the verification, stress test it, and add in some safety checks for all the malloc() calls. Then I can move on to adding in the insert and delete node functions and think of something clever to do with all of it.

I made the nodes rather specifically hold pointers to strings in memory. The next iteration of this needs to make it hold anything. I think I can do this by (in the event of strings) allocating the space, doing strcpy(), and then casting the pointer to void to store in the node. Getting the data back out means needing to re-cast, probably. I messed around with going to and from void a while back and I don't recall having any difficulty.

The list struct holds the head and tail of the list so that it's quick to find the last node. Otherwise I would have had to walk the list every time, looking for the node whose "next" pointer is NULL.

Right, so I think I can give this little side-project another run through in a few days and then I'll feel good about using fancier canned library versions of it!

string size

Here's a quick reminder for myself about getting the size of strings.