Chris Learns C

Tuesday, January 17, 2012

Python types are killing me

I spent probably two hours to get to this point:

All I wanted to do was import a .csv file as a dict type (values accessible by keys) and then be able to selectively choose which ones to do work on.

That didn't make sense. Let's see if I can explain it better.

My data is a spreadsheet exported to .csv. Each row is a set of data with different things I need a record of, like a date, name, notes, etc. Each column has a title, and DictReader sees this and makes those titles the keys.

So, if my .csv file looks like this:

name,age
chris,29
bob,42

the dicts look like this (DictReader hands back every value as a string):

{'name': 'chris', 'age': '29'}
{'name': 'bob', 'age': '42'}

So let's say I have a lot of these records and only want to do something with the people whose age is 29. The above code will do that for me (just replace "print row" with whatever I really want to do).

It took me forever to get to this point because I'm still not used to handling data in Python. I thought that "reader" was a 2d array of dicts, but it's not. I have to be "Pythony" about things.

Dicts, lists, and tuples. I'm having a hard time getting my head around when to use each. Let's say I had a whole lot of data with lots of ages and I wanted to shuttle all the people of a certain age into their own variable for handling. What type is this variable? Is it a list of dicts? Is it something more akin to what "reader" is? Is the "reader" that csv.DictReader dumped out a list of dicts? The Python console says no - it's an instance of something. If it's not a data type then how am I able to iterate through it with "for row in reader"?

Every time I jump into Python I wind up coming out with more questions than answers. I have trouble with Python that I never have with C. I will concede that when I do realize how to do something, it's fairly straightforward and only a few lines of code. That's nice. I might need to just surrender the need to know exactly what's going on and hope that this ignorance doesn't cause a massive bug that I can't track down due to not knowing the internals well enough.

Thursday, January 12, 2012

Tome de Python

I went ahead and ordered that 1100+ page Python book that O'Reilly publishes. There are some Things™ I want to do and I need to just dive in.

The internet is great, but nothing beats a book. Maybe one day I'll be comfortable enough with e-reader tech to not need the dead tree version, but that day hasn't come yet. It wasn't until recently I purchased a laptop - the tech just wasn't at a point I considered worthy of purchasing up until now. Maybe by the time the iPad 5 or Kindle Plasma Storm comes out I'll be ready.

Wednesday, January 11, 2012

Python CSV module DictReader

I'm tossing this link here as a reminder to take a look at it later.

http://www.doughellmann.com/PyMOTW/csv/

One of the reasons I stopped casually learning Python is because the documentation is incomprehensible to me. I wrote about this once before.

I'm hitting a problem at work that plagued me last year, and being able to pick stuff out of a .csv file (or an .xlsx file) would be really nice.

Tuesday, January 10, 2012

Statistics

A coworker let me borrow "Statistical Analysis: An Interdisciplinary Introduction to Univariate & Multivariate Methods" by Sam Kash Kachigan.

I'm going to be picking up some self-learning material for the office. If I like what I see with this book then maybe I won't need to get a larger stats text. We'll see. The other book I want is a good linear algebra textbook. I still have my calculus book from college.

So yea, hopefully I'll be able to round out my self-learning with some math. We'll see if it sticks as well as programming has.

Tuesday, January 3, 2012

Better linked list plan

I'm going to implement a singly linked list with a void pointer to some arbitrary data. The arbitrary data I want to try is a structure with a pointer to a character array and an integer to hold a count. I want to scan through a text file and count every unique word.

1) Take in a word.

2) See if it's already a known word in the list

2a) If it's not in the list, create a new node and set the counter to 1

2b) If it's in the list then increment that node's counter by 1

3) Print out the results

An advanced version of this would do something fancy about what order the words are stored in - the first implementation will no doubt be unordered and thus get slower as the list of words grows. I'm not sure yet of the best way to store words such that scanning to find a word is fast. I somewhat recall a way of doing this using a tree structure, but that's after I've gotten more list experience under my belt.
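
Here is a minimal sketch of the unordered first version of that plan. None of these names come from real code of mine yet: WordCount, Node, count_word(), lookup_count(), print_counts() and free_list() are all made up for illustration, and new words are simply pushed onto the front of the list.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical payload: one unique word plus how often it has been seen. */
typedef struct {
    char *word;
    int count;
} WordCount;

/* Singly linked node; `data` is a void pointer so it can hold anything. */
typedef struct Node {
    void *data;
    struct Node *next;
} Node;

/* Steps 1 and 2: returns 0 on success, -1 if an allocation fails. */
int count_word(Node **head, const char *word)
{
    assert(head != NULL && word != NULL);

    /* Step 2: unordered linear scan, so every lookup walks the list. */
    for (Node *n = *head; n != NULL; n = n->next) {
        WordCount *wc = n->data;
        if (strcmp(wc->word, word) == 0) {
            wc->count++;                 /* 2b: known word */
            return 0;
        }
    }

    /* 2a: unknown word, so create a node with its counter set to 1. */
    Node *node = malloc(sizeof *node);
    WordCount *wc = malloc(sizeof *wc);
    char *copy = malloc(strlen(word) + 1);
    if (node == NULL || wc == NULL || copy == NULL) {
        free(node);
        free(wc);
        free(copy);
        return -1;
    }
    strcpy(copy, word);
    wc->word = copy;
    wc->count = 1;
    node->data = wc;
    node->next = *head;                  /* push onto the front */
    *head = node;
    return 0;
}

/* How many times `word` was counted, or 0 if it is not in the list. */
int lookup_count(const Node *head, const char *word)
{
    for (const Node *n = head; n != NULL; n = n->next) {
        const WordCount *wc = n->data;
        if (strcmp(wc->word, word) == 0)
            return wc->count;
    }
    return 0;
}

/* Step 3: print every word with its count, in list order. */
void print_counts(const Node *head, FILE *out)
{
    for (const Node *n = head; n != NULL; n = n->next) {
        const WordCount *wc = n->data;
        fprintf(out, "%s: %d\n", wc->word, wc->count);
    }
}

/* Release every node, its payload, and the copied word. */
void free_list(Node *head)
{
    while (head != NULL) {
        Node *next = head->next;
        WordCount *wc = head->data;
        free(wc->word);
        free(wc);
        free(head);
        head = next;
    }
}
```

To run it over a text file, one option would be to read whitespace-separated tokens with something like fscanf(fp, "%63s", buf) and hand each one to count_word(). Punctuation and capitalization would still need handling before the counts mean much.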

Monday, January 2, 2012

Linked List

I have been trying to stick to this rule where I'll never use a pre-made implementation of an advanced programming concept without first creating my own version of it. The Objective-C stuff I've been working on has me frequently using the Foundation collections that stand in for a linked list (NSArray and NSMutableArray). I've never created my own linked list implementation, so this morning I sat down and hashed out creating a doubly linked list of character pointers, adding things to it, and deleting it. To make it complete I'll have to include insertion and deletion of nodes at any arbitrary point, but that shouldn't be too hard.

Here is what I came up with. My notes of what went right and wrong and some questions to address later are below.

DLinkedList.h

DLinkedList.c

main.c

What went right is that it seems to work. I'd like to add a list traversal function that prints out all the data in the node (node address, head pointer, tail pointer, and data) so that I can verify it's doing exactly what I think it's doing.
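
A traversal function along those lines might look like the sketch below. The DNode layout (prev, next, word) and the name dump_list() are assumptions for illustration, not what is in my DLinkedList files. It also checks that each node's next pointer and the following node's prev pointer agree, since that is the kind of mistake I'd want to catch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical doubly linked node holding a pointer to a string. */
typedef struct DNode {
    struct DNode *prev;
    struct DNode *next;
    const char *word;
} DNode;

/* Walk forward from `first`, print each node's address, links and data,
   and return how many nodes were visited. */
size_t dump_list(const DNode *first, FILE *out)
{
    assert(out != NULL);

    size_t visited = 0;
    for (const DNode *n = first; n != NULL; n = n->next) {
        fprintf(out, "node %p  prev %p  next %p  data \"%s\"\n",
                (const void *)n, (const void *)n->prev,
                (const void *)n->next, n->word);
        /* In a healthy list, the node after me points back at me. */
        if (n->next != NULL && n->next->prev != n)
            fprintf(out, "  broken back link after this node\n");
        visited++;
    }
    return visited;
}
```

Returning the node count gives a cheap sanity check: it should match the number of words that were added.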

What went wrong is that I'm not sure if my AddWord() function is following the best practice for shuffling around variables. I'm always worried that I will treat pointers as special and not do the same thing I'd do if it was an integer. A new node being created in that function is always addressed "through" the node that preceded it. I should have created a temporary node pointer to hold the address of the newly created node and then assign the next/previous/data from that. The weirdness came from literally translating my hand-written notes to code.
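
For comparison, here is a sketch of the temporary-pointer approach. It is not my actual AddWord(): the WordNode and WordList types and the add_word() and free_words() names are invented for this example. The new node is filled in completely through its own pointer before the list is touched, and every malloc() result is checked.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node and list types for a doubly linked list of strings. */
typedef struct WordNode {
    struct WordNode *prev;
    struct WordNode *next;
    char *word;
} WordNode;

typedef struct {
    WordNode *head;
    WordNode *tail;
} WordList;

/* Append a copy of `word`; returns 0 on success, -1 if malloc() fails. */
int add_word(WordList *list, const char *word)
{
    assert(list != NULL && word != NULL);

    /* Temporary pointer: the new node is set up through `node`,
       not through the node that precedes it. */
    WordNode *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->word = malloc(strlen(word) + 1);
    if (node->word == NULL) {
        free(node);
        return -1;
    }
    strcpy(node->word, word);

    /* Only now link it in after the current tail. */
    node->next = NULL;
    node->prev = list->tail;
    if (list->tail != NULL)
        list->tail->next = node;
    else
        list->head = node;           /* first node in an empty list */
    list->tail = node;
    return 0;
}

/* Free every node and its string, leaving the list empty. */
void free_words(WordList *list)
{
    assert(list != NULL);
    WordNode *n = list->head;
    while (n != NULL) {
        WordNode *next = n->next;
        free(n->word);
        free(n);
        n = next;
    }
    list->head = NULL;
    list->tail = NULL;
}
```

Because the list is left untouched until both allocations succeed, a failed malloc() leaves it exactly as it was.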

What's left is to do the verification, stress test it, and add in some safety checks for all the malloc() calls. Then I can move on to adding in the insert and delete node functions and think of something clever to do with all of it.

I made the nodes rather specifically hold pointers to strings in memory. The next iteration of this needs to make it hold anything. I think I can do this by (in the event of strings) allocating the space, doing strcpy(), and then casting the pointer to void * to store in the node. Getting the data back out means needing to re-cast, probably. I messed around with going to and from void * a while back and I don't recall having any difficulty.
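
A rough sketch of that idea, with made-up names (GNode, node_from_string(), node_as_string(), node_free()). One thing worth knowing: in C a pointer converts to and from void * implicitly, so no explicit cast is needed in either direction (C++ is stricter about the way back).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical generic node: `data` can point at anything. */
typedef struct GNode {
    void *data;
    struct GNode *next;
} GNode;

/* Copy `s` onto the heap and wrap it in a node.
   Returns NULL if either allocation fails. */
GNode *node_from_string(const char *s)
{
    assert(s != NULL);

    GNode *node = malloc(sizeof *node);
    char *copy = malloc(strlen(s) + 1);   /* +1 for the terminating '\0' */
    if (node == NULL || copy == NULL) {
        free(node);
        free(copy);
        return NULL;
    }
    strcpy(copy, s);
    node->data = copy;                    /* char * to void *, no cast */
    node->next = NULL;
    return node;
}

/* The node does not know what it holds, so the caller must. */
const char *node_as_string(const GNode *node)
{
    assert(node != NULL);
    return node->data;                    /* void * back to char *, no cast */
}

/* Free the node and the data it owns; safe to call with NULL. */
void node_free(GNode *node)
{
    if (node != NULL) {
        free(node->data);
        free(node);
    }
}
```

The catch is that nothing records the real type of the data, so reading a node back as the wrong type compiles fine and misbehaves at run time.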

The list struct which holds the head and tail of the list is so that it's quick to find the last node. Otherwise I would have had to always have the last node's "next" pointer be NULL.

Right, so I think I can give this little side-project another run through in a few days and then I'll feel good about using fancier canned library versions of it!

string size

Here's a quick reminder for myself about getting the size of strings.
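
This is a sketch of the usual distinction rather than the exact snippet I had in mind, and the function names are invented: strlen() counts characters up to but not including the terminating '\0', while sizeof reports how much storage a variable takes up, which only equals the string's storage when applied to an actual array.

```c
#include <assert.h>
#include <string.h>

/* Bytes needed to store a copy of `s`, including the terminating '\0'. */
size_t storage_needed(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1;
}

/* sizeof on a char array counts the whole array, '\0' included. */
size_t array_bytes(void)
{
    char greeting[] = "hello";    /* 5 letters + '\0' = 6 chars */
    return sizeof greeting;
}

/* sizeof on a pointer is the size of the pointer itself,
   no matter how long the string it points at is. */
size_t pointer_bytes(void)
{
    const char *greeting = "hello";
    return sizeof greeting;
}
```

So strlen(s) + 1 is the number to pass to malloc() when copying a string, and sizeof is only a safe shortcut when the array itself is in scope.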