Tuesday, December 14, 2010

more parameters

Since I don't have the ability to graph anything visually, I should try to get out data that doesn't need it. Something like the average inter-spike interval of units on a channel. This would require me to extract data, perform a calculation on it, and then present it. It wouldn't even be a lot of data - just the unit classification (0 for unsorted, 1 for unit 1, 2 for unit 2, etc.) and the corresponding timestamp. Should I put it in one big two-dimensional array, or have an array for each unit? Would getting the average interval for each unit force me to run through that array once per unit? How could I make that more graceful? Perhaps I should be thankful that the units correspond to numbers. I could have a 27-element array (since there is a max of 26 units, plus 0 for unsorted) and keep a running tab, using the unit classification as the index into the array where I'm keeping the interval.

Hmm...

1 comment:

  1. You have lots of options here and this exercise is good practice for making design decisions.

    As you mentioned, you could read all the data from the file into memory (your array) and then process that data in another loop. This would make sense if you had other uses for the data and doing so would avoid having to read the file again for those uses.

    Alternatively, if all you care about is the average, you don't need to store all the data records. Instead, save a running sum of intervals for each unit, along with a count of how many intervals that unit has contributed. After reading all N data records in the file, divide each unit's running sum by that unit's interval count to get the average for that unit.

    Often it's helpful to characterize the computational complexity of what you want to do in order to better understand your options.

    For example, the simple average of N items requires (N-1) adds and 1 divide. For the first example above (storing the data in a 27xN array, one row per unit), the worst case works out to:

    - 27*(N-1) adds
    - 27 divides
    - 27*N storage locations + 1 for the value of N
    - 2*27*N loop iterations to arrive at an average: one set of loops to read the data into the array, and the other to post-process the array and calculate the averages.
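    As a sketch of that first approach's post-processing pass - the function and array names here are made up for illustration, and it assumes the file has already been read into parallel arrays of unit numbers and ascending timestamps:

    ```c
    #include <stdio.h>

    #define MAX_UNITS 27 /* units 1-26 plus 0 for unsorted */

    /* Post-process data already read into memory: compute each unit's
       average inter-spike interval. Assumes timestamps are ascending
       within each unit's spike train. */
    void average_intervals(const int unit[], const double ts[], int n,
                           double avg[MAX_UNITS])
    {
        double sum[MAX_UNITS] = {0};   /* running interval sums */
        double last[MAX_UNITS];        /* last timestamp seen per unit */
        int count[MAX_UNITS] = {0};    /* intervals seen per unit */
        int seen[MAX_UNITS] = {0};     /* first spike seen yet? */

        for (int i = 0; i < n; i++) {
            int u = unit[i];
            if (u < 0 || u >= MAX_UNITS)
                continue; /* never let raw data index the array */
            if (seen[u]) {
                sum[u] += ts[i] - last[u];
                count[u]++;
            }
            seen[u] = 1;
            last[u] = ts[i];
        }
        for (int u = 0; u < MAX_UNITS; u++)
            avg[u] = count[u] ? sum[u] / count[u] : 0.0;
    }
    ```

    Note that a unit's first spike only establishes a starting timestamp; intervals (and the count used in the divide) begin with its second spike.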

    And, the second implementation reduces the storage and execution requirements to:

    - 27 running sums (plus 27 interval counts) + 1 for the value of N
    - N loop iterations to read the data, plus 27 to compute the final averages.
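    The streaming version might look like the sketch below. The input format is an assumption - whitespace-separated "unit timestamp" pairs, one record per line - and the function name is made up. The key property is that storage no longer grows with N:

    ```c
    #include <stdio.h>

    #define MAX_UNITS 27 /* units 1-26 plus 0 for unsorted */

    /* One pass over the input stream; storage is fixed regardless of
       how many records the file holds. Assumes timestamps are ascending
       within each unit's spike train. */
    void stream_averages(FILE *in, double avg[MAX_UNITS],
                         int count[MAX_UNITS])
    {
        double sum[MAX_UNITS] = {0};
        double last[MAX_UNITS];
        int seen[MAX_UNITS] = {0};
        int unit;
        double ts;

        for (int u = 0; u < MAX_UNITS; u++) {
            avg[u] = 0.0;
            count[u] = 0;
        }
        while (fscanf(in, "%d %lf", &unit, &ts) == 2) {
            if (unit < 0 || unit >= MAX_UNITS)
                continue; /* never let raw input index the array */
            if (seen[unit]) {
                sum[unit] += ts - last[unit];
                count[unit]++;
            }
            seen[unit] = 1;
            last[unit] = ts;
        }
        for (int u = 0; u < MAX_UNITS; u++)
            if (count[u])
                avg[u] = sum[u] / count[u];
    }
    ```

    Because each record is folded into the running sums as it is read, the 27xN array disappears entirely; the trade-off is that the raw records are gone once the pass finishes, so any second use of them means re-reading the file.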

    A more formal way to describe the concept of computational complexity is discussed here:

    http://en.wikipedia.org/wiki/Big_O_notation

    Finally, a note of caution about using data directly from user input without conditioning - this is almost always bad practice and often results in a security hole or worse. Some classic examples of why this is bad:

    http://en.wikipedia.org/wiki/Buffer_overflow
    http://en.wikipedia.org/wiki/SQL_injection

    The lesson is: Never allow input data to directly affect the execution of a program without proper conditioning.

    In your specific example, instead of trusting the data file to provide you with a unit number in the range 0-26, you would add a simple check:

    if (unit >= 0 && unit <= 26) {
        /* do something... */
    } else {
        printf("Hey, that's not a kosher unit number!\n");
    }

    Otherwise, somewhere down the line someone will try to run your tool on SpikeFile v2.0 with eleventy-billion possible units and will encounter a segfault when the 'unit' variable indexes outside of the 0-26 elements of your array...
