Posts tagged confusion

1 little 2 little 3 little endians… er wait, is it 3 little 2 little 1 little endians?

Well, it depends on what encoding you’re using.

Lately I’ve been in “endian-ness” hell on a project I’m working on. Well, not really a “hell”, more of a hurdle that can easily be overcome. A friend of mine recently asked me the difference between the two binary encoding strategies and when I went to explain it to him (via cell phone txt messaging) I quickly realized that I will run up both our bills pretty quickly by doing so. So here’s the quick rundown, and an example as it applies to the project I am working on.

Big Endian means that value is stored “from left to right”. The most significant values (bytes) are stored in memory with the lowest address first. This is probably what many people are familiar with, especially those in the networking field.  Here’s an example of data stored as Big Endian:

Take for instance the 2-byte value 1325, equivalent to a short or unsigned short. Stored in Big Endian encoding, the value would look like this in binary: 00000101 00101101. Makes sense right?

Now here is the same 2-byte value, 1325, in Little Endian encoding: 00101101 00000101. The bytes are flipped and thus to make any sense of it, you must read the last set of bits first.

So for simplicity’s sake, Big Endian values are stored like so:

<byte1> <byte2> <byte3> <byte4> ... <byten>

Little Endian values are stored like so:

<byten> ... <byte4> <byte3> <byte2> <byte1>

Network devices use Big Endian, it is considered the network standard, or “network encoding”. Endian-ness varies from CPU to CPU depending on its architecture. Power Macs use Big Endian, network devices use Big Endian. Intel PC’s (and Intel Macs) use Little Endian. What does that mean? Well, it means that if a program running on a Power Mac sends data to a PC, the PC may not understand it correctly without some type of encoding change taking place.

In my situation, I’m reading a series of values off of the network, all coming in via a byte stream. So the client will send the following packet for example:

4 bytes, 2 bytes, 1 byte, … and some others but I won’t get into them

So when I read the first four bytes off the network (which uses Big Endian encoding, remember?) I must flip the entire set of 4 bytes before my Little Endian PC can understand it.  Same goes for the next 2 bytes. The single byte obviously doesn’t need to be flipped with anything, it stands alone. So an incoming set of data may look like:

0x1A 0x2B 0x3C 0x4D 0x2A 0x2B 0xAD

Now I must flip each set of values byte by byte (but not invert the entire thing; this is a packet, containing multiple values).

In Little Endian, the same set of data reads:

0x4D 0x3C 0x2B 0x1A 0x2B 0x2A 0xAD

Confused yet? You’ll figure it out. Here’s some reading that should help: http://en.wikipedia.org/wiki/Endianness. It even gets into the origins of the word, which is pretty interesting. Here’s a little tool that may help you understand the pattern: http://dlnova.com/reverseendian.htm

Cross-Tabulation (Pivot Tables) with MySQL

I don’t really have any sections on my website related to MySQL, so I figured I’d just throw together a quick post about something I found incredibly helpful. I’m sure others will find it nice too.

So in my years being trained on data architecture and database design, I learned to "normalize, normalize, normalize". Unfortunately, normalization doesn’t lend itself to very simple statistical calculations–such as those used to generate graphs or a statistical scorecard.

In the case of Angel, I’ve normalized the hell out of the data. Now it comes down to needing to report on that data. At first, my reporting attempts were terrible–tons of nested loops, many queries, etc. Well, when I spoke to a colleague of mine at work, he said that if I was running on an SAS Mainframe, the work would already be done for me using a function called "PROC TABULATE". He sort of dated himself by saying that.

Then I broke the news to him: I’m not using SAS, or any other mainframe technology. He replied by suggesting I look into some sort of MySQL equivalent to that function. So I started hunting. I came across this very nice article: http://dev.mysql.com/tech-resources/articles/wizard/print_version.html

That article shows, very effectively, how to create a nice "statistical" report using cross-tabulation techniques with normalized source data. The queries are sort of monstrous if you have a large number of fields to report on, but at that point, you’d be using code to generate the parameters for the SELECT statement.

Here’s my example:

My source data looks like the following–please note that some results are truncated because the result set would be WAY too large for a blog:

mysql> select uid, Email from players;
+-----+--------------------------+
| uid | Email                    |
+-----+--------------------------+
|  10 | brian@venomgamelabs.com  |
|   4 | cale@caledunlap.com      |
|   1 | cale@venomgamelabs.com   |
|   9 | daniel@venomgamelabs.com |
|  11 | dave@venomgamelabs.com   |
|   5 | jane@venomgamelabs.com   |
|   8 | janes@venomgamelabs.com  |
|  12 | jill@venomgamelabs.com   |
|   6 | john@venomgamelabs.com   |
|   7 | johns@venomgamelabs.com  |
+-----+--------------------------+
mysql> select * from gamedata LIMIT 10;
+----------+---------+------+
| playerid | fieldid | data |
+----------+---------+------+
|        1 |       3 | 60   |
|        1 |       4 | 1    |
|        1 |       5 | 1    |
|        1 |       6 | 286  |
|        1 |       7 | 1    |
|        1 |       8 | 13   |
|        1 |       9 | 1    |
|        1 |      10 | 1    |
|        1 |      11 | 261  |
|        1 |      12 | 13   |
+----------+---------+------+

mysql> select FieldID, FriendlyName, GameID, FieldTypeID FROM gamefields;
+---------+---------------------------------+--------+-------------+
| FieldID | FriendlyName                    | GameID | FieldTypeID |
+---------+---------------------------------+--------+-------------+
|       1 | Play Time                       |      1 |           2 |
|       2 | Last Character                  |      1 |           1 |
|       3 | Total Eggs                      |      4 |           2 |
|       4 | Total Playtime                  |      4 |           2 |
|       5 | Total King Of The Hill Captures |      4 |           2 |
|       6 | Total Kills                     |      4 |           2 |
|       7 | Total Deaths                    |      4 |           2 |
|       8 | Engineer Playtime               |      4 |           2 |
|      10 | Mongo Playtime                  |      4 |           2 |
|      11 | Grunt Playtime                  |      4 |           2 |
|      12 | Gladiator Playtime              |      4 |           2 |
|      13 | Total Powerup Uses              |      4 |           2 |
|      14 | Total Powerups Received         |      4 |           2 |
+---------+---------------------------------+--------+-------------+
mysql> select typeid, typename from field_types;
+--------+------------------+
| typeid | typename         |
+--------+------------------+
|      1 | Text             |
|      2 | Numeric          |
|      3 | Game Coordinates |
|      4 | Boolean          |
|      5 | Physical Address |
|      6 | IP Address       |
+--------+------------------+

What I want to know from that data is how many players have field values within specific ranges–to generate some input data for a chart output. The code I’m using does the following math (from a separate query):

  1. Find the minimum value out of all of the player data for each field (SQL Query)
  2. Find the maximum value out of all of the player data for each field (Same SQL query)
  3. Get a "standard deviation" between the number $StandardDeviation = floor( ($High - $Low)/NUM_DEVIATION_RANGES );
  4. Begins a SQL query and assembles a list of parameters to select (based on the deviation ranges)
    for $i = 0 to NUM_DEVIATION_RANGES
    $Min = $Low + ($StandardDeviation * $i)
    $Max = ($Min + $StandardDeviation) – 1;
    $Query .= “SUM( IF( CAST(gamedata.Data AS UNSIGNED) >= $Min AND CAST(gamedata.Data AS UNSIGNED) <= $Max, 1, 0) ) AS ‘$Min-$Max’,”;
    end for

  5. Runs the query, obtains a single result per field and feeds it to the chart to render a per-field graphical analysis chart.

The output of the query from step 4 is as follows:

SELECT
    SUM( IF( CAST(gamedata.Data AS UNSIGNED) >= 60 AND CAST(gamedata.Data AS UNSIGNED) <= 87,1,0) ) AS '60-87',
    SUM( IF( CAST(gamedata.Data AS UNSIGNED) >= 88 AND CAST(gamedata.Data AS UNSIGNED) <= 115,1,0) ) AS '88-115',
    SUM( IF( CAST(gamedata.Data AS UNSIGNED) >= 116 AND CAST(gamedata.Data AS UNSIGNED) <= 143,1,0) ) AS '116-143',
    SUM( IF( CAST(gamedata.Data AS UNSIGNED) >= 144 AND CAST(gamedata.Data AS UNSIGNED) <= 171,1,0) ) AS '144-171',
    SUM( IF( CAST(gamedata.Data AS UNSIGNED) >= 172 AND CAST(gamedata.Data AS UNSIGNED) <= 199,1,0) ) AS '172-199',
    SUM( IF( CAST(gamedata.Data AS UNSIGNED) >= 200 AND CAST(gamedata.Data AS UNSIGNED) <= 227,1,0) ) AS '200-227',
    SUM( IF( CAST(gamedata.Data AS UNSIGNED) >= 228 AND CAST(gamedata.Data AS UNSIGNED) <= 255,1,0) ) AS '228-255',
    SUM( IF( CAST(gamedata.Data AS UNSIGNED) >= 256 AND CAST(gamedata.Data AS UNSIGNED) <= 283,1,0) ) AS '256-283',
    SUM( IF( CAST(gamedata.Data AS UNSIGNED) >= 284 AND CAST(gamedata.Data AS UNSIGNED) <= 311,1,0) ) AS '284-311',
    SUM( IF( CAST(gamedata.Data AS UNSIGNED) >= 312 AND CAST(gamedata.Data AS UNSIGNED) <= 339,1,0) ) AS '312-339'
FROM gamedata, gamefields, field_types
WHERE gamedata.fieldid = gamefields.FieldID
    AND gamefields.FieldTypeID = field_types.typeid
    AND field_types.typename = 'Numeric'
    AND gamefields.FriendlyName = 'Total Eggs'

Now that query runs once per number of fields requested. The above query was just a single query for a single field. Note how the code generated the SELECT statement parameters (min and max) and set them as the column header in the result?

The result of that query looks like this:

+-------+--------+---------+---------+---------+---------+---------+---------+---------+---------+
| 60-87 | 88-115 | 116-143 | 144-171 | 172-199 | 200-227 | 228-255 | 256-283 | 284-311 | 312-339 |
+-------+--------+---------+---------+---------+---------+---------+---------+---------+---------+
|     2	|      0 |	 2 |	   0 |	     1 |      0	 |	2  |	   0 |	     1 |       0 |
+-------+--------+---------+---------+---------+---------+---------+---------+---------+---------+

So the output is exactly what I need for my chart. The result fields represent the X axis of the chart, and the result values represent the Y axis of the chart, all from a single query against normalized data using cross-tabulation.

I urge you to click on the link I mentioned near the top of the article if you want to learn how to do this. My example is fairly specific to my situation. That article showed me exactly what I needed to do, but I needed to apply it to my data and my reports.  You’ll notice that I’ve used CAST’s in my query. The reason being that in order to accept any type of data, I’ve used the ‘TEXT’ type on my game data fields–that’s why I also have a type table and a type ID in the field types. I then query what types of fields is numeric, and only perform my calculations against them.

Doing cross-tabluation/pivot tables isn’t too bad once you know what you’re doing. I had a tough time wrapping my head around it, but after a few hours of tinkering, I finally got what I was looking for. Another thing to note is that I’m summing up 1’s and 0’s, basically a tally. If you want to sum up your real data, don’t use 1’s and 0’s; use your actual data field.

Get Adobe Flash playerPlugin by wpburn.com wordpress themes