Wednesday 11 April 2012

Binary and Character Codes

The AGK character set closely follows the ASCII standard, which stores each character in a single byte (an eight bit number).

They are called 8 bit numbers because in binary, which describes a value in terms of ones and zeros, the number uses 8 binary digits (or bits).  This gives 256 possible combinations, which represent the values 0 to 255.

A 32 bit number uses 32 of these digits, so it takes 4 times as much space but allows over 4 billion combinations.  Each additional binary digit doubles the number of combinations.
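To see the doubling in action, here is a small sketch in Python (used here purely for illustration, it is not AGK code). The function name combinations is made up for this example.

```python
def combinations(bits):
    """Number of distinct values that fit in the given number of bits."""
    return 2 ** bits

# Each extra bit doubles the count: 8 bits give 256, 9 bits give 512, and so on.
for bits in (8, 9, 16, 32):
    total = combinations(bits)
    print(bits, "bits:", total, "combinations, values 0 to", total - 1)
```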

In the ASCII system, the first 32 numbers (from 0 to 31) are used for special purposes.  For example, 13 is used as a carriage return and 10 is used as a line feed.

Number 32 is used for the space character and the remaining codes up to 126 describe the characters you are looking at now (127 is the delete control code).  65 to 90 are used for capital letters A to Z, 97 to 122 are used for lower case letters a to z, and the rest are the digits and the various symbols on the keyboard.
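The ranges above can be checked with a quick sketch, again in Python for illustration only. The function describe and its labels are hypothetical names for this example. Python's built-in ord and chr convert between a character and its code.

```python
def describe(code):
    """Label a character code using the ranges described above."""
    if code < 32 or code == 127:
        return "control"            # e.g. 13 carriage return, 10 line feed
    if code == 32:
        return "space"
    if 65 <= code <= 90:
        return "capital letter"
    if 97 <= code <= 122:
        return "lower case letter"
    if code <= 126:
        return "digit or symbol"
    return "extended"               # 128 to 255, varies from system to system

for ch in ("A", "z", "!", " "):
    print(repr(ch), ord(ch), describe(ord(ch)))
```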

Characters 128 to 255 are called the extended character set and are not so standard.  They can vary from system to system.

We could convert any number into a string equivalent by breaking it into 8 bit lumps, so a 32 bit number would take up 4 characters.  But we don't need a range that large, so there is no point using that many characters.
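As a rough sketch of the "8 bit lumps" idea (Python for illustration, with made-up function names split_into_bytes and join_bytes), a 32 bit number can be cut into four values of 0 to 255 and put back together again:

```python
def split_into_bytes(value):
    """Break a 32 bit number into four 8 bit lumps, most significant first."""
    if not 0 <= value < 2 ** 32:
        raise ValueError("value does not fit in 32 bits")
    return [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]

def join_bytes(parts):
    """Rebuild the number from its 8 bit lumps."""
    value = 0
    for part in parts:
        value = (value << 8) | part
    return value

lumps = split_into_bytes(0x41424344)
print(lumps)                              # four codes, one per character
print("".join(chr(c) for c in lumps))     # the same lumps shown as characters
```

Any of the four lumps can land anywhere in 0 to 255, including the control codes and the extended set, which is part of why this direct approach is awkward for text.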

We could limit our objects to 256 and store them in just one character, but ideally we want to be able to go higher than this so we need to use more than one.

Two characters would allow 65,536 combinations, which is far more than we need.

There is, however, a middle ground - we only use part of the range.

If we use characters 32 to 127 - essentially the standard visible characters - we get 96 combinations per character.  If we use two characters, we get 96 x 96 or 9,216 combinations, and the characters we get from this are standard on every system and almost always visible (only code 127, the delete control code, has no visible symbol).

So value 0 would be encoded as two character 32s (two spaces), value 1 would be a space and a ! symbol, value 2 would be a space and a " symbol, and so on.
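Here is a minimal sketch of that encoding, written in Python for illustration rather than AGK. The names encode96, decode96, BASE and OFFSET are assumptions made for this example, and putting the more significant digit first is inferred from the examples above.

```python
BASE = 96     # combinations per character
OFFSET = 32   # first character code used (the space)

def encode96(value):
    """Encode a value from 0 to 9,215 as two characters in the range 32 to 127."""
    if not 0 <= value < BASE * BASE:
        raise ValueError("value must be between 0 and 9215")
    high, low = divmod(value, BASE)
    return chr(OFFSET + high) + chr(OFFSET + low)

def decode96(text):
    """Turn a two character code back into its value."""
    if len(text) != 2:
        raise ValueError("expected exactly two characters")
    high = ord(text[0]) - OFFSET
    low = ord(text[1]) - OFFSET
    return high * BASE + low

for value in (0, 1, 2, 96):
    print(value, repr(encode96(value)))
```

The first character only moves on once the second has run through all 96 of its values, so value 96 comes out as a ! followed by a space.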

Since these are normal characters, we can use string comparisons to see which is larger, which is smaller and which is the same.  This works because every code is exactly two characters long, with the more significant digit first.  And during the testing phase, we can print the characters and see that they are what they are supposed to be.
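The comparison claim can be checked with a short sketch (Python for illustration, with hypothetical names encode96 and compare_codes). It assumes the two character scheme with the more significant digit first and relies on strings being compared character code by character code.

```python
def encode96(value):
    """Two character code, more significant digit first, codes 32 to 127."""
    high, low = divmod(value, 96)
    return chr(32 + high) + chr(32 + low)

def compare_codes(a, b):
    """Return -1, 0 or 1 when code a is smaller than, equal to or larger than b."""
    return (a > b) - (a < b)

# Comparing the strings gives the same answer as comparing the numbers.
for left, right in ((5, 200), (200, 5), (96, 96)):
    print(left, right, compare_codes(encode96(left), encode96(right)))

# Printing the codes during testing shows what was stored.
print([repr(encode96(v)) for v in (0, 1, 2, 33, 97)])
```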

This method of encoding is called Base96 - because it gives 96 combinations per digit.

This is how we will encode our objects within the recipes, using two Base96 digits encoded as characters.
