Wednesday 11 April 2012

Base96 Sorting

This is actually a  bit of a cheat, in that the encoded string is transferred to a temporary array for sorting.

This is to simplify the moving of data and avoid chopping and rejoining strings, which can be a bit messy of not done carefully.

It also makes the code much easier to follow as an array is easier to picture when working through the code.

The main function is called Base96Sort().  It takes an encoded base96 string as its input parameter and returns a base96 string as its result.  The string is not decoded during the process

function Base96Sort( thisString$ )
   thisLen = len(thisString$)
   if thisLen > 2 and (thisLen && 1) = 0
      thisArraySize = thisLen / 2
      dim tempArray[ thisArraySize ] as string
      for i=1 to thisArraySize
         tempArray[ i ] = mid( thisString$ , i*2 - 1 , 2)
      next i
      SortTempArray( thisArraySize )
      thisString$ = ""
      for i=1 to thisArraySize
         thisString$ = thisString$ + tempArray[ i ]
      next i
      dim tempArray[0]
   endif
endfunction thisString$

First it gets the length of the string and stores it in thisLen.  This is then checked in two ways, it must be longer than two characters otherwise there is no work to be done, and it must be an even size as each coded Id is 2 characters long.

The even check is done using a binary AND operation with 1 - indicated by && 1

In binary (base 2), each digit  represents a power of 2, the lowest four are 8 , 4 , 2 and 1

A Binary AND operation requires the individual binary digits (bits) of both number to be a 1 for the result to be 1.   For example for the operation 11 && 13, the result would be 9.

1011 (11 = 8 + 2 + 1)
1101 (13 = 8 + 4 + 1)
---- ----
1001  (9 = 8 + 1)

So to check if a number is odd, you perform the same operation using 1 (which is 0001).  All odd numbers have a 1 in the right hand column, so an odd number will return a 1, and even number a zero.

If both checks pass, the routine continues and the length is divided by 2 and stored in thisArraySize, since each encoded ID is 2 characters, the number of characters is the length / 2.

An array called tempArray[] is then dimmed to that size as type string.  A for/next loop then cycles through this array moving bits of the string into each position, 2 characters at a time using the mid() command.

It then calls another function SortTempArray() telling it how big the array is.  This is the actual sort algorithm which sorts  tempArray[].

The thisString$ is cleared and a second for/next loop, then moves the coded IDs back from the array, adding them to the  thisString$ one at a time in order.

A Final dim command is then used to clear the tempArray[].

This is the SortTempArray() function.

function SortTempArray( thisSize )
   for i=1 to thisSize - 1
      thisLow = i
      for j = i + 1 to thisSize
         if tempArray[j] < tempArray[thisLow] then thisLow = j
      next j
      if thisLow <> i
         temp$ = tempArray[i]
         tempArray[i] = tempArray[thisLow]
         tempArray[thisLow] = temp$
      endif
   next i
endfunction

Which sorts the previously filled array.

To test this, add the following code just before the do command - it only needs to be run once.

thisSrc$ = ""
for i=1 to 10
   thisValue = random( 0 , 9215 )
   thisSrc$ = thisSrc$ + Base96Encode( thisValue )
next i
thisSort$ = Base96Sort( thisSrc$ )

The random() command generates a random value in the range 0 to 9215, which is stored in thisValue.  this is then encoded to Base96 and the result added to a string called thisSrc$. A total of ten ID's are created this way.

The encoded string is then passed to the Base96Sort() function and the result stored in thisSort$.

The following code is used to print the contents of both strings, this goes at the end of the print commands in the do/loop.

print("")
print(chr(34)+ thisSrc$ + chr(34))
print(chr(34)+ thisSort$ + chr(34))
print("")
for i=1 to 10
   thisCodeSrc = Base96Decode( mid( thisSrc$ , i*2 - 1 , 2 ) )
   thisCodeDst = Base96Decode( mid( thisSort$ , i*2 - 1 , 2 ) )
   printc( right( spaces(1) + str( i ), 2) + " : ")
   printc( right( spaces(3) + str( thisCodeSrc ), 4) + " : ")
   print( right( spaces(3) + str( thisCodeDst ), 4))
next i

The first and fourth line simply print a blank line to space the output a little.  Lines 2 and 3 print the contents of the unsorted and sorted strings respectively.

Within the for/next loop, each ID is decoded in turn, the one from the unsorted string is passed to thisCodeSrc, the one from the sorted string to thisCodeDst, The next lines then print in turn; the counter, the unsorted value and the sorted value for each position in the string.

The values are then padded for output - to make them the same length on each line - keeping them right aligned, as follows;
  • the spaces(), command outputs the indicated number of spaces
  • the str() command converts the value to a string
  • the right() command outputs the right most characters from the above two combined
So each value of i is output as 2 characters, each of the others is output as 4 characters.  The spaces provide the additional characters where necessary at the start of the string version of the values.

Under the two encoded string - unsorted and sorted, this produces three columns of numbers, the left column is the count, the next column is the unsorted list and the right most column is the sorted list.
Here's a close up.
So for example, the "T6" at the end of the upper coded string represents the value 5014 - the last entry in the unsorted column.  T6 also appears at characters 5 and 6 of the lower coded string, which corresponds to position 3 in the sorted column.

No comments:

Post a Comment