Tuesday, December 9, 2008

Bobby Decker's Tree


I love both the Bobby Decker story (which I've heard before, but it was great to hear it again) and the Clyde McPhatter and the Drifters / Joshua Held / Irving Berlin / Bing Crosby White Christmas song / animation (which I have also heard / seen before, but I can't hear / see it too often).

I am about to add DeeDee's wurds but I thought you might be interested in what my newly revised Binary Tree program does to the wurds.

Here are the unsorted individual wurds.

Here are the sorted individual wurds. When you see a left bracket followed by a number, this means the wurd occurred more than once.

Here's the source code to the tiny-c program. I don't expect you to follow it, but I would like to point out the new features: it can handle up to 800 words, and it can get its input from a file instead of via manual entry. Figuring out how to do these enhancements was extremely interesting.

12 comments:

Tobee 'n DeeDee said...

So what does one do with source code? Is it used to install something? Doesn't it have to be compiled or something like that?

Tinyc Tim said...

"Source code" is text written by a programmer to "tell a computer what to do." It is either "compiled" or "interpreted." These two things have a lot in common but they also differ in several important ways. Programs written in tiny-c are "interpreted." What you must do to use this source code in tiny-c's case is run an "interpreter" program and then tell it the name of the source code file that holds your program. The "interpreter" program itself is written in C and compiled by a C compiler.

You actually have some experience with this process. Although you don't use it any more, the early versions of Toby Speak were written in tiny-c. The source code file was named tobee.tc and the interpreter is named tinyc.exe. Here's a sample run.

lee@lee-desktop:~/tinyc-ed$ ./tinyc

tiny-c/PC Interpreter Version Linux-01-01
Copyright (c) 1984 by Scott B. Guthery
Implemented 12/6/8 by Lee Bradley / Ed Davis

tiny-c Shell - 12/6/8

tc>.r tobee.tc
7411
0 277 7411 82589
tc>.tobee

Toby Speak Version 1.0.5.g

Enter name of file to translate and, optionally, verbose.

filename.typ [verbose] sourceco.de

Loading dictionary ...

Dictionary loaded!

Toby Rules!

"Source code" is text written by a programmer to "tell a computer what to do." It is either "compiled" or "interpreted." These two things have a lot in common but they also differ in several important ways. Programs written in tiny-c are "interpreted." What you must do to use this source code in tiny-c's case is run an "interpreter" program and then tell it the name of the source code file that holds your program. The "interpreter" program itself is written in C and compiled by a C compiler.

[translation display omitted]

Please be patient while I translate ...
Using 1339 byte dictionary dec08.dic ...

Tobee Rewellz!

"Source code" iz text written by a programmer tah "teyell a compooter wat tah do." Id iz eder "compiled" or "interpreted." Deez 2 dings hav a lot in comin butz dey alzo differ in several important ways. Programs written in tiny-c r "interpreted." Wat u must dew tah yews dais source code in tiny-c's case iz run an "indranzlatur" prograym and denn teyell it da nayme ov da source code file dat holds yur prograym. Da "indranzlatur" prograym idzelf iz written in C and compiled by a C compiler.

You actually hav summ expeerienz wid dais process. Aldough u don't yews id any mowa, da early versions ov Tobee Speek wur written in tiny-c. Da source code file wuz named tobee.tc and da indranzlatur iz named tinyc.exe. Here's a sample run.

Press Enter ...

The dictionary has 1323 words!

See sourceco.tbo and sourceco.htm for output.

Press Enter ...

lee@lee-desktop:~/tinyc-ed$

Tobee 'n DeeDee said...

So what commands would I use to run the indranzlatur (which is tinyc.exe?) I furget.

Tinyc Tim said...

To get you current with the latest version of the tiny-c version of Toby Speak, download

http://primepuzzle.com/tc/tobee.zip

and unzip it into the same directory you keep your tspeek.exe in.

Then, to run the tiny-c version of Toby Speak, go to your DOS prompt and type

c:\tc>tinyc.exe tobee.ipl

(This example is done on my laptop and the name of the DOS directory happens to be c:\tc)

Notice the important "command tail" of tobee.ipl. This is a special form of a tiny-c program that will cause the specific application of Toby Speak to automatically run.

Note: Don't forget to run ANSI.COM before issuing the above command. This is needed to get the color and cursor positioning display to work properly.

You will see the following display. When asked for the name of a file to translate, supply your own, not the one below (which is sourceco.de, an admittedly weird file name).

tiny-c/PC Interpreter Version PC-01-06
Copyright (c) 1984 by Scott B. Guthery
Implemented 12/7/8 by Lee Bradley / Ed Davis

Toby Speak Version 1.0.5.e

Enter name of file to translate and, optionally, verbose.
filename.typ [verbose] sourceco.de

You'll then have to hit Enter for each dictionary file that comes up.

This may all be of academic interest only as the tspeek.exe translator is much much faster. Plus, the tiny-c Toby Speak translator does not have an untranslate feature. But it all does show that one can write an application as complicated as Toby Speak in tiny-c.

So ... you are encouraged to translate stuff with any one of the now three different methods - the tiny-c one (discussed above), the compiled C one (tspeek) and the JavaScript one (jspeek.html).

TcT

Chip Bradley said...

Max here. Can you explain what the binary tree program itself actually does to words and why that is important? I'm not sure what the sorting of the list of words is actually telling us.

I made a screen shot of the sorted output of your INCREDIBLY COMPLEX program (!!!) and thought you might be able to explain that screen shot below:

http://www.charleybradley.com/s.jpg

Chip Bradley said...

I still have trouble with getting a link in a comment to be active. Can you explain how to do it for the previous comment? Tanks

Tinyc Tim said...

First, the way to get links to images to work in a comment (in a blog) is to write them like so

<a href=http://blahblah.com/booboo.jpg>http://blahblah.com/booboo.jpg</a>

(where, to compose this comment, I had to use the "trick" of typing &lt; (the "html character" for the "less than" symbol) and &gt; (the "html character" for the "greater than" symbol) instead of < and > to prevent the above from actually being a link!)

So ... your little image would go like this:

http://www.charleybradley.com/s.jpg
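
The &lt; and &gt; trick mentioned above can also be done by a program instead of by hand. Here is a minimal sketch in Python (not anything this blog actually uses) that leans on the standard library's html.escape; the link is the made-up booboo.jpg one from above.

```python
import html

# The made-up link from the example above, written the way it should show up on screen.
link_text = "<a href=http://blahblah.com/booboo.jpg>http://blahblah.com/booboo.jpg</a>"

# html.escape swaps < for &lt; and > for &gt; (and & for &amp;),
# so a browser displays the characters instead of building a link from them.
escaped = html.escape(link_text)
print(escaped)
```

Pasting the printed text into a comment should show the tag itself rather than an active link.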

Now, to answer your real question ("what does your program do to words and why is it important?")

Good question.

The short answers are

1. it builds a "binary tree" from the words that can then be "climbed" by following "left" and "right" "pointers" to give you the words in sorted order. Here's a picture:

http://primepuzzle.com/tc/tree.html

The above link is to the article by the guy that wrote the original program. Don't spend too much time reading it. It's pretty complicated. Just look at the pictures and the diagram of the "tree" and skim the article.

2. Why one would want to do this is another matter! One reason might be to study "word frequency." It's interesting (to me) to see how often we use words when we write or speak. Look at how many times the "word" "Ize" was used in the Bobby Decker story (as translated). No fewer than 31 times!

278 Ize [106
304 Ize [118
325 Ize [128
329 Ize [130
348 Ize [141
355 Ize [147
65 Ize [15
388 Ize [161
433 Ize [196
19 Ize [2
72 Ize [20
441 Ize [201
448 Ize [207
485 Ize [228
493 Ize [232
506 Ize [238
80 Ize [24
518 Ize [248
88 Ize [28
598 Ize [298
619 Ize [310
645 Ize [322
716 Ize [365
110 Ize [37
744 Ize [385
24 Ize [4
136 Ize [50
182 Ize [64
184 Ize [65
262 Ize [99

The numbers on the left, by the way, tell you the position in the original text where that particular occurrence occurred. The numbers on the right (the ones that are preceded by left brackets) tell you how many duplicate words had been encountered when this word was encountered.
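
For the curious, here is a rough sketch of the binary tree idea from short answer 1, written in Python rather than tiny-c. It is not the tree.tc code. The names Node, insert and climb are made up for illustration, and the sample words are arbitrary. It only shows how following "left" and "right" pointers hands the words back in sorted order.

```python
class Node:
    """One record in the tree: a key (position in the text) and the word itself."""

    def __init__(self, key, text):
        self.key = key
        self.text = text
        self.left = None   # pointer to records that sort before this one
        self.right = None  # pointer to records that sort after (or equal to) this one


def insert(root, key, text):
    """Walk down from the root, going left or right, and hang the new record on an empty pointer."""
    new = Node(key, text)
    if root is None:
        return new
    node = root
    while True:
        if text < node.text:
            if node.left is None:
                node.left = new
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = new
                return root
            node = node.right


def climb(node, out):
    """In-order walk: everything on the left, then this record, then everything on the right."""
    if node is not None:
        climb(node.left, out)
        out.append((node.key, node.text))
        climb(node.right, out)
    return out


words = ["tree", "wurd", "binary", "sort", "apple"]  # arbitrary sample input
root = None
for position, word in enumerate(words, start=1):
    root = insert(root, position, word)

for key, text in climb(root, []):
    print(key, text)
```

Nothing is ever shuffled around after it goes into the tree. Each word is placed once, and the sorted order simply falls out of the order in which the pointers are followed.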

If you are still wondering why one would want to sort words that occur in stories like Bobby's, it's not unlike that proverbial mountain; because it's something that's in front of you and you wonder how you could pull it off!

It turns out, sorting is one of the most important and complicated things computers do. Think about it. If I gave you 10 words and asked you to sort them alphabetically, how would you go about it?! There are literally hundreds of different approaches to this problem.

Chip Bradley said...

Thanks for telling me about the binary tree program (written in tiny-c). I have been busy today working on my painting among other things, so I didn't take a look at your comments (about tree.tc) OR your helpful comments on how to get links to work within comments until later today.

First, to see if I understand your "comment-linking" concepts, I am going to place this tropical scene

http://charleybradley.com/my-snaps/20042008/2008/images/dec/snap2314.jpg

with a high degree of sunshine and this movie of the first measurable snow in Marborrow

http://charleybradley.com/my-snaps/20042008/2008/images/dec/mvi_4255.avi

here in THIS comment to see if they both become active links.

Wow. I just previewed the comment and it appears "they" accepted my syntax this time. It MAY work, assuming I have keyed in the right targets.

Thanks for "activating" my s.jpg link in my previous comment, too.

Now, I have also taken a closer look at what tree.tc does and I think I understand not only the reason for the sorting process but also some basic stuff (about tiny-c) itself. I actually read the 2-page Binary Tree article and began to understand the power of tiny-c (at least the basics of it) enough to know that it uses a set of tools (functions, with arguments, and assigned variables [local and global, etc] to name a few) and that the tree.tc program is one example of how to sort and find things. I see the difference between "key records" each of which is unique and "hit numbers" which are (?) the number of times a word has been found UP TO that particular key record.

Correct me if I am wrong here...
I think this is generally right.

I notice that there is an incredibly tiny link at the very top of the scanned page with the Binary Tree article. It led to your example of listing all the Red Sox players using the tree.tc tiny-c program. I looked at it and it appears there are several commands you "inserted" such that you can find text, insert new text, print text, etc.

I can see now how this tree.tc program would be helpful to study the Tobee dictionaries.

Thanks for answering my questions. I too like to sort things so I can zee why these mountains seem fun (and useful) to "sort" through.

I hope my links work in this comet.

Max

Tinyc Tim said...

One extremely tiny correction (no pun intended). It's not exactly

"hit numbers" which are (?) the number of times a word has been found UP TO that particular key record.

Every time a word is found that has already been found, a "global dup counter" is bumped up by 1 and this new number (prefixed by a space and a left bracket) is concatenated onto the "record" for the word.

Say the words came in like so:

foo
bar
foo
foo
bar

The tree report would look like so:

(Instead of trying to manually figure this out, I'm going to use the program. One minute please ...)

Key Text

2 bar
5 bar [3
1 foo
3 foo [1
4 foo [2

Study the above. Remember, the Key numbers on the left show the place in the original file that the word was found. The middle column is (obviously) the list of words, sorted. The word "bar" is alphabetically less than the word "foo." Duplicates are detected three times. The bracketed numbers on the right simply reflect the value of that "global dup counter" when the duplicate was detected.
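
The rule above can also be written down in a few lines of Python. This is only a sketch of the rule as described here, not the tiny-c source. The function name tree_report is made up, and Python's built-in sort stands in for the tree.

```python
def tree_report(words):
    """Tag repeated words with the running duplicate count, then sort the records as text."""
    seen = set()
    dup_counter = 0  # plays the part of the "global dup counter"
    records = []
    for key, word in enumerate(words, start=1):
        if word in seen:
            dup_counter += 1
            text = f"{word} [{dup_counter}"  # space, left bracket, counter value
        else:
            seen.add(word)
            text = word
        records.append((key, text))
    # Sort on the record text only; the key just rides along.
    return sorted(records, key=lambda record: record[1])


for key, text in tree_report(["foo", "bar", "foo", "foo", "bar"]):
    print(key, text)
```

Because the records are compared as text, a record ending in "[106" sorts ahead of one ending in "[15", which is the ordering seen in the "Ize" list.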

I don't know if this clarifies or confuses. But just wanted to give it a shot.

Tinyc Tim said...

As for your mastering the art of placing hyperlinked images etc. into blog comments, you are now an expert.

I really like your sunshine in the tropics thing. It's cool (hot?!) cuz it combines a real background with an abstract, non-real Adobe-created (?) bright, fanciful graphic. Classic Bradley.

The movie clip was really neat too.

Ain't da indernet awesome?

Chip Bradley said...

You nailed it. I understand this. But now, to see if I fully understand (well almost fully) I am going to attempt to MANUALLY give you a list of 10 words and supply you with TWO (hopefully identicull) answers for how they would be sorted by your program.

1. Answer one will be my manual attempt at sorting a list of words containing some different duplicate words

2. The other will be (hopefully) the same list generated by using the tree.tc program -- which I tink I can locate.

Stay tooned.

Chip Bradley said...

Well, I am only halfway there since I cannot seem to run tree.tc even though I have found it. I tried running ansi.com, then tinyc.exe, then typing .r tree, then tree

But all that happened was a return to the tc> prompt.

But I did make a list of random words before all that and it looked like this. There are intended duplicates AND it was 29 words long. Not 10 as I planned originally. I then sorted this list by cheating using a Microsoft spreadsheet program. The original made-up list looked like this:

brag
love
compassion
compassion
tree
china
bilge
china
understand
power
create
create
compassion
infiltrate
Openheimer
lake
truncate
is
junk
lake
read
classic
tantalize
love
humorous
power
create
hill
tree

That's 29 random words - unsorted.

I then came up with the following for my manually generated "TREE" list. It took about 10 minutes using a variety of edits to each of 7 spreadsheets. The end result looked like this:

7 bilge
1 brag
8 china [2
22 classic
13 compassion [3
27 create [3
28 hill
25 humorous
14 infiltrate
18 is
19 junk
20 lake [2
2 love
24 love [2
15 Openheimer
26 power [2
21 read
23 tantalize
29 tree [2
17 truncate
9 understand

Is this what tree.tc would have generated?