%WORD DIAGONALIZATION WORD CLUSTERS VIRTUES VICES 080102
Word-Diagonalization of Word-Clusters and Essays can be achieved in the following way, by focusing upon contrasting THEMES. It may be helpful to emphasize THEMES rather than VIRTUES and VICES, because different people are likely to disagree about whether a given THEME deals with a VIRTUE or a VICE.

Collect similar words which start with the same 3, 4 or 5 letters; e.g.

   Vice, Vicious, Victim, Virtue, Virtuous, Virtually
   Honest, Honesty, Honestly
   Integrity, Integration, Integrations, Integrities, Integral
   Alienation, Alien, Aliens, Alienated, Alienative
   Excommunicated, Excommunication, Excommunicator
   Estranged, Estrangement, Estrange, Estranging
   Evil, Devil, Demon, Demons
   Cooperation, Cooperated, Collaboration, Collaborated
   Reconciled, Reconciliation, Reconciler, Reconciling
   Peace, Peacemaker, Peaceful
   War, Warring, Wars
   Attack, Attacked, Attacker, Attacking

Make an alphabetized list of all such similar words, with the emphasis (but not an exclusive emphasis) on the first few letters being alike or similar. Allow some words to appear in two clusters, and allow some words to begin with other letters if they are synonymous in some significant way. Find candidate words primarily by their first 3, 4 or 5 letters, then augment the clusters manually by adding closely related words.

Then, for each essay, count how many times the words of each word-cluster appear: load the essay into memory, count in memory how many times each word from each cluster appears, and write the count for the Signature-Word (the word that names the cluster) to disk in a file named by that Signature-Word. Each record gives the essay name, the total word count for that essay, and the sum-count for all words in the cluster named by the Signature-Word.
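The clustering and counting steps above can be sketched in Python. This is a minimal sketch under stated assumptions: essays are plain text, a word is a run of letters compared in lowercase, and `CLUSTERS` (Signature-Word mapped to its member words) is a hypothetical hand-curated example drawn from the lists above. The helper names `prefix_candidates`, `tokenize` and `cluster_counts` are illustrative only; `prefix_candidates` merely proposes words sharing a prefix, so manual augmentation is still expected, and writing the records to disk is left out.

```python
import re
from collections import Counter

# Hypothetical hand-curated clusters: each key is the Signature-Word that
# names a cluster; the values are its members (compatriots included).
# A word may belong to more than one cluster.
CLUSTERS = {
    "honest": ["honest", "honesty", "honestly"],
    "peace": ["peace", "peacemaker", "peaceful"],
    "war": ["war", "warring", "wars"],
}


def prefix_candidates(vocabulary, prefix):
    """Alphabetized words sharing a 3-5 letter prefix (a starting point only)."""
    prefix = prefix.lower()
    return sorted({w.lower() for w in vocabulary if w.lower().startswith(prefix)})


def tokenize(text):
    """Assumption: a word is a run of ASCII letters, compared in lowercase."""
    return re.findall(r"[a-z]+", text.lower())


def cluster_counts(essay_text, clusters):
    """Return (total word count, {Signature-Word: summed count of its members})."""
    words = tokenize(essay_text)
    freq = Counter(words)  # one in-memory pass over the essay
    sums = {sig: sum(freq[w] for w in members) for sig, members in clusters.items()}
    return len(words), sums


if __name__ == "__main__":
    essay = "War and peace: the peacemaker ends wars. Honestly, peace."
    print(prefix_candidates(["Peace", "Peaceful", "Pear", "War"], "peac"))
    print(cluster_counts(essay, CLUSTERS))
```

Counting every word once with a `Counter` and then summing cluster members avoids rescanning the essay for each cluster, which matters when there are many clusters per essay.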
The records written to disk take these forms:

   total-count @yymmdd#    in the path named: virtue
   total-count virtue      in the path named: zyymmdd#
   virtue @yymmdd          in the path named: z####

where the count has four digits, represented here by four pound signs. Note that three records are to be written to disk for each combination of a Signature-Word (that is, its word-cluster) and an essay. If there are 50 word-clusters and 9,000 essays, there will be 3 x 50 x 9,000 = 1,350,000 records.

NEXT: For each Signature-Word, find the COMBINED rate of appearance of it and its compatriots (the other words in its cluster) in each essay, and enter the numbers in a two-dimensional array with Signature-Words on one side and Essay-Names on the other. Here and in what follows, "Signature-Word" refers to the Signature-Word together with its compatriots.

Step #2. Seek to diagonalize the two-dimensional array, as in the addict/codependency matrix study done at Coe with the Astronomy and General Physics classes in the 1970s.

Step #3. Find the GLOBALLY most frequent Signature-Word in all 8,600 essays and the essay which contains that word the most times; call that essay Essay #1. Number that Signature-Word (with compatriots) as Signature-Word #1. (Signature-Words may receive many different numbers in the following process, because there are more essays than Signature-Words!) In the end, Signature-Words (with compatriots) will be put in order according to the average of the numbers assigned to each Signature-Word. Essays themselves will be ordered according to the evolving process below, and adjacent essays in the ordered set will be similar in which Signature-Words they contain.

Step #4. Find the essay containing Signature-Word #1 (with compatriots) the second most frequently, which makes the two essays significantly similar. Call that essay Essay #2 in an evolving ordered-list-of-related-essays. Find the other Signature-Word which appears most frequently in this second essay, and call it Signature-Word #2 in an evolving ordered-list-of-related-Signature-Words.

Step #5.
Find the as yet unchosen essay containing Signature-Word #2 (with compatriots) most frequently, and call that Essay #3 in the evolving ordered-list-of-related-essays. Find the other Signature-Word which appears most frequently in Essay #3 and number it Signature-Word #3 in the evolving ordered-list-of-related-Signature-Words.

Step #6. Find the as yet unchosen essay which contains the two most recently chosen Signature-Words (with compatriots) the most times. Call that Essay #4 in the evolving ordered-list-of-related-essays.

Step #7. Find the Signature-Word (with compatriots) which appears most frequently in Essay #4. Call that Signature-Word #4 in the evolving ordered-list-of-related-Signature-Words.

Alternatively: find the next most frequent Signature-Word APPEARING IN THE ESSAYS WHICH CONTAINED THE PREVIOUSLY MOST FREQUENT SIGNATURE-WORD; i.e., find the essay most like the previously "noted-essay" and put it in second place in a listing of all essays. It will be notable for being like the previously selected essay in having two Signature-Words appearing with high frequency.

Step #5. Find the most frequent secondary Signature-Word appearing in the previously assigned essay, and then the unassigned essay in which that new secondary Signature-Word appears the most times. Assign that new essay to the next sequential place in the ordering of essays, and remember what its qualifying Signature-Word was.

Step #6. Find the as yet unassigned essay with the most appearances of the most recently used Signature-Word, and assign it to the next sequential place in the ordering of essays. And so on.

After the first two Signature-Words have been determined, the third essay position might be assigned to the essay which has the most appearances of the previous two Signature-Words.
Then the third Signature-Word might be the additional Signature-Word which appears most frequently in the two most recently assigned essays, and the next essay would then be the as yet unassigned essay in which that Signature-Word appears most frequently. It would then be prudent to keep the two most recent Signature-Words for further use. The next essay to be assigned would be the as yet unused essay in which the two most recent Signature-Words appear most frequently, and the next Signature-Word to be used would be the Signature-Word appearing most frequently in the essays which contained the previous two Signature-Words, weighted by the products of the frequencies of those Signature-Words. The task, then, is to find the most frequently appearing Signature-Word and the essay in which the members of that Signature-Word's cluster appear most frequently; then to find the most frequently appearing not-yet-"used" Signature-Word which appears with the last Signature-Word used, and the not-yet-used essay in which that next Signature-Word's compatriots appear most frequently; and to write to disk each newly used Signature-Word and newly used essay.

==========================================================
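The greedy chaining described in the note above (Steps #3 through #7) can be sketched in Python. This is only one reading of the notes, under assumptions: `counts[sig][essay]` holds the COMBINED count of a Signature-Word and its compatriots in an essay; ties go to the first candidate; each new Signature-Word is the one most frequent in the latest essay other than the Signature-Word just used; Essay #3 is chosen by Signature-Word #2 alone and later essays by the summed counts of the two most recent Signature-Words. The product-weighted variant is not shown, and the names `chain_order` and `reordered` are hypothetical. Reading the array in the returned orders tends to give the roughly diagonal layout that Step #2 aims for.

```python
from collections import defaultdict
from statistics import mean


def chain_order(counts):
    """Return (essay order, Signature-Word order) by greedy chaining.

    counts[sig][essay] is the combined count of a Signature-Word (with
    compatriots) in an essay; every sig must list the same essays.
    """
    sigs = list(counts)
    essays = list(next(iter(counts.values())))

    # Step #3: globally most frequent Signature-Word becomes Signature-Word #1.
    sw1 = max(sigs, key=lambda s: sum(counts[s].values()))
    # Steps #3-#4: the two essays with the most appearances of it.
    by_sw1 = sorted(essays, key=lambda e: counts[sw1][e], reverse=True)
    essay_order = by_sw1[:2]
    sig_order = [sw1]

    while len(essay_order) < len(essays):
        latest = essay_order[-1]
        # Next Signature-Word: most frequent in the latest essay, other than
        # the one just used (assumption). A word may receive several numbers.
        others = [s for s in sigs if s != sig_order[-1]] or sigs
        sig_order.append(max(others, key=lambda s: counts[s][latest]))
        # Essay #3 uses only the newest word; later essays use the last two.
        window = sig_order[-1:] if len(essay_order) == 2 else sig_order[-2:]
        remaining = [e for e in essays if e not in essay_order]
        essay_order.append(
            max(remaining, key=lambda e: sum(counts[s][e] for s in window))
        )

    # Final Signature-Word order: average of the numbers each one received.
    # Signature-Words that were never chosen are omitted.
    numbers = defaultdict(list)
    for n, s in enumerate(sig_order, start=1):
        numbers[s].append(n)
    final_sigs = sorted(numbers, key=lambda s: mean(numbers[s]))
    return essay_order, final_sigs


def reordered(counts, essay_order, sig_order):
    """Rows = Signature-Words, columns = essays, both in the chained order."""
    return [[counts[s][e] for e in essay_order] for s in sig_order]


if __name__ == "__main__":
    counts = {  # hypothetical toy data
        "peace": {"w": 0, "x": 4, "y": 0, "z": 5},
        "war": {"w": 4, "x": 3, "y": 0, "z": 1},
        "honest": {"w": 2, "x": 0, "y": 5, "z": 0},
    }
    essays, sigs = chain_order(counts)
    print(essays, sigs)
    for row in reordered(counts, essays, sigs):
        print(row)
```

On the toy data the essays come out as z, x, w, y and the rows read 5 4 0 0 / 1 3 4 0 / 0 0 2 5, so the large counts gather near the diagonal. Excluding only the most recent Signature-Word (rather than all used ones) keeps the note's remark that a Signature-Word may receive many numbers.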