Here are some hasty notes for those interested in modifying
these sequence programs.

First, it might be useful to know which programs are which.
Most of the programs are simply the function name followed by the
extension .BAS; those which are not are listed below:

Function Key        Function    Program
F1 (first page)     MENU        MASTER.BAS
F7                  TRANSLATE   SLATE.BAS
F8                  COMPARE     COMP.BAS
F1 (second page)    MENU        PAGE2.BAS
F2                  FANCY       JQP.BAS
F5                  SEARCH      SRCH.BAS
F6                  OVERLAP     OVER.BAS
F10 (either page)   CONTROL     CONTROL.BAS

Note that the programs are divided into three major classes:
those that do not use sequence data (MASTER, PAGE2, CONTROL,
PRINTER, TYPE), those that read sequence data from a file (ENTER,
EDIT, COMP, COMPII, OVER, READ), and those that use the sequence
data that is read by the program READ (PRINT, REST, SLATE, JQP,
PUPY, SRCH, USAGE). This may seem unnecessarily complicated;
however, it was done in an attempt to circumvent the relative
slowness with which IBM BASIC transfers data from one matrix
to another or reads individual records from a file, and to take
advantage of the rapidity with which it processes string
variables. This leads to an additional complication, because string
variables are limited in length to 255 characters. Therefore the
sequence data read by the function READ is stored in a
one-dimensional matrix, SEQ$(), each element of which is a string
variable containing up to 250 characters. An individual base, say
the Ith base, may be accessed using the rather unwieldy expression:

MID$(SEQ$((I-1)\250+1),(I-1) MOD 250+1,1)

Here (I-1)\250+1 selects the element of SEQ$() and (I-1) MOD 250+1
selects the character within that element.
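
If the expression is needed in many places, one option is to wrap it in a user-defined function. The lines below are only a sketch: the name FNB$ is hypothetical and does not appear in the original programs, and the DEF FN line must be executed before the function is first called.

```basic
500 REM Hypothetical helper: FNB$ is not defined anywhere in the original programs
510 DEF FNB$(I)=MID$(SEQ$((I-1)\250+1),(I-1) MOD 250+1,1)
520 REM Base 250 is character 250 of SEQ$(1); base 251 is character 1 of SEQ$(2)
530 IF LENGTH>=251 THEN PRINT FNB$(250);FNB$(251)
```

Note that \ and MOD work on integers, so this assumes I stays within the BASIC integer range (at most 32767).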

In addition to SEQ$(), there are several other COMMON
variables which deserve mention.

LENGTH stores the length of a sequence read by the function
READ.

TITLE$ stores the title of a sequence read by the function READ
and is printed as a header on hard copy.

MAXL is the maximum allowed length of a sequence to be read by
the function READ; it is set by the function CONTROL.

DISK$ stores the name of the drive from which the sequence
data is to be read; it is set by the function CONTROL. A file
should be addressed as DISK$+":"+filename.

PRTFLG is a flag which sends output to the printer
when it is set to 1, or to the screen when it is reset to 0.

SCRFLG is a flag which pauses the output to the
screen when the page becomes full if it is reset to 0, or allows
a continuous scroll if it is set to 1.
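
As a rough illustration of how DISK$, PRTFLG and SCRFLG might be used together, the fragment below lists a text file to the screen, copies it to the printer when requested, and pauses after each screenful unless continuous scrolling is on. It is a sketch, not code taken from the package: the file name in F$, the variables NL and K$, and the page size of 20 lines are all assumptions.

```basic
500 REM Hypothetical fragment: list a file honoring DISK$, PRTFLG and SCRFLG
510 F$="NOTES.TXT"
520 REM F$ is an assumed file name; with DISK$="B" the file opened is B:NOTES.TXT
530 OPEN DISK$+":"+F$ FOR INPUT AS #1
540 NL=0
550 IF EOF(1) THEN 610
560 LINE INPUT #1,A$
570 PRINT A$:IF PRTFLG THEN LPRINT A$
580 NL=NL+1
590 IF SCRFLG=0 AND NL>=20 THEN PRINT "Press any key to continue";:K$=INPUT$(1):PRINT:NL=0
600 GOTO 550
610 CLOSE #1
1000 RETURN
```

NL counts the lines shown since the last pause, and INPUT$(1) simply waits for a single keystroke.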

Finally, a word about REST. New enzymes can be added to the
list by adding additional DATA statements at the end of the
program. The format for each enzyme is as follows: a name of up to
seven characters, the position of the base in the recognition
sequence after which the enzyme cuts, the number of different
sequences which the enzyme recognizes, and then each of the
recognition sequences. For example, the enzyme AsuI, which
cuts the sequence G'GNCC, is given by:
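
The DATA line itself is missing from this copy of the notes. Reading the format above literally, an entry for AsuI would presumably look like the sketch below: the cut falls after base 1, and the N in G'GNCC expands to four explicit sequences. The line numbers are placeholders, the EcoRI entry (G'AATTC) is added only to show the single-sequence case, and it is worth checking how REST actually reads its list before relying on either line.

```basic
980 REM Hypothetical line numbers; use numbers that follow the existing DATA lines in REST
985 REM Fields: name, cut position, number of sequences, then the sequences themselves
990 DATA AsuI,1,4,GGACC,GGCCC,GGGCC,GGTCC
995 DATA EcoRI,1,1,GAATTC
```

Comments are kept on separate REM lines because an apostrophe on a DATA line would be read as part of the data.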

Here is an example program which finds the ten longest runs of
alternating purine and pyrimidine bases in a given sequence (i.e.
possible regions of Z-DNA). I call it PUPY.

500 PRINT TITLE$:IF PRTFLG THEN LPRINT TITLE$
510 PRINT:PRINT:PRINT "One moment please."
515 REM FLAG=0 if the last base was a purine, 1 if a pyrimidine; N=length of the current run
520 I=1:N=1:A$=MID$(SEQ$(1),1,1):IF A$="A" OR A$="G" THEN FLAG=0 ELSE FLAG=1
530 I=I+1:A$=MID$(SEQ$((I-1)\250+1),(I-1) MOD 250+1,1)
540 IF A$="A" OR A$="G" THEN IF FLAG=1 THEN FLAG=0:N=N+1:GOTO 530 ELSE GOTO 580
550 IF A$="T" OR A$="C" THEN IF FLAG=0 THEN FLAG=1:N=N+1:GOTO 530 ELSE GOTO 580
560 IF I<=LENGTH GOTO 530
570 REM The run has ended: insert it into the sorted lists M() (lengths) and P() (start positions)
580 FOR J=1 TO 10
590 IF N>M(J) THEN FOR K=10 TO J+1 STEP -1:M(K)=M(K-1):P(K)=P(K-1):NEXT:M(J)=N:P(J)=I-N:GOTO 610
600 NEXT J
610 IF I<=LENGTH THEN N=1:GOTO 530
620 PRINT:PRINT:PRINT TAB(31);"THE TEN LONGEST RUNS"
625 IF PRTFLG THEN LPRINT:LPRINT:LPRINT TAB(31);"THE TEN LONGEST RUNS"
630 PRINT:PRINT TAB(34);"Base Length"
635 IF PRTFLG THEN LPRINT:LPRINT TAB(34);"Base Length"
640 FOR I=1 TO 10:PRINT TAB(34);:PRINT USING "#### ####";P(I);M(I)
645 IF PRTFLG THEN LPRINT USING "#### ####";P(I);M(I)
650 NEXT I
1000 RETURN

There are several things to note in this example. First, the
program begins with line 500 and ends with line 1000; this is
required in order for the merge with MASTER or PAGE2 to work
correctly. Second, the actual program terminates with a RETURN
statement; this is necessary to return control to the main menu.
Third, the program uses the common variable SEQ$(), so the
function READ must be used before this function can be used.
Finally, note the use of the common variable PRTFLG to send output
to the printer only when that option has been selected.
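
The same conventions can be collected into a skeleton for a new function. The sketch below counts G and C bases; the name GCCOUNT, the variable GC, and the test on LENGTH (which assumes LENGTH is less than 1 when no sequence has been read) are illustrative assumptions rather than part of the original package.

```basic
500 REM GCCOUNT: hypothetical skeleton of a new function (lines 500 to 1000)
510 IF LENGTH<1 THEN PRINT "No sequence in memory; use READ first.":GOTO 1000
520 PRINT TITLE$:IF PRTFLG THEN LPRINT TITLE$
530 GC=0
540 FOR I=1 TO LENGTH
550 A$=MID$(SEQ$((I-1)\250+1),(I-1) MOD 250+1,1)
560 IF A$="G" OR A$="C" THEN GC=GC+1
570 NEXT I
580 PRINT "G+C bases:";GC;"of";LENGTH
590 IF PRTFLG THEN LPRINT "G+C bases:";GC;"of";LENGTH
1000 RETURN
```

Like PUPY, it starts at line 500, ends with a RETURN at line 1000, reads bases from SEQ$(), and echoes its output to the printer only when PRTFLG is set.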
In order to use this program you would have to enter the above
lines using the BASICA interpreter and save them using the command:
SAVE "PUPY",A. In addition you would have to make three changes
in the program PAGE2. These would also be made using the BASICA
interpreter as follows:
1. Load the program PAGE2 with the command LOAD "PAGE2"
2. Enable the fifth function key with the line:
50 KEY(5) ON:ON KEY(5) GOSUB 350
3. Change line 150 to read:
150 PRINT " F5 PUPY Finds runs of alternating Pu and Py"
4. Insert the word PUPY between the quotation marks in
5. Resave the program PAGE2 using the command:

I envision two possible difficulties with hardware
incompatibility. First, these programs were developed with the IBM
monochrome display and contain various COLOR statements which will
probably have undesirable effects if a color monitor is used. I
apologize for this, and can only suggest that, if you have a color
monitor, you go through the programs (especially MASTER, PAGE2,
CONTROL and PRINTER) and change the COLOR statements to produce
more agreeable colors. Second, there are a great many different
printers, each with its own special features and quirks. For the
most part these programs simply send text to the printer with an
LPRINT statement and should pose no problem. However, the program
PRINTER is designed to take advantage of the special features of the
printer. It was written for the CITOH printer and will probably
not work with any other printer. If you have a different printer,
I can only suggest that you go through this program and change the
various escape sequences. For example, LPRINT CHR$(27)+"!" will
produce bold print on the CITOH printer, but LPRINT CHR$(27)+"E" is
the proper statement to produce bold print on the IBM printer.
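
One way to make such changes easier is to keep each printer code in a string variable, so that only one line has to be edited for a different printer. The fragment below is a sketch under that assumption: the variable BOLD$ is hypothetical, PRINTER itself may be organized quite differently, and COLOR 7,0 (white on black) is offered only as a neutral starting point for a color monitor.

```basic
500 REM Hypothetical fragment; PRINTER itself may be organized differently
510 COLOR 7,0
520 REM Bold print on the CITOH printer, as given in the notes
530 BOLD$=CHR$(27)+"!"
540 REM For the IBM printer use instead: BOLD$=CHR$(27)+"E"
550 LPRINT BOLD$;TITLE$
```

Only the two escape sequences quoted in the notes are used here; the codes for other features would have to come from the manual of the printer in question.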