TERSE Tip of the Week

I often see .COM programs whose first instruction is a jump "over" the data. It is better to place the data first, so that both the programmer and the assembler know the types of the variables before they are referenced. This keeps the assembler from having to "guess" the size of the operands. When it has to guess, it must guess big, and NOPs are inserted into your code to fill the space that was reserved but not needed.

There is a very simple, but rarely used, assembler directive that alleviates this problem: the Group directive. It combines several segments into a single segment. As you know, .COM programs must consist of a single segment (at least when they "start"). With the Group directive you can have multiple segments at assembly time that the linker combines into a single segment at link time. This lets you write a code segment and a data segment that end up as one segment in the final program.

In the Group directive you specify the order the linker should use when combining the segments. In your program you can open and close the segments as often as you wish, in any order, and the linker will gather (or group) them into a single segment. Let's take a look at a simple example:

main    Group   code,data;               \ code & data become 1 seg, code first.
        Assume  cs:main,ds:main,ss:main; \ tell assembler what's in the seg regs.
code    Segment byte;                    \ open code segment.
        Org     100h;                    \ all .COM programs start at 100h
data    Segment byte;                    \ open data segment.
' msg   ="Hello World$";                 \ declare the data.
data    EndS;                            \ close data segment.
Start:                                   \ program starts here.
        dx = Offset main:msg;            \ dx = offset of msg relative to main.
        ah = 9; !21h;                    \ output the message.
        !20h;                            \ terminate program.
code    EndS;                            \ close code segment.
        End     Start;                   \ program begins at start.

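This TERSE program is the classic 21-byte "Hello World." Notice that the data is declared before the code, yet there is no extra jump to get around it. Because code is listed first in the Group directive, the code starts at 100h and the data follows it. Here is the assembly for this program as generated by the TERSE compiler: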

main    Group   code,data                ; code & data become 1 seg, code first.
        Assume  cs:main,ds:main,ss:main  ; tell assembler what's in the seg regs.
code    Segment byte                     ; open code segment.
        Org     100h                     ; all .COM programs start at 100h
data    Segment byte                     ; open data segment.
msg     db      "Hello World$"           ; declare the data.
data    EndS                             ; close data segment.
Start:                                   ; program starts here.
        Mov     dx,Offset main:msg       ; dx = offset of msg relative to main.
        Mov     ah,9
        Int     21h                      ; output the message.
        Int     20h                      ; terminate program.
code    EndS                             ; close code segment.
        End     Start                    ; program begins at start.
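The Python sketch below lays out the .COM image byte by byte to show where the 21 bytes come from. The opcodes are standard 8086 encodings: BAh for Mov dx,imm16, B4h for Mov ah,imm8, and CDh for Int. The helper names are hypothetical. The jump-over-data variant rests on an assumption: the assembler sized the forward jump at 3 bytes, a short jump plus a NOP filler, which is the "guess big" case described earlier.

```python
# Sketch: hand-built layout of the "Hello World" .COM image.
# Opcodes are standard 8086 encodings; helper names are hypothetical.

ORG = 0x100                 # .COM programs are loaded at offset 100h
MSG = b"Hello World$"       # DOS function 9 prints up to the '$'

def code_bytes(msg_offset):
    """Machine code for: Mov dx,msg_offset / Mov ah,9 / Int 21h / Int 20h."""
    return bytes([
        0xBA, msg_offset & 0xFF, msg_offset >> 8,  # Mov dx,imm16 (little-endian)
        0xB4, 0x09,                                # Mov ah,9
        0xCD, 0x21,                                # Int 21h
        0xCD, 0x20,                                # Int 20h
    ])

def grouped_image():
    """Group layout: code first, data appended directly after it."""
    code_len = len(code_bytes(0))        # size does not depend on the offset: 9 bytes
    msg_offset = ORG + code_len          # data starts right after the code
    return code_bytes(msg_offset) + MSG, msg_offset

def jump_over_image():
    """Data first, preceded by a jump over it (ASSUMED 3 bytes: EB rel8 + NOP)."""
    jmp_len = 3
    msg_offset = ORG + jmp_len
    start = msg_offset + len(MSG)        # Start: label follows the data
    rel8 = start - (ORG + 2)             # short jump is relative to the next instruction
    jmp = bytes([0xEB, rel8, 0x90])      # Jmp short Start, then the NOP filler
    return jmp + MSG + code_bytes(msg_offset), msg_offset

image, off = grouped_image()
print(len(image), hex(off))     # 21 0x109
image2, off2 = jump_over_image()
print(len(image2), hex(off2))   # 24 0x103
```

Under these assumptions, the grouped version is 9 bytes of code plus 12 bytes of data. The jump-over version carries 3 extra bytes, 1 of them a wasted NOP.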

If you don't take the offset of msg relative to the group main, you will get the wrong offset. Unless told otherwise, the assembler returns an offset relative to the segment, not the group. So, without the main: override, the offset would have been 0000h (msg is the first thing in the data segment) instead of 0109h (the 100h skipped by Org plus the 9 bytes of code that precede the data). To simplify this, I use a text Equ defined as:

O Equ <Offset main:>; \ define O as offset main.

This allows me to write:

dx = O(msg); \ dx = offset of msg.

This simplifies both writing and reading the code. The parentheses are not required; I use them to improve readability. They make the "O" operator look more like the function it is.
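The Python model below makes the difference between a segment-relative and a group-relative offset concrete. It lists the segments in Group order and gives each one a starting offset within the group. This is a hand-built sketch, not real linker output. The segment table and the `group_bases` helper are hypothetical, and the code segment's size counts the 100h bytes skipped by Org 100h.

```python
# Sketch: segment-relative vs. group-relative offset of msg.
# Hand-built model of the example program, not output from a real linker.

# (name, size in bytes) in the order the Group directive lists them.
SEGMENTS = [("code", 0x100 + 9), ("data", 12)]

def group_bases(segments, align=1):
    """Return each segment's starting offset within the group.

    align=1 models 'Segment byte' (no padding between segments);
    align=16 models 'Segment para'."""
    bases, pos = {}, 0
    for name, size in segments:
        pos = (pos + align - 1) // align * align   # round up to the alignment
        bases[name] = pos
        pos += size
    return bases

bases = group_bases(SEGMENTS)
msg_in_segment = 0                       # msg is the first item in 'data'
msg_in_group = bases["data"] + msg_in_segment

print(hex(msg_in_segment))  # 0x0   -> what plain "Offset msg" gives
print(hex(msg_in_group))    # 0x109 -> what "Offset main:msg" gives
```

With para alignment instead of byte, `group_bases(SEGMENTS, align=16)` places the data at 110h. That is 7 bytes later, and the gap is padding.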

The TERSE Tip of the Week will always apply to standard assembly, not just to TERSE programming. That is the intent of this page: to help all low-level programmers, whatever language they use. So come back next week for the next TERSE Tip of the Week.

* * * *

Copyright © Jim Neil. All Rights Reserved.
The word OPTOMIZED, the name TERSE, and the TERSE logo are Trademarks of Jim Neil.