OpenEdge performance issues with Windows 2003 Server SP3 with Broadcom NICs

June 6, 2008 · Filed Under Performance Tuning, Reliability

Progress KBase entry P128141 may be of interest to Progress users on Windows 2003 Server who are running into intermittent performance problems.

Windows 2003 SP3 deploys with an incorrect version of a Broadcom NIC driver, which can cause severe performance problems. Many IBM blade centers ship with Broadcom NICs, and newer ones in particular with the NetXtreme II, so this KBase entry is worth reviewing, especially if you’re planning to move to SP3 of Windows 2003 Server, or already have and are noticing performance degradation after the patch.

Additionally, the SP3 release of Windows Server 2003 introduces an entirely new network protocol architecture that attempts to offload the TCP protocol stack (among others) onto the NIC. We are reviewing the effect of this on performance, particularly with regard to short-lived TCP connections (WebSpeed, AppServer, HTTP, etc.).

OpenEdge BUFFER-COPY from INT64 to INT Causes Index Corruption

March 24, 2008 · Filed Under Reliability

KBase entry P129814, regarding code that can cause database index corruption, affects all V10 versions and service packs. The index corruption occurs when an INT64 field is buffer-copied to an INT field. Our testing shows that this also corrupts temp-table indexes and only manifests when the INT64 value is greater than MAX_INTEGER (about 2.1 billion).

What makes this more concerning is that the index corruption goes undetected until the record is accessed, as the code below shows. In addition, with shops migrating from 9 to 10, it is likely that INT64 and INT fields will coexist for some time in databases and temp-tables as new code and fields are added in the normal evolutionary process.

We recommend caution when writing code that mixes INT64 with INT fields.

Some simple code to reproduce the bug:

/* BUFFER-COPY from INT64 to INT Causes Index Corruption */
DEFINE VARIABLE cnt AS INTEGER NO-UNDO.

DEFINE TEMP-TABLE ttTest NO-UNDO
    FIELD iIdxNorm AS INTEGER FORMAT "z,zzz,zzz,zz9"
    INDEX idxPrime IS PRIMARY UNIQUE iIdxNorm ASC.
DEFINE TEMP-TABLE ttTest2 NO-UNDO
    FIELD iIdxNorm AS INT64
    INDEX idxPrime IS PRIMARY UNIQUE iIdxNorm ASC.

/* The value of the copy from int64 must be greater than MAX_INTEGER (32 bit)
   (about 2.147 billion) to cause the error. */
DO cnt = 1 TO 2:
    CREATE ttTest2.
    /* Qualify the field: both temp-tables define iIdxNorm */
    ASSIGN ttTest2.iIdxNorm = 2400000000 + cnt.
END.
FOR EACH ttTest2:
    /* The first ttTest2 record creates a new ttTest. The 2nd ttTest2
       record is copied over the ttTest which modifies the existing ttTest
       index record and corrupts it. */
    BUFFER-COPY ttTest2 TO ttTest.
END.
/* No errors thrown to this point, the creates appear to have succeeded */
/* Attempting to access the ttTest index throws the error */
FOR EACH ttTest NO-LOCK:
    DISPLAY ttTest.iIdxNorm.
END.
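
One way to act on the caution above is to range-check the INT64 value before copying it into an INTEGER field. The following is only a sketch: the temp-table and field names (ttSource, ttTarget, iKey) are made up for illustration, and skipping the record is just one possible policy.

```abl
/* Hypothetical tables for illustration only */
DEFINE TEMP-TABLE ttSource NO-UNDO
    FIELD iKey AS INT64
    INDEX idxPrime IS PRIMARY UNIQUE iKey.
DEFINE TEMP-TABLE ttTarget NO-UNDO
    FIELD iKey AS INTEGER
    INDEX idxPrime IS PRIMARY UNIQUE iKey.

CREATE ttSource.
ttSource.iKey = 42.
CREATE ttSource.
ttSource.iKey = 2400000001.   /* beyond the 32-bit range */

FOR EACH ttSource:
    /* Refuse to copy values that cannot fit in a 32-bit INTEGER */
    IF ttSource.iKey > 2147483647 OR ttSource.iKey < -2147483648 THEN DO:
        MESSAGE "Skipping out-of-range key" ttSource.iKey VIEW-AS ALERT-BOX.
        NEXT.
    END.
    CREATE ttTarget.
    BUFFER-COPY ttSource TO ttTarget.
END.
```

Whether to skip, log, or raise an error for an out-of-range value depends on the application; the point is only that the check happens before the BUFFER-COPY.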

OpenEdge ABL Phantom Error

November 30, 2007 · Filed Under Development, Reliability

At Solvepoint we’ve coined a new term, the Phantom Error. What is a Phantom Error you ask? Well, it is an undesirable circumstance where the Progress VM decides to raise the error condition but leaves error-status:get-message() set to the empty string.

This, of course, can lead to a host of ugly bugs, not the least of which is having no idea where the error happened or why.

Here is a simple example that demonstrates a Phantom Error:

DEFINE NEW GLOBAL SHARED VARIABLE myHandle AS HANDLE NO-UNDO.
main: DO ON ERROR UNDO main, RETRY main:
  IF RETRY THEN do:
    MESSAGE RETURN-VALUE error-status:get-message(1) VIEW-AS ALERT-BOX.
    LEAVE main.
  END.
  RUN someProc IN myHandle.
END.

You’ll notice that both return-value and get-message(1) return blank.

If you happen to be in a terminal based procedure editor, however, you will receive the error message in the “message area” at the bottom. Unfortunately, this doesn’t do much good for server code.

As a consequence, server code will typically swallow these Phantom Errors at worst, or report the error out of context at best.

So, what can be done? Diligent error trapping is called for. Protect the code by always testing handles before using them.
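
A minimal sketch of such a test, reusing the myHandle and someProc names from the example above. Reporting the problem with MESSAGE is an assumption made for illustration; server code would more likely log the problem or return an error to its caller.

```abl
DEFINE NEW GLOBAL SHARED VARIABLE myHandle AS HANDLE NO-UNDO.

/* Only run the internal procedure when the handle actually points at something */
IF VALID-HANDLE(myHandle) THEN
    RUN someProc IN myHandle.
ELSE
    MESSAGE "myHandle is not valid; someProc was not run."
        VIEW-AS ALERT-BOX ERROR.
```

Because the problem is caught before the RUN, the code can say exactly what went wrong and where, instead of relying on an empty error message after the fact.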

Unfortunately, this isn’t the only type of code that will produce a Phantom Error. We will discuss other Phantom Errors under the Tag “Phantom Errors” in other posts.

OpenEdge ABL Memptr Pitfalls

November 30, 2007 · Filed Under Development, Reliability

Memptr is a very powerful datatype in the ABL/4GL. It allows the programmer to store any type of data including binary. However, as with all dynamic objects in Progress, one must be careful when using it.

Pitfall number one: Scope. Memptrs do not follow the rules of scope to which 4GL programmers have become accustomed.

What does this mean? Why do I care? Well, I’ll tell you. Since Memptrs do not follow the rules of scope, when a variable holding a Memptr HANDLE goes out of scope the associated memory is NOT released. You care because if this happens you now have a memory leak. Every time your program is executed it will leak memory equal to the amount allocated to your memptr.

Pitfall number two is the allocation process. Probably 99% of code you find will do this:

/* define a memptr variable */
def var m as memptr no-undo.
/* Allocate the memory */
set-size(m) = 1024.
/* now go ahead and start using it… */

See anything wrong with the above? If not, don’t blame yourself. You are used to Progress doing this for you, but in this case, it does not. What am I referring to? You C programmers will know! The memory that has been allocated to m has not yet been “initialized”. This means the memory will contain random data: whatever happened to be in there before the allocation. If you are using put-string before your first get-string (without the numbytes parameter) then you have nothing to worry about since put-string automatically puts a NULL (0) as the next byte after the string and get-string will only read up to that NULL. But for other operations like put/get-byte or put/get-bytes or put/get-string with the numbytes parameter, grabbing random uninitialized data out of memory could bite you, so beware.
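
A minimal sketch of initializing the block right after allocating it, assuming a simple byte-by-byte loop is acceptable for the sizes involved (the loop variable i is introduced here for illustration):

```abl
DEFINE VARIABLE m AS MEMPTR  NO-UNDO.
DEFINE VARIABLE i AS INTEGER NO-UNDO.

/* Allocate the memory */
SET-SIZE(m) = 1024.

/* Zero every byte so later reads never see leftover data.
   Byte positions in a memptr start at 1. */
DO i = 1 TO GET-SIZE(m):
    PUT-BYTE(m, i) = 0.
END.

/* ... use m here ... */

/* Release the memory when finished */
SET-SIZE(m) = 0.
```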

The final pitfall is also related to allocation. In your code, you may define a memptr at the beginning of a procedure and then use it in several places throughout the code. In each use you will want to allocate the appropriate amount of memory. So you may code something like this:

def var m as memptr no-undo.
set-size(m) = 128.
/* do stuff with it here… */
set-size(m) = 1024.
/* do other stuff with it here … */
/* and so on… */
/* now we clean up and return */
set-size(m) = 0.
return.

This looks great, right? We are allocating and cleaning up just as we should, right? Well, yes and no. The pitfall is that the second set-size, where we allocate 1024 bytes, doesn’t actually allocate anything. It essentially does nothing at all, AND it does not raise an error condition. So now we have a potential bug if the code attempts to put more than 128 bytes into that memptr.

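This is solved by setting the memptr’s size to 0 first, which releases the existing allocation so that the next set-size can allocate the new amount.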
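
The corrected sequence might look like this (a sketch; the GET-SIZE check is there only to make the effect visible):

```abl
DEFINE VARIABLE m AS MEMPTR NO-UNDO.

SET-SIZE(m) = 128.
/* do stuff with it here */

/* Release the old allocation first, then allocate the new size */
SET-SIZE(m) = 0.
SET-SIZE(m) = 1024.

/* GET-SIZE reports how much is really allocated */
MESSAGE "Allocated bytes:" GET-SIZE(m) VIEW-AS ALERT-BOX.

/* do other stuff with it here */

/* now we clean up and return */
SET-SIZE(m) = 0.
RETURN.
```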

Moral of the story: memptrs need special care, as they do not participate in conventional scoping and cannot be resized until they are cleared.

Hope this helps to save you some time in your coding efforts!

OpenEdge Memory Management Anti-pattern

November 30, 2007 · Filed Under Development, Performance Tuning, Reliability

Hello all,
I’ve recently been reviewing some Progress 4GL code and have found an all too common anti-pattern related to memory management.

When a variable is defined, the Progress runtime client (Virtual Machine) allocates memory at runtime for that variable. Once the variable is out of scope, the memory is released and everyone is happy. Progress programmers have grown comfortable with this design and obliviously define variables whenever they are needed knowing that they will be de-allocated automagically by the Progress VM.

Then came dynamic objects.

Progress programmers were overjoyed! They could now create temp-tables, buttons, queries all on the fly at runtime. No more convoluted if-then statements or .i’s or having to code a different “for each” for every combination of where clause.

However, as with any power-bestowing feature, there is a dark side to this wonderful new world of dynamic 4GL: memory management. Most programmers never really stopped to consider the fact that if something is created dynamically at runtime, the VM has no way of knowing the scope. It cannot tell when to release the memory required for the dynamic object. REMEMBER: the scope of the variable you happen to assign the object to HAS NO BEARING on the scope of the OBJECT since it can be passed around. In other words, the scope of the variable holding the HANDLE to the object is NOT bound to the OBJECT itself. The scope of ALL OBJECTS is always the SESSION. This applies to GUI widgets, dynamic queries, temp-tables, etc.

Java (and other VMs) solve this through the use of a separate execution thread running concurrently called a Garbage Collector. Its job is to scan memory and find dynamic objects that are no longer “reachable” and release their memory. Unfortunately, the Progress VM has no such thread/concept.

To add insult to injury, not only does this leak memory but it also causes progressively worse performance: the more widgets in memory, the more time it takes to create another widget. Here are three examples (run on a 2.1 GHz processor):

  • Button handles (create 1000): minimum memory required/lost per Button Handle: 512 bytes
  • Query handles (create 1000): minimum memory required/lost per Query Handle: 1024 bytes
  • Temp-Table handles (create 1000): minimum memory required/lost per Temp-Table Handle: 512 bytes

So, as you can see, from both a memory and a CPU standpoint, it is very important to be sure to clean up your objects.

This may seem like a large number of handles, but remember two important points:
1. This is at the session level. This means that if a.p calls b.p which creates objects then those will exist for the life of the session: THERE IS NO SCOPE other than SESSION FOR DYNAMIC OBJECTS and they are NEVER automatically reclaimed!
2. If the programs are running as part of a long-running session such as AppServer, EagleIQ server or WebSpeed, then you have to consider the cumulative effect over days, weeks or months.
Also note that if it is a temp-table, it could potentially have a much larger memory footprint.

So, what must be done?
It is up to the Progress programmer to clean up each and every dynamic object created using the “delete object” command.
It may be appropriate to create a widget-pool in which to assign your objects so you can just delete the pool and all the objects within will be released as well. In fact, if you create a non-persistent widget-pool, it will be automatically deleted when it goes out of scope. Creating the object into a non-persistent pool will make it behave as if it were scoped at the level that the widget-pool is created: in effect, making it behave as if it were statically defined.

If you don’t use a non-persistent widget-pool, then it is also important to be sure the “clean up” code is executed even when there is an error. For example, the following will bleed memory if an error condition is raised within the blah blah:

procedure doQuery:
    def var qh as handle no-undo.
    create query qh.
    /* do some business logic here */
    do while true:
        /* blah blah: an error raised here skips the delete below */
        leave.
    end.
    delete object qh.
end.

However, if you create a non-persistent widget pool, then it is automatically deleted when it goes out of scope. So the following will not leak memory even if an error condition happens:

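procedure doQuery:
    def var qh as handle no-undo.
    /* non-persistent pool: deleted automatically when doQuery ends */
    create widget-pool "wp".
    create query qh in widget-pool "wp".
    /* do some business logic here */
    do while true:
        /* blah blah: an error raised here no longer leaks the query */
        leave.
    end.
    delete object qh.
end.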

The widget-pool may be created at the .p level as well. In this case the pool is deleted when the .p is exited.
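
A sketch of the .p-level variant, assuming the usual behavior that dynamic objects created without an IN WIDGET-POOL phrase go into the most recently created unnamed pool. The handle names are illustrative.

```abl
/* Top of the .p: an unnamed, non-persistent widget-pool.
   It is deleted, along with everything in it, when this .p exits. */
CREATE WIDGET-POOL.

DEFINE VARIABLE hQuery AS HANDLE NO-UNDO.
DEFINE VARIABLE hTable AS HANDLE NO-UNDO.

/* No IN WIDGET-POOL phrase: both objects land in the unnamed pool above */
CREATE QUERY hQuery.
CREATE TEMP-TABLE hTable.

/* ... business logic; an error here does not strand the objects ... */
```

The trade-off is that objects which must outlive the .p (for example, a handle returned to a caller) cannot live in this pool and still need explicit management.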

Oh, by the way, persistent procedures and memptrs are two other constructs that have a session level scope. However, they cannot be part of a widget-pool and therefore must be handled individually.

10.1B Changes Integer Math Wrapping Behavior

November 22, 2007 · Filed Under Development, Reliability

An encryption component built into one of Solvepoint’s products came to our attention in a regression test on 10.1B. The root cause was a change in integer math wrapping behavior. Notice, I said “change”. I did not say “improvement”.

Some background, first. In most major computing languages integers wrap when an operation overflows the maximum allowed integer.

  • In C# int arithmetic wraps by default (in an unchecked context).
  • In ANSI C unsigned integers wrap.
  • In Java integers wrap.

Integer wrapping behavior is an embedded, predictable and necessary part of many applications including a number of encryption algorithms, communications stacks, and data verification algorithms.

What about Progress applications? Well,…

All versions of Progress prior to 10.1B wrapped integers.

With the coming of 64-bit integers someone at Progress realized that it was possible to do 32-bit arithmetic in 64-bits and actually know whether the result went beyond the maximum (or minimum) integer. But just because something could be done doesn’t mean it should be done. So even though the Decimal data-type exists as an easy alternative to those wanting to avoid classic integer behavior, Progress changed integer math in 10.1B to throw an error instead of wrapping. If you’re incredulous, see Solution ID P119716.

In 10.1B 32-bit integer wraps throw an error “Value integer too large to fit in INTEGER datatype. (13682)“. Be prepared for some code written long ago to break.

If we multiply an integer less than the maximum 32-bit integer by a multiplier that will result in a value greater than the maximum 32-bit integer, we can see the issue played out. In the following example we will use 1234567891 (a large, easy-to-remember number less than the maximum 32-bit integer, +2,147,483,647) and multiply it by 10. Before 10.1B the result is -539222978, but as of 10.1B it returns an error.
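
For code that depends on the old behavior, one possible workaround on 10.1B and later is to do the arithmetic in INT64 and fold the result back into 32 bits by hand. This is only a sketch: wrap32 is a hypothetical helper, not a Progress built-in, and it assumes the intermediate result itself fits in 64 bits.

```abl
/* Hypothetical helper: reduce an INT64 to the value a wrapping
   32-bit signed integer would have held. */
FUNCTION wrap32 RETURNS INTEGER (INPUT pValue AS INT64):
    DEFINE VARIABLE iRem AS INT64 NO-UNDO.

    /* Remainder after dividing by 2^32. TRUNCATE drops the fraction,
       so iRem carries the sign of pValue. */
    iRem = pValue - INT64(TRUNCATE(pValue / 4294967296, 0)) * 4294967296.

    /* Normalize into 0 .. 2^32 - 1 */
    IF iRem < 0 THEN iRem = iRem + 4294967296.

    /* Reinterpret the upper half of that range as negative numbers */
    IF iRem > 2147483647 THEN iRem = iRem - 4294967296.

    RETURN INTEGER(iRem).
END FUNCTION.

DEFINE VARIABLE iBig AS INT64 NO-UNDO.

iBig = 1234567891.
/* 12345678910 - 2 * 4294967296 = 3755744318, and
   3755744318 - 4294967296 = -539222978 */
MESSAGE wrap32(iBig * 10) VIEW-AS ALERT-BOX.
```

With the numbers from the example above, this should display -539222978, the pre-10.1B result.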

[Figure: Difference between 32-bit signed integers in OpenEdge 10.1B]

Looking at the binary underneath this, see where 10.1B detects the overflow within 64-bits and throws the 32-bit exception #13682:

[Figure: How OpenEdge 10.1B detects overflow]

But can we now declare all Progress integer math safe? Not really. Since 64-bit integers in 10.1B are as blind to the >64 bits as 32-bit integers were to the >32 bits before 10.1B, 64-bit integers in 10.1B wrap around the way 32-bit integers did before 10.1B. (Read that sentence several times if need be.) To help you understand why, Progress says “Checking for 64bit overflow would be too expensive. To do it would require putting everything in 128 bits to do calculations and then to copy it all back, since >64bits are required to do 64bit overflow checking. This would cause all arithmetic in the 4gl to grind to a halt. There are therefore no plans to do anything about this.”

So you’ve been warned. When this bites you’ll be able to say, “Darn!, and I read something about that somewhere!”

PS: And yes, for all the hard-core C jocks, signed integer overflow is undefined behavior in C: it is not guaranteed to demonstrate modulus behavior, although in practice it usually wraps. Using signed integers rather than unsigned integers when classic integer wrapping is needed in C is a no-no.

OpenEdge ASSIGN Statement - More than just performance.

November 22, 2007 · Filed Under Development, Performance Tuning, Reliability, Trivia

Back in the early ’90s when I broke the news of the ASSIGN statement in a Profiles in Progress article, I had no idea what silliness would follow. I also had no idea that twenty years later we would be seeing newly written code that still gets it wrong.

For the few who are as yet unaware, wrapping consecutive assignments with an ASSIGN has multiple benefits.

a = 4.
b = 3.
c = 7.

isn’t as good as

ASSIGN a = 4
       b = 3
       c = 7. 

Shortly after the Profiles in Progress article was published there were signs that some in the Progress community were getting it wrong. Even though the article clearly graphed variations in execution time, some performance pundits started running around saying the ASSIGN statement was 2.7163639281 times faster than not using the ASSIGN. I’m exaggerating the precision of the number, of course, to make a point. None of those digits (including the initial “2”) are significant. None.

The benefit of the ASSIGN statement varies for many reasons, not the least of which are whether the fields being assigned are components of a common index, or part of a key that fully qualifies a unique index. Let’s demonstrate one of these two factors using the example above.

Let’s suppose that a, b, and c are components of an index a-b-c. And let’s start, for simplicity’s sake, with a, b and c all having the value of “1”. In the ASSIGN-less code snippet, the index a-b-c will move through two transitional values, 4-1-1 and 4-3-1, before it gets to 4-3-7. Using ASSIGN, index a-b-c becomes 4-3-7 directly. Not using ASSIGN causes three-fold the activity:

[Figure: ASSIGN DB Activity]

Further, if index a-b-c is unique and the transitional values 4-1-1 or 4-3-1 already exist for another row, then the code without the ASSIGN statement will fail. Yes, fail.

Here’s a diagram that illustrates this index key collision for the above example:

[Figure: ASSIGN Index Collision]
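
The collision scenario can be sketched with a temp-table standing in for the database table. The table name ttKey and its index are made up for illustration; the behavior of the three separate assignments is described in comments as the article states it, not demonstrated.

```abl
/* Hypothetical table with a unique index on a-b-c */
DEFINE TEMP-TABLE ttKey NO-UNDO
    FIELD a AS INTEGER
    FIELD b AS INTEGER
    FIELD c AS INTEGER
    INDEX idxABC IS PRIMARY UNIQUE a b c.

/* Another row already holds the transitional key 4-1-1 */
CREATE ttKey.
ASSIGN ttKey.a = 4
       ttKey.b = 1
       ttKey.c = 1.

/* The row we want to change starts at 1-1-1 */
CREATE ttKey.
ASSIGN ttKey.a = 1
       ttKey.b = 1
       ttKey.c = 1.
RELEASE ttKey.

FIND ttKey WHERE ttKey.a = 1 AND ttKey.b = 1 AND ttKey.c = 1.

/* One ASSIGN: the key moves from 1-1-1 straight to 4-3-7, so it
   never takes the value 4-1-1 held by the other row.
   Written as three separate statements (a = 4. b = 3. c = 7.), the
   first statement alone would make the key 4-1-1 and collide. */
ASSIGN ttKey.a = 4
       ttKey.b = 3
       ttKey.c = 7.
```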

The ASSIGN statement is not just an optional performance improvement as some believe. In some contexts, likely those least considered, lack of ASSIGN will affect reliability. Reliability is more important than Performance. Performance is also impacted as, in the example above, the application database must now perform triple the number of index lookups, inserts, and removes. Not good.

So everyone is using the ASSIGN statement, right? Sadly, no. One would think that we should find proper use of the ASSIGN statement in all code written since the ’80s. This isn’t the case, even in newly written 2007 code. We’ve seen it with our own eyes. As a community of Progress users, I know we can do better.

While this article isn’t an exhaustive presentation of the reasons why using ASSIGN is better, I’m hoping that the reliability and performance example above is compelling enough. It should be.

Hope this helps.

lkrela - What OpenEdge 10.1B users need to know and why it matters.

November 21, 2007 · Filed Under Administration, Reliability

Progress users have come to expect extreme reliability from the Progress DB. This is no accident and has taken much consistent work over many years by the Progress Engine Crew. It is amazing how the Progress DB keeps ticking given what’s thrown at it in the field.

If you aren’t already aware, you should know that something bad has slipped through Progress’ regression tests.

If you are a 10.1B user, especially if you aren’t yet on Service Pack 3, your transactions may be exposed to data corruption. Tom Harris, OpenEdge RDBMS Development, calls it “a bad bug”. In an e-mail to the PEG DBA Forum, Tom has asked for users to “please” use the lkrela startup parameter “if you are running 10.1B prior to service pack 3”.

On OpenEdge versions 10.1B through Service Pack 2 inclusive there are critical bugs in locking that affect how locks are released, which in turn affect transaction consistency.

For your reference, the Solution ID is P126982.

While there is a fix in Service Pack 3, until Progress updates its regression tests for this and possibly related conditions, we are recommending to our clients that they keep using the lkrela startup parameter on 10.1B.