Monday, July 28, 2008

Crashes Accessing Valid Memory (Due to Improper Packing and Structs)

Before reading this article, it would be helpful to become familiar with data alignment issues and with how packing affects structures on Windows CE.

Recently, I had to write some code that read in a bitmap from a file. For various reasons I was not using any preexisting libraries; however, I was using the standard structures as defined by Microsoft.

The first issue that I noticed was that my application (which seemed fine) was crashing.

My algorithm was essentially:
  1. Open File (via CreateFile)
  2. Get Filesize of Bitmap
  3. Allocate enough Memory for Bitmap
  4. Read Entire File into Memory
  5. BITMAPFILEHEADER* pHeader = file In Memory;
  6. BITMAPINFOHEADER* pInfo = offset to Info;
  7. BYTE* pImageData = offset to Image Data;
The crash was occurring in step 6 when I accessed the width field within the BITMAPINFOHEADER struct. The weird thing was that, after checking the pointer value, I discovered that I was accessing a 32-bit value that was only 16-bit aligned (it wasn't 32-bit aligned). Right away I knew that this wasn't right, unless of course the struct was packed (and the OS wasn't packing right).

I quickly checked the initial alignment of my memory buffer and of the BITMAPFILEHEADER and found both to be valid. After this I checked the header definition of the BITMAPFILEHEADER struct, which is wrapped in pragma pack(2) directives because it is only 16-bit aligned. However, the struct itself is 14 bytes in size. This left the BITMAPINFOHEADER that follows it improperly aligned, as it needs to be 32-bit aligned (but was 16-bit aligned instead).
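
The arithmetic behind this can be checked with a small sketch in portable C. FileHeaderPacked2 is a hypothetical stand-in that mirrors the BITMAPFILEHEADER fields using fixed-width types (it is not the Windows CE header itself), and info_header_offset and is_aligned are illustrative helpers, not part of any API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of BITMAPFILEHEADER, packed to 2 bytes as described
   in the text. Illustrative only; the real definition lives in the SDK. */
#pragma pack(push, 2)
typedef struct {
    uint16_t bfType;
    uint32_t bfSize;
    uint16_t bfReserved1;
    uint16_t bfReserved2;
    uint32_t bfOffBits;
} FileHeaderPacked2;
#pragma pack(pop)

/* When the whole file sits in one buffer, the info header begins
   immediately after the file header. */
static size_t info_header_offset(void)
{
    return sizeof(FileHeaderPacked2);
}

/* Returns 1 if a byte offset from a suitably aligned base address is
   itself a multiple of `alignment`, 0 otherwise. */
static int is_aligned(size_t offset, size_t alignment)
{
    assert(alignment != 0);
    return offset % alignment == 0;
}
```

Assuming the buffer itself starts on a 4-byte boundary, the info header begins at byte 14 and its width field (4 bytes into the info header) at byte 18. Both offsets are multiples of 2 but not of 4, which matches the misaligned pointer seen in the debugger.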

Because the BITMAPINFOHEADER itself is properly aligned, the headers for Windows CE do not specify a packing for it. This behavior is in fact correct, as packing the struct would make loads and stores to it much slower in situations where the struct is properly aligned.

To solve this problem, we really have two options:
  1. Create 2 Buffers and call ReadFile twice. One call to ReadFile reads in the initial buffer, which is packed to 2 bytes. The other call reads in the BITMAPINFOHEADER and remaining data, which is packed to 4 bytes. This results in two calls to ReadFile (instead of 1) but causes the BITMAPINFOHEADER to be properly aligned. Accessing the 32-bit data within the BITMAPINFOHEADER will be faster; however, we have to make multiple System Calls into the OS.
  2. Redefine (internally, not in the PUBLIC\... code) BITMAPINFOHEADER as, say, BITMAPINFOHEADER_PACKED and pack it to 2 bytes to account for the 16-bit alignment caused by the 14-byte size of BITMAPFILEHEADER.
This problem is not limited to bitmaps, as it can exist with any struct. However, the bitmap example shows how subtle the problem may be. It also highlights why it may be a good idea to keep your structs aligned, and their sizes padded, to 4 or 8 bytes from the start.
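
As a rough sketch of the second option, the struct below repeats the standard BITMAPINFOHEADER fields under 2-byte packing, using fixed-width types so that it compiles without the Windows headers. The name BITMAPINFOHEADER_PACKED comes from the option above; FILE_HEADER_SIZE and bitmap_width are hypothetical helpers added for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Internal copy of the BITMAPINFOHEADER layout, packed to 2 bytes so the
   compiler only assumes 16-bit alignment when accessing its fields. */
#pragma pack(push, 2)
typedef struct {
    uint32_t biSize;
    int32_t  biWidth;
    int32_t  biHeight;
    uint16_t biPlanes;
    uint16_t biBitCount;
    uint32_t biCompression;
    uint32_t biSizeImage;
    int32_t  biXPelsPerMeter;
    int32_t  biYPelsPerMeter;
    uint32_t biClrUsed;
    uint32_t biClrImportant;
} BITMAPINFOHEADER_PACKED;
#pragma pack(pop)

/* Size of the 2-byte-packed file header that precedes the info header. */
enum { FILE_HEADER_SIZE = 14 };

/* Reads the width from a bitmap held in a single buffer.
   Returns 0 if the buffer is too small to contain both headers. */
static int32_t bitmap_width(const uint8_t *file, size_t file_size)
{
    const BITMAPINFOHEADER_PACKED *info;

    assert(file != NULL);
    if (file_size < FILE_HEADER_SIZE + sizeof(BITMAPINFOHEADER_PACKED)) {
        return 0;
    }
    /* file + 14 is only 2-byte aligned; the packed type accounts for that. */
    info = (const BITMAPINFOHEADER_PACKED *)(file + FILE_HEADER_SIZE);
    return info->biWidth;
}
```

The trade-off is the one described above: the file is read with a single call, but every access through the packed type is compiled for 2-byte alignment, even in cases where the data happens to be 4-byte aligned.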

Wednesday, July 9, 2008

Steps to Creating a Windows CE FAL/FMD Driver

This guide is here to help new developers develop a NAND Driver for their platform using the FAL/FMD Driver. Before choosing to write a NAND driver following the FAL/FMD driver model, the developer should first decide whether the MDD/PDD model or the FAL/FMD model is more appropriate for their platform and goals.

Before starting on the driver, it would be a good idea to first read and understand the FAL/FMD Model via the MSDN Documentation on the FAL/FMD.

The FAL/FMD model is fairly straightforward: you basically need to provide functions to the Windows FAL to Erase a Block, Write a Sector, Read a Sector, Get Block Status and Set Block Status.

When dealing with Block and Sector Numbers with your flash, you should treat all numbers from the FAL to your FMD as Logical Block or Sector Numbers. This will allow you to easily remap and move flash blocks around.
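
One way such a remapping layer might look is sketched below. Everything here is hypothetical and not part of the FMD interface: the geometry constants, the UNMAPPED marker and both function names are made up for illustration, and a real driver would take the bad-block information from the flash itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flash geometry, chosen only for this example. */
#define NUM_BLOCKS        8u
#define SECTORS_PER_BLOCK 64u
#define UNMAPPED          0xFFFFFFFFu

/* Fills `map` so that map[logical] holds the physical block backing that
   logical block, skipping blocks flagged as bad.
   Returns the number of usable logical blocks. */
static uint32_t build_block_map(const uint8_t is_bad[NUM_BLOCKS],
                                uint32_t map[NUM_BLOCKS])
{
    uint32_t logical = 0;
    uint32_t i;

    assert(is_bad != 0 && map != 0);
    for (i = 0; i < NUM_BLOCKS; ++i) {
        if (!is_bad[i]) {
            map[logical++] = i;
        }
    }
    for (i = logical; i < NUM_BLOCKS; ++i) {
        map[i] = UNMAPPED;   /* no physical block left for these */
    }
    return logical;
}

/* Translates a logical sector number into a physical sector number,
   or returns UNMAPPED if no physical block backs it. */
static uint32_t logical_to_physical_sector(const uint32_t map[NUM_BLOCKS],
                                           uint32_t logical_sector)
{
    uint32_t block  = logical_sector / SECTORS_PER_BLOCK;
    uint32_t offset = logical_sector % SECTORS_PER_BLOCK;

    if (block >= NUM_BLOCKS || map[block] == UNMAPPED) {
        return UNMAPPED;
    }
    return map[block] * SECTORS_PER_BLOCK + offset;
}
```

Because the caller only ever sees logical numbers, a block that goes bad can be swapped out by changing a single map entry.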

MDD/PDD Model Wrapper for FMD Driver
There is a PDD Wrapper for the MDD/PDD Model that basically creates a PDD that simply calls into an FMD driver. Although this will work fine for existing NAND Flash Drivers, I do not recommend using this method to create an MDD/PDD driver if you do not already have a working and stable FMD Driver.

NAND Chip and Platform Configuration
Before any work on the driver is started, it is important to understand what the NAND Chip in use expects and how it interfaces with the Platform.

All NAND chips have a command based interface where a Command is issued across the data bus, followed by optional Address data and finally actual data. Additional signals which may be needed are Chip Enable, Address Latch Enable and Command Latch Enable. You will also need to determine if your NAND is interleaved or not.

To determine how you interface with your NAND you will need to refer to the Processor Manual, NAND Chip Datasheet and Schematic (or to the Hardware Engineer responsible for the platform's hardware design).

Interleaved NAND
My opinion on interleaving NAND is that, unless you absolutely need the additional speed or performance that it provides, it adds too much complexity to be worthwhile in the majority of situations.

Looking at Other NAND Drivers
If you are unclear as to how the FAL/FMD system works after reading through MSDN, there is a functional NAND driver used by the H4SAMPLE BSP. It is located at: H4SAMPLE\SRC\DRIVERS\NANDFLASH.

Determine Type of ECC to Use
There are a few options that are available when it comes to adding ECC support to NAND. The number of bits of Error Detection / Correction required will generally determine what options are available.

The first option is to use a Hardware ECC Controller. Although this is the fastest solution, it requires hardware support to operate. Many (but likely not all) SOC processors that have dedicated support for NAND also have NAND ECC Controllers. Additionally, some hardware solutions only provide support for Single-Bit Correction.

The second option is to use the provided Microsoft ECC Library. This library only supports 512-byte pages and generates a 6-byte ECC code that is capable of 1-bit correction, 2-bit detection. It can still be used on devices with pages larger than 512 bytes, but those pages will have to be broken up into multiple 512-byte sections. Doing so may result in too much ECC data being generated (overflowing the free space in the Spare or OOB section), so care should be taken before going this route.
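
The sizing check can be sketched as below, using the 512-byte section size and 6-byte code size quoted above. The function names and the idea of a reserved-bytes parameter (spare bytes already claimed by things such as bad block markers) are assumptions made for this example:

```c
#include <assert.h>
#include <stddef.h>

/* Figures taken from the paragraph above. */
enum { ECC_SECTION_BYTES = 512, ECC_CODE_BYTES = 6 };

/* Total ECC bytes produced when a page is split into 512-byte sections.
   The page size is assumed to be a multiple of 512. */
static size_t ecc_bytes_needed(size_t page_bytes)
{
    assert(page_bytes % ECC_SECTION_BYTES == 0);
    return (page_bytes / ECC_SECTION_BYTES) * ECC_CODE_BYTES;
}

/* Returns 1 if the generated ECC fits in the spare area once the
   (hypothetical) reserved bytes are set aside, 0 otherwise. */
static int ecc_fits_in_spare(size_t page_bytes, size_t spare_bytes,
                             size_t reserved_bytes)
{
    if (reserved_bytes > spare_bytes) {
        return 0;
    }
    return ecc_bytes_needed(page_bytes) <= spare_bytes - reserved_bytes;
}
```

For instance, a 2048-byte page splits into four sections and needs 24 bytes of ECC, so whether it fits depends on how much of the spare area the rest of the driver has already claimed.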

The final option is to purchase a third-party ECC Library or create your own.

Both interleaving and MLC NAND place restrictions on the type of ECC Algorithm that may be used (because they require better than 1-bit Correction, 2-bit Detection), so it is important to understand your platform's ECC requirements prior to choosing any particular solution.


This article will hopefully explain many of the important differences between MLC and SLC NAND and how using each may affect a platform.

SLC (Single-Level Cell)
Single-Level Cell NAND is a type of NAND where each cell (an electrical unit containing a charge) is able to represent two states (either a one or a zero). In this type of NAND, if a single cell became corrupted, only a single bit would change state.

For example, the 8-bit byte 00100110 would require 8 Cells of an SLC NAND chip to store the byte. If a single cell became invalid then only one of the bits would flip from a 0 to a 1 or a 1 to a 0.

MLC (Multi-Level Cell)
Multi-Level Cell NAND is a type of NAND where each cell (an electrical unit containing a charge) is able to represent more than two states (for example 4 states). In this type of NAND if a single cell became corrupted, more than a single bit would change state.

For example (4-state MLC NAND), the 8-bit byte 00100110 would require 4 cells of MLC NAND to store the byte. If a single cell became invalid then one or two of the bits could flip.

NAND Flash itself is known to suffer from the possibility of a cell within a Page becoming corrupt (changing to a state other than the correct state). Because of this, ECC is used to determine whether any bits have flipped and, if they have, to correct them (as long as too many haven't flipped).

Historically, 1-bit Error Correction, 2-bit Error Detection was always used for the ECC Generation because SLC NAND only suffers from single-bit errors on cells that are within a good block (Bad Blocks can have more than one (1) bad cell per page.)

With the advent of MLC NAND, 1-bit Error Correction, 2-bit Error Detection is no longer sufficient, as a single corrupt cell may cause 2 or more bits to flip depending on how many levels each cell can hold (4-state = 2 bits, 8-state = 3 bits, etc.).
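
The single-cell examples above can be modelled in a few lines of C. This is a simplified sketch: it assumes each 4-state cell stores two adjacent bits of the byte as a plain binary value, which is an assumption of the example and not a statement about how any particular chip encodes its levels. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Counts how many bit positions differ between two bytes. */
static int bits_changed(uint8_t a, uint8_t b)
{
    uint8_t diff = (uint8_t)(a ^ b);
    int count = 0;

    while (diff != 0) {
        count += diff & 1u;
        diff >>= 1;
    }
    return count;
}

/* SLC: one bit per cell, so corrupting cell 0..7 flips exactly one bit. */
static uint8_t slc_corrupt_cell(uint8_t value, unsigned cell)
{
    assert(cell < 8);
    return (uint8_t)(value ^ (1u << cell));
}

/* 4-state MLC: two bits per cell, so cell 0..3 covers bits 2*cell and
   2*cell + 1. The cell's contents are replaced by new_state (0..3). */
static uint8_t mlc_corrupt_cell(uint8_t value, unsigned cell,
                                unsigned new_state)
{
    unsigned shift = cell * 2;

    assert(cell < 4 && new_state < 4);
    return (uint8_t)((value & ~(3u << shift)) | (new_state << shift));
}
```

Starting from 00100110, the lowest MLC cell holds the bits 10. If it drifts to 01, two bits of the byte change at once, which is exactly the case a 1-bit-correcting ECC cannot repair.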

Considerations When Deciding on SLC or MLC NAND
MLC NAND often costs less per MB and is available in larger sizes than SLC, as it can be made more densely (due to multiple states per cell). Because of this cost consideration, the use of MLC NAND has become more and more prevalent as our storage needs increase.

MLC NAND requires an ECC Algorithm that can do better than Single-Bit Error Correction, Double-Bit Error Detection (SECDED). As a result, using MLC NAND may lead to situations where Hardware ECC Controllers cannot be used to offload the processor (many only support SECDED).

Wednesday, July 2, 2008

Win32 Events

Many people tend to get hung up on the more subtle aspects of the Win32 events as they create designs that make certain incorrect assumptions on how Win32 events work.

There are really two different types of Win32 events: Auto-Reset and Manual-Reset events.

Auto-Reset Events
These events will automatically transition themselves back to a non-signaled state as soon as one thread that is waiting on the event is released.

Using SetEvent with an Auto-Reset Event will result in a single thread being unblocked prior to the event returning to the non-signaled state. If no threads are currently waiting on the event, the next thread to wait on the event will be unblocked (unless of course ResetEvent or PulseEvent is called before a thread waits on the event).

Using PulseEvent with an Auto-Reset Event will result in a single thread that is waiting on the event becoming unblocked prior to the event returning to the non-signaled state. If no threads are currently waiting on the event, the event is still returned to the non-signaled state.

Manual Reset Events
These events will remain in their Signaled state until either PulseEvent or ResetEvent is called on them.

If SetEvent is called on a Manual Reset Event, the event will remain signaled until ResetEvent or PulseEvent is called. This will result in all threads waiting on the Manual Reset Event (along with any future threads that wait while it is still signaled) becoming unblocked.

If PulseEvent is called on a Manual Reset Event, the event is set to a signaled state, all threads currently waiting on the event are unblocked and the event then returns to a non-signaled state.

Win32 Events and Thread Priorities
Some people make the assumption that the highest priority thread that is waiting on a Win32 event will be the thread that gets released. This should never be assumed as there are no guarantees that the highest priority thread will be the thread that is released.

Win32 Events with Multiple Waiting Threads
The only way to ensure that all threads that are waiting on a Win32 Event are released is to use a Manual Reset event teamed with calls to PulseEvent or SetEvent. Using an Auto-Reset event when you have multiple waiting threads that should be released together will always result in not all of the threads being released.
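
The rules above can be summarised as a small state model. To be clear, this is a toy, single-threaded model written in portable C that only tracks counts; it is not the Win32 implementation and makes no Win32 calls. EventModel and the model_* functions are hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of an event: its type, its state, and how many threads are
   currently blocked waiting on it. */
typedef struct {
    bool manual_reset;
    bool signaled;
    int  waiters;
} EventModel;

/* Models SetEvent. Returns the number of waiting threads released. */
static int model_set(EventModel *e)
{
    int released;

    assert(e != 0);
    if (e->manual_reset) {
        released = e->waiters;   /* every waiter is released */
        e->waiters = 0;
        e->signaled = true;      /* stays signaled until reset or pulse */
        return released;
    }
    if (e->waiters > 0) {
        e->waiters--;            /* exactly one waiter is released */
        e->signaled = false;     /* and the event resets itself */
        return 1;
    }
    e->signaled = true;          /* remembered for the next waiter */
    return 0;
}

/* Models PulseEvent. Returns the number of waiting threads released. */
static int model_pulse(EventModel *e)
{
    int released;

    assert(e != 0);
    released = e->manual_reset ? e->waiters : (e->waiters > 0 ? 1 : 0);
    e->waiters -= released;
    e->signaled = false;         /* always ends non-signaled */
    return released;
}

/* Models ResetEvent. */
static void model_reset(EventModel *e)
{
    assert(e != 0);
    e->signaled = false;
}

/* Models a thread starting to wait. Returns true if it passes straight
   through, false if it blocks and is counted as a waiter. */
static bool model_wait(EventModel *e)
{
    assert(e != 0);
    if (e->signaled) {
        if (!e->manual_reset) {
            e->signaled = false; /* auto-reset consumes the signal */
        }
        return true;
    }
    e->waiters++;
    return false;
}
```

With three waiters, model_set releases one of them on an auto-reset event and all three on a manual-reset event, which is the difference the section above is warning about.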

Related Documentation
MSDN: CreateEvent
MSDN: SetEvent
MSDN: ResetEvent
MSDN: PulseEvent

Tuesday, July 1, 2008

OAL ISR or Installable ISR

Sometimes, when someone needs to handle the interrupt for their device driver in an ISR (because using an IST has too much latency or because the interrupt is shared) they will decide to use the OAL's ISR instead of creating an Installable ISR for their driver. In general, using the OAL ISR for your ISR is a bad idea, as it unnecessarily couples the OAL and your particular hardware platform.

The following are examples of Interrupts that normally will be handled in the OAL ISR:
  1. System Timer for the SYSINTR_RESCHED.
  2. Interrupt for on-chip RTC Driver (as its driver resides in the OAL.)
  3. ISR for Profiler.

Other than the three instances above, we should never really be putting ISRs for device drivers in the OAL's ISR routine. In fact, if you look at the three items above, they are all for components or modules that exist within the OAL itself, not at the Windows CE Kernel or User-Level driver level.

Note: There are likely more instances than the three listed above, but those should be the most common instances of drivers that should have their ISRs included in the OAL.

Drivers such as Ethernet MACs, Serial Ports and LCD Controllers should use either an Installable ISR or simply an IST. They should never install themselves in the OAL ISR.

Pros of OAL ISR over Installable ISR:
  1. ISR gets called slightly faster than Installable ISR.
Cons of OAL ISR over Installable ISR:
  1. Driver becomes non-portable. OAL ISR needs to be modified for EVERY platform that the driver must run on.
  2. OAL becomes non-portable. The OAL becomes tied to the hardware platform, which either calls for one OAL per hardware platform or forces the developer to use ifdefs to select the hardware platform within the OAL.
  3. Harder to upgrade driver without replacing entire OS.