Use avrdude with -c butterfly for flash and EEPROM, and -c avr910 for everything else. Some devices (e.g. the 8535) need to leave the reset state and resync after a memory erase. For these devices, run avrdude in two steps: first erase the chip using -e, then program it in a second run using -D to inhibit the automatic erase.
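A minimal Python sketch of the two-step procedure, wrapping avrdude with subprocess. The part id `8535`, the port `/dev/ttyS0` and the file names are placeholders chosen for illustration. Only the flags `-p`, `-c`, `-P`, `-e`, `-D` and `-U` are standard avrdude options; the helper function names are hypothetical.

```python
import subprocess
from typing import List

# Hypothetical values: adjust part, port and file names to your setup.
PART = "8535"            # avrdude part id for the AT90S8535
PORT = "/dev/ttyS0"      # serial port the adapter is connected to
PROGRAMMER = "butterfly"  # used for flash and EEPROM


def erase_cmd(part: str = PART, port: str = PORT) -> List[str]:
    """Step 1: chip erase only (-e), without any memory operation."""
    return ["avrdude", "-p", part, "-c", PROGRAMMER, "-P", port, "-e"]


def program_cmd(hexfile: str, eepfile: str = "",
                part: str = PART, port: str = PORT) -> List[str]:
    """Step 2: write flash (and optionally EEPROM) with auto erase disabled (-D)."""
    cmd = ["avrdude", "-p", part, "-c", PROGRAMMER, "-P", port, "-D",
           "-U", f"flash:w:{hexfile}:i"]
    if eepfile:
        cmd += ["-U", f"eeprom:w:{eepfile}:i"]
    return cmd


def erase_then_program(hexfile: str, eepfile: str = "") -> None:
    # Two separate avrdude runs: the target leaves reset after the erase
    # and is resynced when the second run starts.
    subprocess.run(erase_cmd(), check=True)
    subprocess.run(program_cmd(hexfile, eepfile), check=True)
```

Calling `erase_then_program("main.hex", "main.eep")` erases the chip and then writes flash and EEPROM in a second run. Fuse or lock bit access would use `-c avr910` instead, as described above.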
The application avrdebug can be used to adjust voltage levels.
Some features have been tweaked especially for USB<->serial converters, but they improve performance with standard serial ports, too (see below). A range of addresses can be read with a single command, and a set of commands can be sent as a block that is not answered until the whole block has been received. With a USB<->serial converter in half duplex mode, exchanging a small number of large packets is faster than exchanging a large number of small packets.
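The block mode can be illustrated with a small Python sketch: commands are concatenated into blocks no larger than the adapter's receive buffer (128 bytes on the prototype described below), each block is sent in one write, and all replies for that block are read afterwards. The framing (plain concatenation of fixed 6-byte commands with 2-byte replies) and the `write`/`read` callables (for example the `write` and `read` methods of a pyserial `Serial` object) are assumptions, not the adapter's actual wire format.

```python
from typing import Callable, List

CMD_LEN = 6      # assumed fixed command size in bytes
REPLY_LEN = 2    # assumed reply size in bytes per command
RX_BUFFER = 128  # receive buffer size of the prototype adapter


def pack_blocks(commands: List[bytes], buffer_size: int = RX_BUFFER) -> List[bytes]:
    """Greedily concatenate commands into blocks that fit the receive buffer."""
    blocks: List[bytes] = []
    current = bytearray()
    for cmd in commands:
        if len(cmd) > buffer_size:
            raise ValueError("command does not fit into the receive buffer")
        if len(current) + len(cmd) > buffer_size:
            # Current block is full: close it and start a new one.
            blocks.append(bytes(current))
            current = bytearray()
        current += cmd
    if current:
        blocks.append(bytes(current))
    return blocks


def run_blocks(blocks: List[bytes],
               write: Callable[[bytes], object],
               read: Callable[[int], bytes]) -> List[bytes]:
    """Half duplex exchange: send one whole block, then read all of its replies."""
    replies: List[bytes] = []
    for block in blocks:
        write(block)  # one large write instead of one write per command
        replies.append(read(len(block) // CMD_LEN * REPLY_LEN))
    return replies
```

With 6-byte commands, 21 commands fit into 128 bytes, so 512 commands end up in 25 blocks. That happens to match the 25 blocks in the calculation below, although the real block framing may differ.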
Stateless communication is not used because the hardware cannot receive or send on its hardware UART while it is running a software UART or SPI to the target at the same time.
For example, my prototype has a 128-byte buffer for receiving commands from the host (PC). I tested writing 512 bytes to the target, which takes 512 single commands. The baud rate is only 19200. There is no real communication with a target (uDebugMode==DEBUG_SELF). The host PC runs Linux 2.6.8. I measured the complete runtime of the program, which includes some additional communication not accounted for in the calculation below (syncing to the adapter, reading status information).
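Calculated data volume and minimum transfer time, assuming 10 bits per byte on the line (8N1):
commands: 512*6 bytes (plus 25*6 bytes when using command blocks)
replies: 512*2 bytes
without command blocks: 4096 bytes -> 2.13 seconds at 19200 baud
with command blocks: 4246 bytes -> 2.21 seconds at 19200 baud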
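The arithmetic above can be reproduced with a few lines of Python. The function names are hypothetical; the 10 bits per byte (start bit, 8 data bits, stop bit) is the assumption that makes the figures above come out exactly, and the 6 extra bytes per block are simply added as one command-sized frame per block.

```python
BITS_PER_BYTE = 10  # 8N1: start bit + 8 data bits + 1 stop bit


def transfer_bytes(n_cmds: int, cmd_len: int = 6, reply_len: int = 2,
                   n_blocks: int = 0) -> int:
    """Total bytes on the line: commands, replies and 6 extra bytes per block."""
    return n_cmds * (cmd_len + reply_len) + n_blocks * cmd_len


def transfer_time(n_bytes: int, baud: int = 19200) -> float:
    """Minimum time in seconds to move n_bytes over a UART at the given baud rate."""
    return n_bytes * BITS_PER_BYTE / baud


without_blocks = transfer_bytes(512)
with_blocks = transfer_bytes(512, n_blocks=25)
print(without_blocks, round(transfer_time(without_blocks), 2))  # 4096 2.13
print(with_blocks, round(transfer_time(with_blocks), 2))        # 4246 2.21
```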
Measured runtime of the complete program:
                      with command blocks   without command blocks
USB-serial converter  2.8 s                 8.6 s
standard serial port  2.6 s                 4.0 s