Micro-optimization

Comparing A register to zero
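CP 0 can be replaced by OR A, which is one byte shorter and 3 T faster (4 T instead of 7 T). It has the same effect on the Z flag but a different effect on other flags. Another option is AND A. Note that SUB A is not a substitute: it clears A and always sets Z.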
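A minimal sketch of the zero test in context. The names counter and is_zero are hypothetical, and the colon-less label style is an assumption about the assembler:

```asm
        ld   a, (counter)   ; counter is a hypothetical byte variable
        or   a              ; Z = 1 if A is zero; 4 T, 1 byte (CP 0: 7 T, 2 bytes)
        jr   z, is_zero
        ; ... A is non-zero here ...
        ret
is_zero
        ; ... A is zero here ...
        ret
counter
        db   0
```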

Comparing register to limit value 2^x
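Code

        ld   c, 0
    loop
        {do something}
        inc  c
        ld   a, c
        cp   128
        jr   nz, loop

can be replaced by

        ld   c, 0
    loop
        {do something}
        inc  c
        bit  7, c
        jr   z, loop

BIT 7, C sets Z while bit 7 of C is still clear, so the loop must continue on JR Z (not JR NZ) and ends when C reaches 128. The replacement saves one byte and 3 T per iteration and leaves A free.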
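The same idea applies to any power-of-two limit, provided the counter starts below the limit and goes up by one: bit x first becomes set exactly when the counter reaches 2^x. A sketch for a limit of 32, with a hypothetical loop body:

```asm
        ld   c, 0
loop
        ; ... loop body, must preserve C ...
        inc  c
        bit  5, c           ; Z = 1 while C < 32 (bit 5 still clear)
        jr   z, loop        ; falls through when C = 32
```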

Indirect jumps
Sometimes your code contains a JP xxxx instruction that is executed very often. JP xxxx takes 10 T, but JP (HL) takes only 4 T. Of course, HL is a very handy register and sometimes it is not possible to give it up just for speed tuning. There are also JP (IX) and JP (IY) instructions; both take 8 T.
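As a sketch, a frequently executed back jump can keep its target in a register that the loop body does not otherwise need. The label main_loop is hypothetical. Despite the parentheses, JP (IX) jumps to the address held in IX, not to an address read from memory:

```asm
        ld   ix, main_loop  ; set up once, outside the hot path (14 T)
main_loop
        ; ... work that leaves IX untouched ...
        jp   (ix)           ; 8 T instead of 10 T for JP main_loop
```

With JP (HL) the jump would cost 4 T, at the price of reserving HL for the whole loop.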

Faked subroutine call
Sometimes you want to use a subroutine that manipulates the stack pointer. Saving and restoring SP takes time. Code

        ld   hl, stack1
        call subroutine
        ld   hl, stack2
        call subroutine
        ...

    subroutine
        ld   (subroutine_sp + 1), sp    ; patch the operand of LD SP below
        ld   sp, hl
        ....
    subroutine_sp
        ld   sp, 0                      ; operand overwritten at run time
        ret

can be replaced by

        ld   sp, stack1
        ld   ix, return1
        jp   subroutine
    return1
        ld   sp, stack2
        ld   ix, return2
        jp   subroutine
    return2
        ...

    subroutine
        ...
        jp   (ix)                       ; "return" to the address in IX

If you carefully place the code so that the return1 and return2 addresses have the same MSB, it is possible to change just the LSB of IX.
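A sketch of the LSB-only update, under several assumptions: return1 and return2 lie in the same 256-byte page, the assembler accepts the undocumented LD IXL, n instruction under the name ixl (some assemblers spell it lx or xl), and & 255 is its syntax for taking the low byte of an address:

```asm
        ld   sp, stack1
        ld   ix, return1        ; full 16-bit load, 14 T
        jp   subroutine
return1
        ld   sp, stack2
        ld   ixl, return2 & 255 ; undocumented: only the low byte changes, 11 T
        jp   subroutine
return2
        ; ... continues here after the second pass ...
```

The high byte of IX is still that of return1, so this only works while both labels share the same MSB; an assembler-time check on the two addresses is a sensible safeguard.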

Conditional jumps
Be aware of conditional jump timing. JP cc, xxxx always takes 10 T, but JR cc, xx takes 12 T if the condition is true and 7 T if it is not.
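One way to apply these timings, sketched with hypothetical labels: use JP cc where the branch is taken most of the time, and JR cc where it usually falls through.

```asm
        ld   c, 100
loop
        ; ... loop body, must preserve C ...
        dec  c
        jp   nz, loop       ; taken 99 times out of 100: 10 T each (JR NZ: 12 T)

        or   a
        jr   z, rare_case   ; rarely taken: 7 T on fall-through (JP Z: 10 T)
        ; ... common path ...
        ret
rare_case
        ; ... uncommon path ...
        ret
```

JR cc is also one byte shorter than JP cc, so it remains the better choice when size matters more than speed.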

Delayed jumps
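Take advantage of the fact that not every instruction affects the flags. For example

        or   a
        sbc  hl, de
        jr   z, branch2

        ld   hl, data1
        ld   de, destination
        ...

    branch2
        ld   hl, data2
        ld   de, destination
        ...

can be written as

        or   a
        sbc  hl, de
        ld   de, destination
        jr   z, branch2

        ld   hl, data1
        ...

    branch2
        ld   hl, data2
        ...

LD does not change the flags, so the Z flag produced by SBC HL, DE is still valid when JR Z is executed. The duplicated LD DE, destination is needed only once, which saves three bytes.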
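Another sketch of the same technique, using the fact that 16-bit INC and DEC also leave the flags untouched. The label control_char and the surrounding routine are hypothetical:

```asm
        ld   a, (hl)        ; fetch the next character
        cp   32             ; C = 1 if A < 32
        inc  hl             ; advance the pointer; 16-bit INC does not touch flags
        jr   c, control_char
        ; ... printable character, HL already advanced ...
        ret
control_char
        ; ... control code, HL already advanced ...
        ret
```

Note that the 8-bit INC r and DEC r do change the flags, so they cannot be moved between a comparison and its jump in this way.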