Timing your PyTorch Code Fragments

May 20, 2019

Wondering how long CUDA operations take in your PyTorch-based training code? For instance, how much time does the feed-forward path take? Likewise, how much time does the back-propagation path take? You would think this would be as simple as wrapping the call (loss.backward(), in the case of back propagation) with a timer: record the time just before the call, then record it again right after. Not so.

The Wrong Way

# The timing method below will NOT work for asynchronous CUDA calls
import time as timer

start = timer.time()
loss.backward()
print("Time taken", timer.time() - start)  # tiny value

As the comment notes, CUDA operations are asynchronous: the call returns as soon as the work is queued on the GPU, not when the work finishes. The timer therefore captures little more than the launch overhead, not the backward pass itself.

The Right Way

The right way is to use a combination of torch.cuda.Event(), a marker recorded into the CUDA stream, and torch.cuda.synchronize(), which blocks the CPU until the queued GPU work (including both events) has completed.

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()
# whatever you are timing goes here
end.record()

# Waits for everything to finish running
torch.cuda.synchronize()

print(start.elapsed_time(end))  # milliseconds

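As documented, torch.cuda.synchronize() waits for all kernels in all streams on a CUDA device to complete. A narrower alternative is end.synchronize(), which "waits until the completion of all work currently captured in this event. This prevents the CPU thread from proceeding until the event completes".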
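The event pattern can be wrapped in a small helper so that it is easy to reuse. The sketch below is an illustration, not part of the original sample: time_ms is a hypothetical name, and the fallback to time.perf_counter() when PyTorch or a CUDA device is unavailable is an assumption added only so the snippet runs anywhere. It waits on the end event with end.synchronize() rather than a device-wide torch.cuda.synchronize(); either should work for this purpose.

```python
import time

try:
    import torch
    HAS_CUDA = torch.cuda.is_available()
except ImportError:  # PyTorch not installed; only the CPU fallback is usable
    torch = None
    HAS_CUDA = False


def time_ms(fn):
    """Run fn() once and return the elapsed time in milliseconds.

    Hypothetical helper: uses CUDA events when a GPU is available,
    otherwise falls back to a wall-clock timer (fine for CPU-only code).
    """
    if HAS_CUDA:
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        fn()
        end.record()
        # Block the CPU until the end event (and the work before it) completes
        end.synchronize()
        return start.elapsed_time(end)
    t0 = time.perf_counter()
    fn()
    return (time.perf_counter() - t0) * 1000.0


if __name__ == "__main__":
    if HAS_CUDA:
        x = torch.randn(1024, 1024, device="cuda")
        print("matmul:", time_ms(lambda: x @ x), "ms")
    else:
        print("sum:", time_ms(lambda: sum(range(100_000))), "ms")
```

In a training loop, the same helper could be called as time_ms(lambda: loss.backward()), assuming loss comes from a fresh forward pass each time.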

Sample Code

A working sample is available here. As a matter of practicality, I’ve decided to throw away the first few measurements, since they are “warmup” measurements that include one-time setup costs.
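Discarding warmup measurements can be sketched in a few lines of plain Python. The function name summarize_timings, the default of three warmup runs, and the sample numbers are all made up for illustration; the linked sample may handle warmup differently.

```python
import statistics


def summarize_timings(timings_ms, warmup=3):
    """Drop the first `warmup` measurements and summarize the rest."""
    if warmup < 0:
        raise ValueError("warmup must be non-negative")
    steady = list(timings_ms)[warmup:]
    if not steady:
        raise ValueError("no measurements left after discarding warmup runs")
    return {
        "runs": len(steady),
        "mean_ms": statistics.mean(steady),
        "median_ms": statistics.median(steady),
        "min_ms": min(steady),
    }


# Made-up numbers: the first few runs are inflated by one-time setup costs
timings = [120.0, 35.0, 12.0, 10.0, 11.0, 9.0, 10.0]
print(summarize_timings(timings, warmup=3))
```

Reporting the median alongside the mean is a design choice here: it makes the summary less sensitive to an occasional slow iteration.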


Written by Auro Tripathy
