How to measure performance in Julia
For that, we recall the vector summation example from the introduction to function and include the simple @time
macro.
function mysum(V)
s = zero(eltype(V))
for i in eachindex(V)
@inbounds s += V[i]
end
return s
end
V = rand(100_000)
@time mysum(V)
@time mysum(V)
0.006981 seconds (2.30 k allocations: 158.491 KiB, 97.52% compilation time)
0.000162 seconds (1 allocation: 16 bytes)
50052.74764520874
@inbounds
macro to eliminate inbound checks - does the index exist - for the array access.The downside with the @time
macro is, that it really just measures the execution time of what is given to it. This means, if the function is not already compiled this might include compiling or if the CPU is busy with something else it is often not accurate.
Therefore, if we are serious about measuring performance we should stick to the BenchmarkTools
. It comes with a couple of macros that we should test out:
using BenchmarkTools
, as any other package. Benchmark our mysum
function with the following macros:
@benchmark
@btime
Look at the detailed output of your benchmark with
dump(t)
, wheret
is the output result of a@benchmark
run.
and compare the output and results.
using BenchmarkTools
@benchmark mysum($V)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 133.601 μs … 188.302 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 133.702 μs ┊ GC (median): 0.00%
Time (mean ± σ): 134.038 μs ± 1.794 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
█▆▁ ▁
████▇▇▅▄▃▅▅▄▁▃▄▄▇▇▇▆▆▅▆▅▄▁▁▃▁▃▃▁▁▃▁▃▃▁▁▁▁▁▃▁▆▇▇▇▇█▆▇▇▅▆▅▅▅▅▆▆ █
134 μs Histogram: log(frequency) by time 141 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
the full details with
t = @benchmark mysum($V)
dump(t)
BenchmarkTools.Trial
params: BenchmarkTools.Parameters
seconds: Float64 5.0
samples: Int64 10000
evals: Int64 1
overhead: Float64 0.0
gctrial: Bool true
gcsample: Bool false
time_tolerance: Float64 0.05
memory_tolerance: Float64 0.01
times: Array{Float64}((10000,)) [158502.0, 156602.0, 134101.0, 133601.0, 133702.0, 133701.0, 133602.0, 133701.0, 133701.0, 133701.0 … 133702.0, 133701.0, 133702.0, 133701.0, 133701.0, 133802.0, 133701.0, 133702.0, 133701.0, 133702.0]
gctimes: Array{Float64}((10000,)) [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
memory: Int64 0
allocs: Int64 0
and the often used sanity check, that actually also shows you the output of your code.
@btime mysum($V)
133.601 μs (0 allocations: 0 bytes)
50052.74764520874
$
literal for variables to tell the Julia interpreter to use interpolation. This will make sure that the variable is not allocated inside the function and the measurement is more accurate, or more likely what we actually want to know.We can also use the Profiler
package to really dig into profiling the code but this is a bit too much of a deep dive for this class, it would look like this:
using Profile
Profile.clear()
@profile for _ in 1:100_000; mysum(V); end
Profile.print(maxdepth=15)
Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
╎8281 @Base/client.jl:495; _start()
╎ 8281 @Base/client.jl:309; exec_options(opts::Base.JLOptions)
╎ 8281 @Base/client.jl:379; run_main_repl(interactive::Bool, quiet::Bool, banner::Bool, history_file::Bool, color_set::Bool)
╎ 8281 @Base/essentials.jl:714; invokelatest
╎ 8281 @Base/essentials.jl:716; #invokelatest#2
╎ 8281 @Base/client.jl:394; (::Base.var"#936#938"{Bool, Bool, Bool})(REPL::Module)
╎ ╎ 8281 ...r/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:351; run_repl(repl::REPL.AbstractREPL, consumer::Any)
╎ ╎ 8281 ...r/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:364; run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool)
╎ ╎ 8281 .../worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:231; start_repl_backend(backend::REPL.REPLBackend, consumer::Any)
╎ ╎ 8281 .../worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:246; repl_backend_loop(backend::REPL.REPLBackend)
╎ ╎ 8281 ...worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:150; eval_user_input(ast::Any, backend::REPL.REPLBackend)
╎ ╎ ╎ 8281 @Base/boot.jl:373; eval
╎ ╎ ╎ 8281 .../package_linux64/build/usr/share/julia/stdlib/v1.7/Profile/src/Profile.jl:28; top-level scope
1╎ ╎ ╎ 8281 REPL[18]:1; macro expansion
╎ ╎ ╎ 1 REPL[7]:3; mysum(V::Vector{Float64})
╎ ╎ ╎ 1 @Base/abstractarray.jl:279; eachindex
╎ ╎ ╎ 8279 REPL[7]:4; mysum(V::Vector{Float64})
╎ ╎ ╎ 8279 @Base/range.jl:837; iterate
Total snapshots: 16562