


% -*- texinfo -*-
% @deftypefn {Function File} {} speed (@var{f}, @var{init}, @var{max_n}, @var{f2}, @var{tol})
% @deftypefnx {Function File} {[@var{order}, @var{n}, @var{T_f}, @var{T_f2}] =} speed (@dots{})
%
% Determine the execution time of an expression (@var{f}) for various
% values of @var{n}. The @var{n} are log-spaced from 1 to @var{max_n}.
% For each @var{n}, an initialization expression (@var{init}) is computed
% to create whatever data are needed for the test. If a second expression
% (@var{f2}) is given, the execution times of the two expressions are
% compared. Called without output arguments, the results are presented
% graphically.
%
% @table @code
% @item @var{f}
% The expression to evaluate.
%
% @item @var{max_n}
% The maximum test length to run. The default value is 100. Alternatively,
% use @code{[min_n, max_n]} or, for complete control, @code{[n1, n2, @dots{}, nk]}.
%
% @item @var{init}
% Initialization expression for function argument values. Use @var{k}
% for the test number and @var{n} for the size of the test. This should
% compute values for all variables used by the expression @var{f}. Note
% that @var{init} will be evaluated first for @math{k = 0}, so things
% which are constant throughout the test can be computed then. The
% default value is @code{@var{x} = randn (@var{n}, 1);}.
%
% @item @var{f2}
% An alternative expression to evaluate, so the speed of the two
% can be compared. Default is @code{[]}.
%
% @item @var{tol}
% If @var{tol} is @code{Inf}, then no comparison will be made between the
% results of expression @var{f} and expression @var{f2}. Otherwise,
% expression @var{f} should produce a value @var{v} and expression @var{f2}
% should produce a value @var{v2}, and these shall be compared using
% @code{assert (@var{v}, @var{v2}, @var{tol})}. If @var{tol} is positive,
% the tolerance is assumed to be absolute. If @var{tol} is negative,
% the tolerance is assumed to be relative. The default is @code{eps}.
%
% @item @var{order}
% The time complexity of the expression @code{O(a*n^p)}. This
% is a structure with fields @code{a} and @code{p}; see the example
% after this table.
%
% @item @var{n}
% The values @var{n} for which the expression was calculated and
% the execution time was greater than zero.
%
% @item @var{T_f}
% The nonzero execution times recorded for the expression @var{f} in seconds.
%
% @item @var{T_f2}
% The nonzero execution times recorded for the expression @var{f2} in seconds.
% If it is needed, the mean time ratio is just @code{mean (T_f ./ T_f2)}.
%
% @end table
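%
% Called with output arguments, the measurements can be inspected
% directly instead of being plotted. A minimal sketch (the expression
% and test sizes here are only illustrative):
%
% @example
% @group
% [order, n, T_f] = speed ('v = sum (x)', 'x = rand (n, 1);', ...
%                          [1000, 100000]);
% order.p    # estimated power p in O(a*n^p)
% @end group
% @end example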
%
% The slope of the execution time graph shows the approximate
% power of the asymptotic running time @code{O(n^p)}. This
% power is plotted for the region over which it is approximated
% (the latter half of the graph). The estimated power is not
% very accurate, but should be sufficient to determine the
% general order of your algorithm. It should indicate whether, for
% example, your implementation is unexpectedly @code{O(n^2)}
% rather than @code{O(n)} because it extends a vector each
% time through the loop rather than preallocating one that is
% big enough. For example, in the current version of Octave,
% the following is not the expected @code{O(n)}:
%
% @example
% speed ('for i = 1:n, y@{i@} = x(i); end', '', [1000,10000])
% @end example
%
% but it is if you preallocate the cell array @code{y}:
%
% @example
% @group
% speed ('for i = 1:n, y@{i@} = x(i); end', ...
%        'x = rand (n, 1); y = cell (size (x));', [1000, 10000])
% @end group
% @end example
%
% An attempt is made to approximate the cost of the individual
% operations, but it is wildly inaccurate. You can improve the
% stability somewhat by doing more work for each @code{n}. For
% example:
%
% @example
% speed ('airy (x)', 'x = rand (n, 10);', [10000, 100000])
% @end example
%
% When comparing a new and an original expression, the line on the
% speedup ratio graph should be above 1 if the new expression
% is faster. Better algorithms have a shallower slope. Generally,
% vectorizing an algorithm will not change the slope of the execution
% time graph, but it will shift it relative to the original. For
% example:
%
% @example
% @group
% speed ('v = sum (x)', '', [10000, 100000], ...
%        'v = 0; for i = 1:length (x), v += x(i); end')
% @end group
% @end example
%
% A more complex example: if you had an original version of @code{xcorr}
% using for loops and another version using an FFT, you could compare
% their run speed for various lags, or for a fixed lag with varying
% vector lengths, as follows:
%
% @example
% @group
% speed ('v = xcorr (x, n)', 'x = rand (128, 1);', 100, ...
%        'v2 = xcorr_orig (x, n)', -100*eps)
% speed ('v = xcorr (x, 15)', 'x = rand (20+n, 1);', 100, ...
%        'v2 = xcorr_orig (x, 15)', -100*eps)
% @end group
% @end example
%
% Assuming one of the two versions is in @code{xcorr_orig}, this
% would compare their speed and their output values. Note that the
% FFT version is not exact, so we specify an acceptable tolerance of
% @code{100*eps}; because the tolerance is negative, the errors are
% computed relatively, as @code{abs ((@var{x} - @var{y}) ./ @var{y})},
% rather than absolutely as @code{abs (@var{x} - @var{y})}.
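%
% The effect of a negative tolerance can be checked directly with
% @code{assert} (an illustrative sketch; the numbers are arbitrary):
%
% @example
% @group
% # negative tolerance: the error is measured relative to the
% # expected value, i.e., abs ((v - v2) ./ v2) <= 100*eps
% assert (1 + 50*eps, 1, -100*eps)
% @end group
% @end example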
%
% Type @code{example ('speed')} to see some real examples. Note that for
% obscure reasons, you cannot run examples 1 and 2 directly using
% @code{demo ('speed')}. Instead use @code{eval (example ('speed', 1))}
% and @code{eval (example ('speed', 2))}.
% @end deftypefn