Detrend
Take a look the brightness graph you made in the preceding chapter.
Notice how the graph tends to flare up. This is a systemic problem that we should correct. We are going to do that by finding what trend the graph is following, and adjusting for that.
Processing
Before, we processed each row individually. Now we need to operate on the entire sequence. So instead iterating over each row, we are going to transform it directly.
Because a SimpleCsvReader
is an Iterator
we can use our tricks on it. The
idiosyncrasies of the SimpleCsvReader
make that we first need to unwrap a row.
Next we can map over the row of data and collect into a vector of tuples, the
entry being the time and the second entry being the brightness.
# #![allow(unused_variables)] #fn main() { let raw: Vec<(f64, f64)> = reader .map(|r| r.unwrap()) .map(data) .collect(); #}
The function data
has the following signature
# #![allow(unused_variables)] #fn main() { fn data(row: Vec<String>) -> (f64, f64) #}
data
is responsible for turning the raw columns of our CSV into f64
brightness values,
and selecting the correct ones.
Up until now we never returned more than two or three values. For our current
plan we are going to return more. In order to keep track of our data, we are
going to create a struct
.
# #![allow(unused_variables)] #fn main() { struct DetrendData { time: f64, brightness: f64, trend: f64, difference: f64, } #}
We have created a few entries, some familiar, some unfamiliar. time
and
brightness
are pretty self-explanatory. difference
is intended to hold the
difference between the brightness
and the trend
.
But how do we calculate the trend?
Strategies
Let us reflect on what we are trying to achieve. We have some data points \(y_{0}, y_{1}, \ldots, y_{n}\). We have a model that predicts that these values fluctuate around a given mean \(Y\), but for some reason or another, it doesn't.
Instead the values fluctuate around some function \(f\), for which we don't now the shape or form. This is called the trend.
Our goal is to approximate the trend function \(f\) by a function that we can calculate from the data. Next we can analyze the actual signal by removing the trend. In effect we will look at the de-trended signal \(y_{0} - t(0), y_{1} - t(1), \ldots, y_{n} - t(n)\). Here \(t\) is our approximation for the trend.
We shall do this by providing the values of our approximation.
There are numerous strategies for determining the trend in a sequence of data. Below you can find a strategy we have selected for this workshop.
Weighted Trend
With the notations from the preceding section the weighted trend algorithm is as follows. First you pick a parameter \(\alpha\) such that it lies between zero and one, i.e. \(0 \le \alpha \le 1\).
Next we will explain how to calculate each point of our approximation to the trend.
- \(t_{0} = y_{0}\). I.e. our first approximation is the first value of our sequence of data.
- \(t_{i} = \alpha y_{i} + (1-\alpha) t_{i-1}\). I.e. our trend tends towards the new value of our sequence, but is a but reluctant. It tends to stick to the previous values.
Let's implement this algorithm. With our DetrendData
structure, we have an
opportunity to directly implement the different branches of our algorithm. We
start an impl
block for DetrendData
.
# #![allow(unused_variables)] #fn main() { impl DetrendData { } #}
Next we are going to translate the first branch of the algorithm. Since our data
gets delivered to us in the form of a (f64, f64)
pair, we better accept it as
an argument.
# #![allow(unused_variables)] #fn main() { fn initial((time, brightness): (f64, f64)) -> DetrendData { DetrendData { time: time, brightness: brightness, trend: brightness, difference: 0f64, } } #}
It is little more than putting things in the right place. Next we will use the
current DetrendData
that we have, and use it to determine what the next
DetrendData
should be. Because this depends on the new data and the parameter
\(\alpha\), we better accept them both.
# #![allow(unused_variables)] #fn main() { fn next(&self, (time, brightness): (f64, f64), alpha: f64) -> DetrendData { let trend = alpha * brightness + (1f64 - alpha) * self.trend; DetrendData { time: time, brightness: brightness, trend: trend, difference: brightness - trend, } } #}
We calculate the trend
as described in the algorithm, and calculate the
difference from the brightness and the freshly calculated trend. With a
convenience method that turns the DetrendData
into a Vec<String>
we are
ready to calculate our entire trend.
We will collect our data in a vector of DetrendData
. Because we are going to
incrementally add new entries to it, it needs to be mutable.
# #![allow(unused_variables)] #fn main() { let mut sequence: Vec<DetrendData> = vec!(); #}
We also keep track of the last calculated DetrendData
in a mutable variable
called data
. Because we haven't calculated any value yet, its type is
Option<DetrendData>
.
# #![allow(unused_variables)] #fn main() { let mut data: Option<DetrendData> = None #}
This has the added benefit that we can differentiate between when to initialize data, and when to calculate the next data, during our iteration over our raw data.
# #![allow(unused_variables)] #fn main() { for candidate in raw { match data { None => { data = Some(DetrendData::initial(candidate)) } Some(previous) => { let next = previous.next(candidate, alpha); sequence.push(previous); data = Some(next) } } } #}
Further Considerations
How does the weighted detrend behave for known functions? Try to plot an step function, i.e. a series that starts out 0 and than is 1 through out, and detrend it.
What other kind of detrend strategies can you come up with?