Fitting
We have created a plot of the median.
We would like to find planets in it. Finding planets amounts to selecting a transit curve that nicely fits our data. We our going to divide that task in the following sub-tasks.
- Generating a transit curve series
- Iterating over all transit curve parameters
- Scoring each candidate transit curve and selecting the best
Let us create a module for this as well. We will call it fit
.
Transit Curves
The above image shows a typical transit curve where the planet transits the host star multiple times. From this diagram we can learn about the parameters that make up such a transit.
Below we list the parameters important in our transit curve.
- Period. The time between the start of one transit and the start of the next transit.
- Base. Height of the line, when no planet transits. This is often normalized, but because of the choices we made, we need this parameter.
- Depth. How far the luminosity drops when the planets transits. This is related to the size of the planet.
- Duration. How long the luminosity stays at full depth.
- Decay. How much time it takes the luminosity to go from the
base
todepth
. In our model the attack, i.e. time it takes the luminosity to go fromdepth
back tobase
, and decay are the same. - Phase. Where in the period does the periodic function start.
Below you find a struct
and an implemented constructor that can track this
data.
# #![allow(unused_variables)] #fn main() { pub struct Transit { period: f64, base: f64, depth: f64, duration: f64, decay: f64, phase: f64, } impl Transit { fn new((period, base, depth, duration, decay, phase): (f64,f64,f64,f64,f64,f64)) -> Transit { Transit { period, base, depth, duration, decay, phase } } } #}
Notice that the new
function accepts a tuple of floating point numbers. We
will use this when we iterate over the parameters.
What we also want to know is, when we have got a time
, what is the
corresponding value of this transit curve. For that we are going to implement a
value
method on `Transit
What we also want to know is, when we have got a time, what is the corresponding value of this transit curve. For that we are going to implement a
value method on Transit
.
The interesting times are
- Before the decay. The value should be
base
- During the decay. The value should linearly interpolate between
base
andbase
-depth
. - During full transit. The value should be
base
-depth
- During the attack. The value should linearly interpolate between
base
-depth
andbase
. - After the transit. The value should be
base
.
Implement the above logic into a value
method for Transit
.
Iterate Parameters
Our transit curve has five parameters. We would like to generate candidate transit curves and check how well they fit our data. This can be accomplished by iterating over the five parameters, and mapping them into a transit curve.
FloatIterator
We will first focus on an iterator for a single f64
. We want all floating
point numbers between a start
and finish
, increasing each new number with a
certain step
. We will create a struct
that keeps track of where we are.
# #![allow(unused_variables)] #fn main() { pub struct FloatIterator { start: f64, finish: f64, step: f64, current: u64, } #}
Implementing a new
constructor should set the current
to 0
and accept
start
, finish
and step as parameters.
Next we need to implement Iterator
for FloatIterator
. We must import
std::iter::Iterator
so that we can easily reference it in our code. In the
next
method of the Iterator
trait we need to decide if we need to return a
Some
or a None
. This depends on the our intended return value. I.e. if the
value start + step * current
is less then or equal to our finish
.
# #![allow(unused_variables)] #fn main() { impl Iterator for FloatIterator { type Item = f64; fn next(&mut self) -> Option<Self::Item> { let value = self.start + self.step * (self.current as f64); if value <= self.finish { self.current += 1; Some(value) } else { None } } } #}
This wraps up our FloatIterator
.
Exemplar TupleIterator
Next we are going to create a TupleIterator
. It is going to show all the
necessary tools to create the actual TupleIterator
we want, without getting
distracted by the tedious details.
Because we want to express multiple times a range of floating point numbers we
are interested in, we are going to create a struct
to keep track of start
,
finish
and step
.
# #![allow(unused_variables)] #fn main() { pub struct FloatRange { start: f64, finish: f64, step: f64, } #}
implementing a new
constructor for FloatRange
is nothing more than excepting
the correct parameters and passing them in the struct. Having a FloatRange
allows us to ask it for the value belonging to a certain index. Let's extend the
implementation of FloatRange
with an index
function. The actual
implementation looks very familiar.
# #![allow(unused_variables)] #fn main() { fn index(&self, index: u64) -> Option<f64> { let value = self.start + self.step * (index as f64); if value <= self.finish { Some(value) } else { None } } #}
Now that we can express the floating point range we are interested in, we can
use that in our TupleIterator
. The responsibility of the TupleIterator
is to
keep track of two indices into two separate FloatIterator
. Because we need to
be able to "restart" the FloatIterator
we are not actually use a
FloatIterator
. Instead we choose to do the iterating our selves.
We start with a structure that will keep track for us.
# #![allow(unused_variables)] #fn main() { pub struct TupleIterator { first: FloatRange, second: FloatRange, current: (u64, u64), } #}
It looks a lot like the FloatIterator
. The main difference is that we need to
keep track of two different ranges, and two different indices into these
iterators. Implementing a new is just like the FloatIterator
little more than
accepting the correct parameters and initializing the current indices to
(0,0)
.
Now for implementing Iterator
for TupleIterator
. It comes down to keeping
track of the right indices. Let's look at the implementation.
# #![allow(unused_variables)] #fn main() { impl Iterator for TupleIterator { type Item = (f64, f64); fn next(&mut self) -> Option<Self::Item> { let (first_index, second_index) = self.current; match self.first.index(first_index) { Some(first_value) => { match self.second.index(second_index) { Some(second_value) => { self.current = (first_index, second_index + 1); Some((first_value, second_value)) }, None => { self.current = (first_index + 1, 0); self.next() }, } }, None => None, } } } #}
Reading the code, we discover that we determine two values, one for each
FloatRange
. If the first FloatRange
return None
we are done. If it returns
some value, we see what the second FloatRange
returns. If that also returns
some value, we do the following things.
- Increment the second index.
- Return the tuple of values.
If the second FloatRange
doesn't return some value, we know that it exhausted
its range. We then increment the first index, resetting the second index. In
effect we want to the second iterator from scratch, but with a new first value.
Next we use the power of recursion to determine what the value should be with
the new indices.
Although it doesn't look pretty, it does its job. It is your task to extend this example to iterate over all transit parameters.
Score
We are going to score a Transit
with the
least squares method. This
method sums the squares of the difference between two series. That is easier
said than done. Lets look at following code.
# #![allow(unused_variables)] #fn main() { pub fn least_squares(xs: &Vec<f64>, ys: &Vec<f64>) -> f64 { xs.iter().zip(ys) .map(|(a,b)| (a-b).powi(2)) .sum() } #}
We define a function names least_squares
that accepts to vectors of floating
points numbers. Next we recognize our dear friend: the iter
method. We use it
on the first vector and zip it with the second vector. On the vector of pairs we
map the function that calculates the squared difference. We finish with summing
all the numbers, getting our result.
With all the parts in place we are ready to start processing.
Processing
We need to compare candidate transit curves with our median, so we need to read our median.csv
. Because we would like to process the times and the values separedly we use the unzip
trick we learned earlier.
# #![allow(unused_variables)] #fn main() { let (times, values): (Vec<f64>, Vec<f64>) = raw .iter() .cloned() .unzip(); #}
Processing consist for a big part of a main loop that iterates over our transit parameters. This is depend on a number of FloatRange
s and this is where you can shine. By looking at your median.csv
data you can guess good ranges, and with some luck you will find planets.
We need to keep track of the best transit curve. So we initialize variables before our main loop.
# #![allow(unused_variables)] #fn main() { let mut best_score = f64::MAX; let mut best_transit: Option<Transit> = None; #}
In order to make use of the f64::MAX
we need to import std::f64
. Not that the best score will actually be the lowest value, so it is save to initialize it to the maximum value.
Inside our loop, we can create a transit curve from the parameters.
# #![allow(unused_variables)] #fn main() { let transit = Transit::new(parameters); #}
The transit
can be used to determine the values at the times we observed by mapping over the times
and using the value
method.
# #![allow(unused_variables)] #fn main() { let transit_values: Vec<f64> = times.iter().map(|t| transit.value(t)).collect(); #}
Scoring is little more than calling the right function.
# #![allow(unused_variables)] #fn main() { let score = least_squares(&transit_values, &values); #}
Now that we have the score, we need to compare it with the best score we now about, and update our best candidate accordingly.
# #![allow(unused_variables)] #fn main() { if score < best_score { best_score = score; best_transit = Some(transit.clone()); } #}
When the loop finishes we would like to know which transit is the best. So we prepare to print it to console, and calculate the actual values, which you can write to a CSV file.
# #![allow(unused_variables)] #fn main() { let best_transit = best_transit.unwrap(); let best_transit_values: Vec<f64> = times.iter().map(|t| best_transit.value(t)).collect(); println!("{:?}", best_transit); let result = times.iter().zip(best_transit_values); #}
Lets fly through our candidates and see what planet you can find.
Further Considerations
How do you know that you implemented the various libraries correctly? Have you tested them?
The way that we generate our transit parameters is going through all possible values. This seems a bit wastefull. Can you come up with a better way? Discussing your thoughts with somebody.
We used the method of least squares to score a transit curve. What other scoring mechanism can you think of. What difference would it make?