# Mike's Page

### Informatics, Development, Cycling, Data, Travel ...

## Model of the partial space elevator

You are probably familiar with the idea of a space elevator: a rope extending from the Earth’s surface to beyond geostationary orbit, with a counterweight attached. This has the amazing property that one could simply ‘climb’ the rope. The counterweight even pulls the rope back on station. The kinetic energy gained by the payload comes at the cost of slowing the Earth’s rotation slightly. Brilliant. The problem is that to make the rope one needs unobtainable materials: huge quantities of carbon nanotubes or something similar.

There are bound to be good reasons the following suggestion wouldn’t work, but I’m curious what they are. Rather than starting at the Earth’s surface, what if our elevator starts at 2000km above it? This would allow us to build the rope out of much more reasonable materials. Why? The original rope needs to be enormously strong because there is a lot of it being pulled towards the Earth (and more being pulled the other way by the counterweight). Holding all this up requires a lot of material, which is itself heavy, which means we need even more material, and so on. The force of gravity is also stronger closer to the Earth!

“How does it stay up?” you might reasonably ask. This elevator, unlike the last, is in a ‘proper orbit’ – or at least it is on average. The part that hangs towards the Earth will be suborbital (indeed it will be moving quite slowly relative to low-Earth orbit).

“But how do we get to the start of the tether if it’s 2000km up?” Going up into space is easy; getting into orbit is the expensive bit. An (awfully named) rockoon might be a neat way to reach the 2000km mark with a very modest rocket (the rocket equation means we can use a very small rocket to reach 2000km altitude, compared to the rocket required to reach a 2000km orbit).
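To make the “altitude is cheap, orbit is expensive” point concrete, here is a rough sketch of my own (not the calculation behind the figures below; the exhaust velocity is an assumed value) comparing the ideal delta-v for a vertical hop to 2000km with the ideal delta-v for a 2000km circular orbit, fed through the Tsiolkovsky rocket equation:

```python
import math

MU = 3.986e14   # Earth's gravitational parameter, m^3/s^2
R = 6371e3      # Earth's radius, m
H = 2000e3      # target altitude, m

# Ideal delta-v (no gravity/drag losses) to just reach 2000 km altitude:
# all kinetic energy at the surface converts to potential energy.
dv_hop = math.sqrt(2 * MU * (1 / R - 1 / (R + H)))

# Ideal delta-v to reach a 2000 km circular orbit via a Hohmann-style
# transfer from the surface (impulsive burns, ignoring losses and spin).
a_t = (R + (R + H)) / 2                       # transfer ellipse semi-major axis
dv1 = math.sqrt(MU * (2 / R - 1 / a_t))       # burn at the surface
v_apo = math.sqrt(MU * (2 / (R + H) - 1 / a_t))
dv2 = math.sqrt(MU / (R + H)) - v_apo         # circularise at 2000 km
dv_orbit = dv1 + dv2

def propellant_per_kg(dv, ve=3500.0):
    """Tsiolkovsky rocket equation: kg of propellant per kg of everything
    else. ve (effective exhaust velocity, m/s) is an assumed figure."""
    return math.exp(dv / ve) - 1

print("hop:   %.0f m/s, %.1f kg propellant per kg" % (dv_hop, propellant_per_kg(dv_hop)))
print("orbit: %.0f m/s, %.1f kg propellant per kg" % (dv_orbit, propellant_per_kg(dv_orbit)))
```

Because the mass ratio is exponential in delta-v, the altitude-only hop needs roughly a third of the propellant fraction of the orbital launch under these assumptions, and the gap only widens once gravity losses and staging are included.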

“But won’t you pull the whole thing down as you climb it?” Yes. To correct for this, ion engines will be arranged along the tether for station-keeping. Some of the payload can be used to refuel them (ion engines have a specific impulse 10-100 times better than a launch rocket, so hopefully we’ll need less fuel!)

### Some rough calculations

This one only reaches as high as geostationary altitude, yet it is actually moving faster than a geostationary orbit (at that altitude, and at all altitudes).

Here I don’t allow for any safety factor or weakness, and assume we have the full 5.8GPa strength of Zylon available. Hopefully the newer materials now appearing, which combine nanotubes and polymers, will justify this assumption! Next I assume we can get to 2000km above the Earth with a rockoon. If I’ve applied the rocket equation correctly, I think we’d need about 2kg of fuel for each kg of payload (and some of that payload has to be propellant for the ion engines on the tether). Still not bad. If our materials improve we could move the start closer to the Earth. Anyway: this tether will be about 1cm wide at its widest point and weigh about 2100 tonnes. The new Falcon Heavy can lift 26.7 tonnes to GTO; if two-thirds of each payload is tether, we’d need 118 flights (or fewer, as we could start using the tether for the bottom part of the journey!).
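As a sanity check on the “reasonable materials” claim, here is a back-of-envelope sketch of my own (not the calculation behind the figures above; the balance radius and Zylon’s density are assumed values) of the classic tether taper calculation: the cross-sectional area must grow exponentially with the integral of gravity minus centrifugal acceleration, scaled by density over strength.

```python
import math

MU = 3.986e14               # Earth's gravitational parameter, m^3/s^2
R_BOTTOM = 6371e3 + 2000e3  # tether bottom: 2000 km altitude
R_BALANCE = 15000e3         # assumed radius where gravity balances centrifugal force
RHO = 1560.0                # Zylon density, kg/m^3 (approximate)
SIGMA = 5.8e9               # Zylon tensile strength, Pa (no safety factor)

# The tether rotates rigidly; pick the angular rate that zeroes the net
# acceleration at the balance radius. This rate is much faster than
# geostationary, consistent with the design above.
omega2 = MU / R_BALANCE**3

# Integral of (mu/r^2 - omega^2 r) dr from the bottom to the balance point.
work = MU * (1/R_BOTTOM - 1/R_BALANCE) - omega2 * (R_BALANCE**2 - R_BOTTOM**2) / 2

# Exponential taper: cross-sectional area at the balance point over the bottom.
taper = math.exp(RHO * work / SIGMA)

# For comparison: a surface-anchored tether spinning at the geostationary rate.
R_SURFACE = 6371e3
R_GEO = 42164e3
omega2_geo = MU / R_GEO**3
work_surface = MU * (1/R_SURFACE - 1/R_GEO) - omega2_geo * (R_GEO**2 - R_SURFACE**2) / 2
taper_surface = math.exp(RHO * work_surface / SIGMA)

print("taper from 2000 km: %.1f;  taper from the surface: %.2g" % (taper, taper_surface))
```

The taper ratio is very sensitive to the chosen balance radius, so treat the exact number with suspicion; the point is that starting at 2000km keeps the exponent manageable, whereas a surface-anchored Zylon tether needs an absurd taper.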

### Update

I asked the question on Stack Exchange. I still can’t find any papers on this particular idea. In response to one of the questions on SE I ran a few more simulations – interestingly, the arrangement is unstable, but I wonder whether this instability can be mitigated by adjusting the length of the tether!

## Christmas Squirrel

This year I made my mum a Christmas Squirrel!

What it does:

Give it one of the small Christmas decorations or birds and it will play a Christmas song or a radio station (4, Classic or 3).

It uses an RFID reader, an audio amp and some other gubbins, all driven by a Raspberry Pi, and it plays the radio stations via internet streams.

The git repo is here: https://github.com/lionfish0/christmas_squirrel.

The volume control is a large circular button on its back.

I’ll try to get some photos when I next see it!

Dask is a Python library which lets you distribute computation over a cluster. dask-ec2 is a closely related module which lets you create that cluster on AWS EC2. Just a quick note: Dask is good if your problem is embarrassingly parallel. Examples I’ve come across regularly include:

• Cross-validation
• Fitting multiple datasets (e.g. separate patients)
• Parameter-grid search

I’ve found that dask-ec2 isn’t being maintained at the moment, so I’ve made a repo with some of the changes I’ve needed here. The changes I’ve incorporated:

1. Allowing the use of spot-instances (see https://github.com/dask/dask-ec2/pull/66)
2. Fixed a bug to allow the distributed computers to use 16.04 (see https://github.com/dask/dask-ec2/issues/98)

## How to install

### Get AWS setup

From https://boto3.readthedocs.io/en/latest/guide/quickstart.html (Boto is the Amazon Web Services (AWS) SDK for Python):

```
sudo apt-get install awscli
pip install boto3
```

Visit AWS -> IAM -> Add user -> Security Credentials -> Create Access Key. Run `aws configure` and enter the ID, secret key and region. Note: I use ‘eu-west-1’ for the region, and leave the output format blank (it defaults to JSON).

### Test

Try this python code and see if it works.

```python
import boto3

s3 = boto3.resource('s3')
for b in s3.buckets.all():
    print(b.name)
```


http://distributed.readthedocs.io/en/latest/ec2.html says to install dask-ec2 with `pip install dask-ec2` (don’t do this!). Instead, get it from my repo with the above changes incorporated:

```
pip install git+https://github.com/lionfish0/dask-ec2.git
```

### Sort out keys

Visit AWS -> EC2 -> Key pairs -> Create key pair. I called mine “research”. Save the keyfile in .ssh and chmod it to 600.

### Select AMI (instance image we want to use)

Get the AMI we want to use (e.g. ubuntu 16.04). Check https://cloud-images.ubuntu.com/locator/ec2/ and search for e.g. 16.04 LTS eu-west-1 ebs.

Edit: It needs to be an hvm, ebs instance. So I searched for: “eu-west-1 16.04 ebs hvm”.

### To start up your cluster on EC2

We can start up the cluster with dask-ec2, but it needs some parameters, including the keyname and keypair. I found I also had to specify the region name, the AMI and the tags, as the first two have wrong defaults and the tool seems to fail if tags isn’t set either. I also found that ubuntu 16.04 gave an SSL ‘wrong version number’ error, which is hopefully fixed if you use my version of the dask-ec2 repo (see https://github.com/dask/dask-ec2/issues/38). `count` specifies the number of on-demand instances (it has to be at least 1 at the moment). `spot-count` is the number of spot instances (combine this with `spot-price`, which I set to the price of the on-demand instances). `volume-size` is the size in GB of the instance hard disk, and `type` is the EC2 instance type. `nprocs` is, I think, the number of calculations each instance will be given to work on at once. As GPy does a good job of distributing over multiple cores, I just give each instance two problems at a time.

```
dask-ec2 up --keyname research --keypair .ssh/research.pem --region-name eu-west-1 --ami ami-c8b51fb1 --tags research:dp --count 1 --spot-count 5 --spot-price 0.796 --volume-size 10 --type c4.4xlarge --nprocs 2
```

Eventually after a long time, this will finish with:

```
Dask.Distributed Installation succeeded
---------
Web Interface:    http://54.246.253.159:8787/status
TCP Interface:           54.246.253.159:8786

To connect from the cluster
---------------------------

dask-ec2 ssh  # ssh into head node
ipython  # start ipython shell

from dask.distributed import Client, progress
c = Client('127.0.0.1:8786')  # Connect to scheduler running on the head node

To connect locally
------------------

Note: this requires you to have identical environments on your local machine and cluster.

ipython  # start ipython shell

from dask.distributed import Client, progress
e = Client('54.246.253.159:8786')  # Connect to scheduler running on the head node

To destroy
----------

Installing Jupyter notebook on the head node
DEBUG: Running command sudo -S bash -c 'cp -rf /tmp/.__tmp_copy /srv/pillar/jupyter.sls' on '54.246.253.159'
DEBUG: Running command sudo -S bash -c 'rm -rf /tmp/.__tmp_copy' on '54.246.253.159'
+---------+----------------------+-----------------+
| Node ID | # Successful actions | # Failed action |
+=========+======================+=================+
| node-0  | 17                   | 0               |
+---------+----------------------+-----------------+
Jupyter notebook available at http://54.246.253.159:8888/
```


### Install libraries on cluster

Importantly, the remote cluster’s environment has to match the local environment (the version of Linux, the modules, the Python version etc. all have to match). This is a bit awkward. Finding modules is a problem… I found these didn’t work out of the box. Critically, it failed with “distributed.utils - ERROR - No module named dask_searchcv.methods”. I found I had to install the module on each worker:

Either by hand:

```
local$ dask-ec2 ssh 1
dask1$ conda install dask-searchcv -c conda-forge -y
```

Or, better, write a python function to do this for us – I run this every time I start up a new cluster, to install all the stuff I know I need.

```python
import os
from dask.distributed import Client

def install_libraries_on_workers(url):
    """Install libraries if necessary on workers etc.

    e.g. if already on server...
    install_libraries_on_workers('127.0.0.1:8786')
    """
    client = Client(url)

    runlist = ['pip install -U pip',
               'sudo apt install libgl1-mesa-glx -y',
               'conda update scipy -y',
               'pip install git+https://github.com/sods/paramz.git',
               'pip install git+https://github.com/SheffieldML/GPy.git',
               'pip install git+https://github.com/lionfish0/dp4gp.git',
               'conda install dask-searchcv -c conda-forge -y',
               'pip install git+https://github.com/lionfish0/dask_dp4gp.git',
               'pip install numpy',
               'conda remove argcomplete -y']  # , 'conda install python=3.6 -y']

    for item in runlist:
        print("Installing '%s' on workers..." % item)
        client.run(os.system, item)
        print("Installing '%s' on scheduler..." % item)
        client.run_on_scheduler(os.system, item)
        # os.system(item)  # if you need to install it locally too
```


## Example

Here’s a toy example to demonstrate how to use Dask with GPy.

```python
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import GPy
from dask import compute, delayed
from dask.distributed import Client

# adding the delayed decorator means this won't run immediately when called.
@delayed(pure=True)
def predict(X, Y, Xtest):
    m = GPy.models.GPRegression(X, Y)
    m.optimize()
    predmean, predvar = m.predict(Xtest)
    return predmean[0, 0]

values = [np.NaN] * 1000
for i in range(1000):
    X = np.arange(0, 100)[:, None]
    Y = np.sin(X) + np.random.randn(X.shape[0], 1) + X
    Xtest = X[-1:, :] + 1
    values[i] = predict(X, Y, Xtest)  # this doesn't run straight away!

ip = '54.246.253.159'  # the scheduler address reported by dask-ec2
client = Client(ip + ':8786')

# here is when we actually run the stuff, on the cloud.
results = compute(*values, get=client.get)

print(results)
```


On two 16-core computers on AWS, I found this sped up by 59% (130s down to 53s).

More examples are available at http://dask.pydata.org/en/latest/use-cases.html

### Update

If you did this a while ago, dask and friends can get out of date on your local machine. It’s a pain trying to keep it all in sync. One handy command:

```
conda install -c conda-forge distributed
```

E.g.

```
mike@atlas:~$ conda install -c conda-forge distributed
Fetching package metadata .............
Solving package specifications: .

Package plan for installation in environment /home/mike/anaconda3:

The following packages will be UPDATED:

    dask:        0.15.4-py36h31fc154_0 --> 0.16.1-py_0   conda-forge
    dask-core:   0.15.4-py36h7045e13_0 --> 0.16.1-py_0   conda-forge
    distributed: 1.19.1-py36h25f3894_0 --> 1.20.2-py36_0 conda-forge

The following packages will be SUPERSEDED by a higher-priority channel:

    conda-env:   2.6.0-h36134e3_1 --> 2.6.0-0 conda-forge

Proceed ([y]/n)? y

dask-core-0.16 100% |################################| Time: 0:00:01 269.93 kB/s
distributed-1. 100% |################################| Time: 0:00:01 597.96 kB/s
dask-0.16.1-py 100% |################################| Time: 0:00:00   1.16 MB/s
```

Note to self:

Just type

```
debug
```

in the next cell (u = up, c = continue).

Use:

```python
import pdb; pdb.set_trace()
```

to add a trace point to your code.

## A linear model with GPy

Background: I’m working on a project aiming to extrapolate dialysis patient results over a 100-hour window (I’ll write future blog posts on this!). I’m working with James Fotheringham (Consultant Nephrologist), who brilliantly bridges the gap between clinic and research – allowing ivory-tower researchers (me) to get our expertise applied to useful applications in the real world.

Part of this project is the prediction of the patient’s weight. We’ll consider a simple model.

When a patient has haemodialysis, they have fluid removed. If this doesn’t happen frequently or successfully enough, the patient can experience *fluid overload*. Each dialysis session (in this simplified model) brings the patient’s weight back down to their dry weight. The weight then increases (roughly linearly) as time goes by, until their next dialysis session. I model their weight $$w(t,s)$$ as a slow-moving function of time-since-start-of-trial, $$f(t)$$, added to a linear function of time-since-last-dialysis-session, $$g(s)$$:

$$w(t,s) = f(t) + g(s)$$

For now we’ll ignore f and just consider g. This is a simple linear function $$g(s) = sw$$. The gradient $$w$$ describes how much the patient’s weight increases for each day it’s been since dialysis.
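To make the model concrete, here is a small numpy sketch (illustrative only: the dry weight, gradient and dialysis schedule are made-up numbers, not patient data) simulating the sawtooth weight pattern, where the weight resets at each session and then climbs linearly with time-since-dialysis:

```python
import numpy as np

dry_weight = 70.0  # kg, assumed dry weight; this plays the role of a constant f(t)
gradient = 0.8     # kg gained per day since dialysis, assumed value of w
session_days = np.array([0, 2, 4, 7, 9, 11])  # an assumed thrice-weekly schedule

t = np.linspace(0, 13, 14 * 24)  # an hourly grid over two weeks (days)
# time since the most recent dialysis session, s(t)
last_session = np.array([session_days[session_days <= ti].max() for ti in t])
s = t - last_session

# w(t,s) = f(t) + g(s), with f(t) held constant at the dry weight
weight = dry_weight + gradient * s
```

Plotting `weight` against `t` gives the characteristic sawtooth: drops at each session, a linear climb (with slope equal to the gradient we want a prior over) in between.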

We could estimate this from previous data from that patient (e.g. by fitting a Gaussian Process model to the data). But if the patient is new, we might want to provide a prior distribution on this parameter. We could get this by considering what gradient other similar patients have (e.g. those with the same age, vintage, gender, weight, comorbidity, etc might have a similar gradient).

Side note: Colleagues have recommended that we combine all the patients into one large coregionalised model. This has several problems: excessive computational demands, privacy (having to share the data), interpretability (the gradient might be a useful feature), etc.

Other side note: I plan on fitting another model, of gradients wrt patient demographics etc to make predictions for the priors of the patients.

So our model is: $$g(s) = \phi(s)w$$ where we have a prior on $$w \sim N(\mu_p, \sigma_p^2)$$.

If we find the mean and covariance of $$g(s)$$:

$$E[g(s)] = \phi(s) E[w] = \phi(s) \mu_p$$
$$E[g(s)g(s')] = \phi(s) E[ww^\top] \phi(s') = \phi(s) (\mu_p \mu_p^\top + \Sigma_p) \phi(s')$$
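As a quick numerical check of these two identities (a sketch with made-up values for $$\mu_p$$ and $$\sigma_p$$, and the scalar feature $$\phi(s)=s$$), we can sample gradients from the prior and compare the empirical moments of $$g(s)$$ against the formulas:

```python
import numpy as np

rng = np.random.default_rng(0)
mu_p, sigma_p = 0.8, 0.2  # assumed prior mean and std of the gradient w
s1, s2 = 2.0, 3.0         # two times-since-dialysis; phi(s) = s

w = rng.normal(mu_p, sigma_p, size=1_000_000)  # samples from the prior on w
g1, g2 = s1 * w, s2 * w                        # g(s) = phi(s) w

# E[g(s)] = phi(s) mu_p
emp_mean = g1.mean()
thy_mean = s1 * mu_p

# E[g(s)g(s')] = phi(s) (mu_p^2 + sigma_p^2) phi(s')
emp_second = (g1 * g2).mean()
thy_second = s1 * (mu_p**2 + sigma_p**2) * s2
```

The empirical and theoretical values agree to Monte Carlo error, which is reassuring before baking the result into a kernel.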

It seems a simple enough problem – but it does require that the prior mean is no longer zero (or flat with respect to the inputs). I’m not sure how to do that in GPy.

Update: There is no way of doing this in GPy by default. So I solved this by adding a single point, with a crafted noise, to produce or ‘induce’ the prior we want.

I’ve written this into a simple python function (sorry it’s not on pip, but it seems a little too specialised to pollute the python module namespace with). Download it from <a href="https://github.com/lionfish0/experimentation/blob/master/linear_kernel_prior.py">github</a>. The docstring should explain how to use it (with an example).

## My talk at DSA2017 and the website (with realtime predictions of Kampala’s air quality)

By default it seems GPy doesn’t subtract the mean of the data prior to fitting.

So it’s worth including a mean function that does this:

```python
m = GPy.models.GPRegression(
    X,
    y,
    mean_function=GPy.mappings.Constant(
        input_dim=1, output_dim=1, value=np.mean(y))
)
```

Then one needs to fix this value:

```python
m.mean_function.C.fix()
```

A few notes from my visit to the city:

Tuesday: Arrived. There was a brief period of moderate panic on the plane when I thought I wouldn’t be let in without an electronic visa, but as of July 2017 people can still buy a single-entry visa on arrival. Had dinner down at Club 5. I think maybe it’s not as good as I remember!

From left to right: Ssekanjako John, the bodaboda driver; me; Engineer Bainomugisha.

Wednesday: Engineer and Joel took me on a tour around Kampala to visit the sites where they’ve put up air pollution monitors. We first met the bodaboda driver who’s hosting one of the sensors on his motorbike. He’s had a bit of hassle from security asking what the box is, but he’s disguised it by painting it black and half-hiding it under a shredded old bin bag!

Sensor on Jinja Road

The sensor on Jinja Road looks like it’ll be measuring quite a bit – it was surrounded by traffic regularly pumping out black smoke. I suspect that the majority of the vehicle-emission pollution comes from a small proportion of the vehicles…

A more sobering part of the tour was to the large dump, north of the Northern Bypass. There we saw hundreds of people (some with huts built in the dump itself) sorting through the rubbish looking for recyclables. I didn’t see much evidence of PPE.

Kampala’s main dump

The main source of particulate pollution here will probably be the dirt tracks, but I suspect it will be quite low (apparently there’s very little rubbish burning, when we asked around). More concerning are the gases and volatile organics. I imagine the ground water is contaminated too.

Thursday: Block B was shut today as the government had rented it (I wonder who got the cash??!) to do interviews for parliamentary positions. Awkward as the lab with our equipment is in there. I got to hear a few presentations at the AI Lab though, and it was good to catch up with everyone.

I took a brief bit of time from working to visit the art gallery on campus. If anyone’s visiting Kampala and has a spare half-hour, I’d recommend it!

Friday: We got a monitor working on Block B, outside the lab’s window. It’s having trouble with its power supply, so it’s somewhat erratic at the moment. I also got the website up and running.

For old times’ sake I went down to Mediterraneo for dinner. It still seems to be going strong, and has a nice vibe in the evening.

Next: Arusha!

A Marabou Stork (Image from wikimedia)

Back at Makerere working on the air pollution monitoring project with Engineer Bainomugisha.

One of my favourite things at Makerere is sitting at a table outside the guest house, with a cup of “African Spiced Tea”, watching the Marabou storks.

We’ve just held the July GPy hack day. Key outcome: we’re going to be building the documentation in a brand-new, user-friendly way, based on scikit-learn’s style and using Tania’s new system for turning a bunch of notebooks into a website. Other notes from the meet. More on this soon…

© 2018 Mike's Page
