Posted by John Haine on 31/10/2020 11:29:55:
Regression is a bit foreign to me too! I just use the software – apart from the strange syntax, RStudio is very easy once you get used to it, and it is used for heavyweight statistical analysis (such as epidemiological data…), not to mention free!
Supposedly CF rod does absorb water from the air into the resin, which affects the weight (not the length), but the amount is tiny and the effect usually insignificant compared to the bob – I estimated about 3 µs total for my pendulum IIRC. It may be more if the rod weight and springiness are more significant. I have seen some analysis of the effect of humidity on Clock B (since it was measured) and as far as I recall it is very small:
**LINK**
That also has some useful information about "computer compensation".
Something I'm not clear about in my mind is just what correlation we should be looking for. Buoyancy and amplitude depend on density, but density is a function of both pressure and temperature. So should one normalise each pressure reading to absolute temperature to allow for this? There's also a small effect of temperature on viscosity which Harrison is said to have allowed for.
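One way to look at the normalisation question is to turn each pressure and temperature pair into a single air density figure and correlate against that. The sketch below is only an illustration, not something from either clock: it assumes dry air and the ideal gas law, ignores humidity, and the function name and units (hPa, degrees C) are my own choices.

```python
# Assumption: dry air treated as an ideal gas; humidity is ignored here.
R_DRY_AIR = 287.05  # specific gas constant for dry air, J/(kg*K)

def air_density(pressure_hpa, temp_c):
    """Approximate dry-air density in kg/m^3 from pressure (hPa) and temperature (deg C)."""
    pressure_pa = pressure_hpa * 100.0   # hPa -> Pa
    temp_k = temp_c + 273.15             # deg C -> kelvin (absolute temperature)
    return pressure_pa / (R_DRY_AIR * temp_k)

# Standard sea-level conditions give roughly 1.225 kg/m^3
print(round(air_density(1013.25, 15.0), 3))
```

Dividing pressure by absolute temperature is the same thing up to the constant, so a density column could be regressed in place of the raw pressure readings.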
I'm not sure what's going on after looking at last night's run. I added "Haines Seconds" to the Arduino, which now calculates ticks as: 922805 + 4035.47 × Amplitude - 38.02 × Temp + 13.76 × Pressure - 23.56 × RelHumid, and upgraded the stats program. Result of graphing Haines ticks rather than pendulum measured ticks:

Big reduction in stdev, hurrah, but the result over nearly 15 hours is 47 seconds out, which is worse than my simple correction based on averaging the Arduino clock to align it with GPS seconds. I believe linear regression should do better than my method, so something's adrift. Below, uncompensated pendulum is 6.9 seconds out, and crudely recalibrating the Arduino clock brings the error over 15 hours down to under a second.

One difference is that Haines seconds (from linear regression) have stronger correlations with temp, humidity and pressure when the stats are generated from them rather than from pendulum measurements. I expected the regression formula to have that effect, but note the correlation with pressure is amplified more than the others. It may indicate the formula is over-compensating for pressure.

Not blaming the formula at all. It assumes the clock was set up as it was in the data file I sent, which may not be true.
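For anyone following along, the regression formula quoted earlier can be written out in Python roughly as below. The coefficients are the ones from the post; the function and argument names are made up for illustration, and the units are whatever the logger records, which I have not restated here.

```python
# Coefficients copied from the regression formula quoted in the post.
INTERCEPT = 922805.0
K_AMPLITUDE = 4035.47
K_TEMP = -38.02
K_PRESSURE = 13.76
K_REL_HUMID = -23.56

def haines_ticks(amplitude, temp, pressure, rel_humid):
    """Predicted ticks from the linear model (hypothetical helper name).

    Inputs must be in the same units as the log the model was fitted to.
    """
    return (INTERCEPT
            + K_AMPLITUDE * amplitude
            + K_TEMP * temp
            + K_PRESSURE * pressure
            + K_REL_HUMID * rel_humid)
```

Because the intercept was fitted to one particular log, any change in the clock's set-up since then would shift every prediction by the same amount, and that sort of constant offset adds up over a long run.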
Having swotted up on Linear Regression I reckon I can apply it with the software I'm using (Python Numpy & SciPy). Nothing against R except I'm not familiar with it and I'm fluent in Python. I'll be a happy man if I can get the same answer as you from the log.
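A minimal sketch of how the fit might look in NumPy, assuming the log has been loaded into equal-length arrays. The data below is synthetic and noise-free, generated from the quoted coefficients purely to check that the fit recovers them; the value ranges and the function name are invented, and a real log would of course have scatter.

```python
import numpy as np

def fit_ticks(amplitude, temp, pressure, rel_humid, ticks):
    """Ordinary least squares fit of
    ticks ~ b0 + b1*amplitude + b2*temp + b3*pressure + b4*rel_humid.
    Returns the five coefficients [b0, b1, b2, b3, b4]."""
    # Design matrix: a column of ones for the intercept, then one column per variable
    X = np.column_stack([np.ones(len(ticks)), amplitude, temp, pressure, rel_humid])
    coeffs, _residuals, _rank, _sv = np.linalg.lstsq(
        X, np.asarray(ticks, dtype=float), rcond=None)
    return coeffs

# Synthetic stand-in for the real log (made-up ranges, no noise)
rng = np.random.default_rng(0)
n = 500
amp = rng.uniform(0.9, 1.1, n)
temp = rng.uniform(15.0, 25.0, n)
pres = rng.uniform(990.0, 1030.0, n)
rh = rng.uniform(30.0, 70.0, n)
true = np.array([922805.0, 4035.47, -38.02, 13.76, -23.56])
ticks = true[0] + true[1] * amp + true[2] * temp + true[3] * pres + true[4] * rh

coeffs = fit_ticks(amp, temp, pres, rh, ticks)
print(coeffs)
```

If the same call on the real log gives coefficients close to the ones from R, the two tools agree; a difference mainly in the intercept would point at the clock set-up rather than the method.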
Meanwhile preparing to move the clock from Arduino to Raspberry. Only necessary to get decent sub-second clock measurements, possible because the Raspberry has NTP. No need for a Raspberry once the clock has been analysed. Although Raspberries are much faster than Arduinos they don't support user-level interrupts. Not having interrupts makes coding a fast response to pendulum events harder, but I'm sure I can make it work. Transferring to Raspberry is almost a complete re-write.
Also trying to come up with a way of detecting vibration, which may be causing the time anomalies I see in the data. No luck so far, and – unlike Arduino – the Raspberry doesn't even have the basics!
Many thanks for the link to 'Clock B' – fascinating, and I clearly have a long way to go. Also been reading through your other thread, which shows I'm not a pioneer!
Cheers,
Dave