# How to make a linear model with moving average principle?

It takes the values, adds them up, and divides by 10; then it goes to the next 10 years, adds them up, and divides by 10. So it moves year by year, taking ten years at a time. We could spend a whole lecture on the subject of moving averages, but I would suggest that you go to Google or DuckDuckGo (DuckDuckGo is my favorite search engine) and type in “moving average.” Excel also has a moving average routine. All right.
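As a rough illustration of the moving average just described, here is a minimal sketch in R. It assumes the built-in `Nile` series and a trailing 10-year window; the window length and the use of `stats::filter` are choices made for this example, not necessarily what was on screen in the session.

```r
# Sketch: 10-year trailing moving average of the built-in Nile series.
x <- Nile                                   # annual flow, 1871-1970

# Equal weights of 1/10 over the current year and the 9 before it
# (sides = 1 means "use past values only").
ma10 <- stats::filter(x, rep(1 / 10, 10), sides = 1)

# The first 9 years have no full window, so they come back as NA.
head(ma10, 12)

plot(x, col = "grey40", ylab = "Annual flow")
lines(ma10, lwd = 2)                        # smoothed series on top
```

Each value of `ma10` is the sum of ten consecutive years divided by 10, and the window slides forward one year at a time.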

I’ve run a linear model, and it says that, to put a straight line through the data, I get an intercept of about 61 and a coefficient on time of about minus 0.027. The best thing to do is to put that into an object, so I’m going to run it again, but this time it’s not going to give me any output; it just stores the result in an object called `fit`. If I type `fit`, I get the same results, but if I type `summary(fit)`, I get a longer description, and this is where the important information is. First it shows the command that I typed in the script.
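The fit described here can be sketched as follows. The definition of `Nile.ts` is an assumption: rescaling the built-in `Nile` series by 100 reproduces the intercept and slope quoted in the session, but the original script is not shown.

```r
# Assumption: Nile.ts is the built-in Nile series divided by 100.
Nile.ts <- Nile / 100

# Regress the flow on time (the year of each observation) and
# keep the result in an object instead of printing it.
fit <- lm(Nile.ts ~ time(Nile.ts))

fit            # short printout: intercept about 61, slope about -0.027
summary(fit)   # longer description: call, residuals, coefficients, R-squared
```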

The output of `summary(fit)` shows the formula that I used: it predicts `Nile.ts` from time. It gives me my residuals, so I can compare those residuals to `summary(Nile.ts)`. The values of `Nile.ts` range from about four and a half to thirteen and a half, which is a range of nine, and my residuals range from about minus five to plus four. So my model doesn’t do a great job of predicting the data: the R-squared is only about 0.2. Nevertheless, the estimate of the regression coefficient is statistically significant, which is an interesting situation. We have a prediction model that works but doesn’t do a great job of predicting. Let’s go back to my plot.
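To make that comparison concrete, the pieces of the summary can be pulled out individually. This is a sketch under the same assumption as before (`Nile.ts` taken to be `Nile / 100`); the variable name `s` is introduced only for this example.

```r
Nile.ts <- Nile / 100                  # assumed definition of Nile.ts
fit <- lm(Nile.ts ~ time(Nile.ts))
s <- summary(fit)

summary(as.numeric(Nile.ts))           # spread of the data itself
summary(as.numeric(residuals(fit)))    # spread of the residuals

s$r.squared                            # share of variance explained (about 0.2)
s$coefficients[2, "Pr(>|t|)"]          # p-value of the slope
```

A small p-value together with a low R-squared is exactly the situation described: the trend is real, but it explains little of the year-to-year variation.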

Let’s put the plot in again, for those of you having trouble moving around. Let’s set the type to just points, and let’s not worry about the axes; let’s just do `pch = 16`. So there’s my data again, on a different axis, and let’s add in the regression line. There’s the regression line. We’re almost out of time; if we’re going to keep ourselves to the hour, we need to think about stopping. So give me some comments about that regression line. What does it show? One answer: “I think it shows high residuals.” Does it fit the data properly, and do you see a high R-squared? “I think it has a low R-squared, so those are high residual errors, not low ones.”
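A minimal sketch of that plot, again assuming `Nile.ts` is `Nile / 100`:

```r
Nile.ts <- Nile / 100                  # assumed definition of Nile.ts
fit <- lm(Nile.ts ~ time(Nile.ts))

# Points only (type = "p"), drawn as solid dots (pch = 16).
plot(Nile.ts, type = "p", pch = 16)

# abline() reads the intercept and slope straight from the lm object.
abline(fit)
```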

And what is the direction, the nature of the relationship? It’s negative; they’re negatively correlated. It suggests that over the period of a hundred years there has been a negative trend in the data. It’s statistically significant, but it’s not very strong. In fact, we can look at the coefficients of `fit` themselves. If you take that second coefficient, minus 0.027, and multiply it by `length(Nile.ts)`, you get the predicted change over the whole period.

What I’ve done is this: the coefficient says that the regression line predicts a decline of about 0.027 billion cubic meters per year. If we multiply that yearly change by the number of years, we get the predicted decrease in flow during that period. So over that period the model predicts that the flow of the Nile decreased by about 2.7 billion cubic meters. The regression coefficient is the change per year, and we multiplied it by the hundred years. Obviously, if we wanted to be silly, we could ask what’s going to happen in two hundred years: well, you multiply by two hundred. It’s not exactly silly, but it’s a little bit of a stretch.
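The arithmetic is short enough to write out. This sketch assumes `Nile.ts` is `Nile / 100`; the names `slope`, `change_100`, and `change_200` are introduced here for illustration.

```r
Nile.ts <- Nile / 100                  # assumed definition of Nile.ts
fit <- lm(Nile.ts ~ time(Nile.ts))

slope <- unname(coef(fit)[2])          # change per year, about -0.027

change_100 <- slope * length(Nile.ts)  # over the 100 observed years, about -2.7
change_200 <- slope * 200              # same rate stretched to 200 years, about -5.4

c(per_year = slope, over_100 = change_100, over_200 = change_200)
```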

I mean, it’s a sophisticated idea, but it’s kind of silly. We’d say, okay, if that kept going for another hundred years, by 2071 the Nile would have decreased by about 5.4 billion cubic meters. Obviously that’s a silly idea, but if I wanted to plot it, I could do that. I could call `plot` on `Nile.ts` with an `xlim` that runs well past the end of the data, and give it a `ylim` as well.
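A sketch of that extended plot follows. The axis limits are assumptions chosen to make room for the extrapolated line (the values typed in the session are not recoverable from the recording), and `Nile.ts` is again taken to be `Nile / 100`.

```r
Nile.ts <- Nile / 100                  # assumed definition of Nile.ts
fit <- lm(Nile.ts ~ time(Nile.ts))
b <- unname(coef(fit))                 # b[1] = intercept, b[2] = slope

# Assumed limits: x out to 2070, y starting at zero.
plot(Nile.ts, xlim = c(1870, 2070), ylim = c(0, 14))
abline(fit)                            # the line extends across the whole x range

# Value of the fitted line at the year 2071 (about 5).
flow_2071 <- b[1] + b[2] * 2071
flow_2071
```

The line is only a description of 1871 to 1970; everything to the right of the data is pure extrapolation.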

There’s my data, and now I’ve fit the line to it. That’s a sophisticated thing to do, but it’s silly, because who knows what’s going to happen in the next hundred years. But that’s what the regression line says: by 2071 the flow of the Nile would have gone down to about five billion cubic meters. Okay, we’re four minutes past 11:00 Eastern Daylight Time. Any questions? All right, we’ll do one more half hour, so we’re going to stop at 11:30 my time, which is about 25 minutes from now. Given that we’ve got some more time, let me get some questions, problems, issues.

I really like to hear from people. Okay, here’s one more: I have a question about linear regression on non-continuous data. I’m not sure what’s meant by non-continuous data. You can run regressions as long as you have two sets of data, an X variable and a Y variable. In fact, you can run a regression as long as you have a dependent variable and as many independent variables as you want, or you could be running, say, principal component analysis. The whole question of linear models is a big one.

I mean, that’s years of study; at MIT I took two years of econometrics. You can run regression on ordinal data or on continuous data. It has to be quantifiable in some way, or it has to be turned into factors or something like that. But that’s a good question. It’s a very robust technique that can be used to do all kinds of things. Any other questions?
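As one illustration of “turned into factors,” here is a hedged sketch in which the predictor is a two-level factor rather than a continuous variable. The split at 1921 and the names `years`, `period`, and `fit_f` are hypothetical, invented for this example; `Nile.ts` is assumed to be `Nile / 100`.

```r
Nile.ts <- Nile / 100                          # assumed definition of Nile.ts
years <- as.numeric(time(Nile.ts))

# Hypothetical grouping: first 50 years vs. last 50 years.
period <- factor(ifelse(years < 1921, "early", "late"))

# lm() converts the factor into a 0/1 indicator behind the scenes.
fit_f <- lm(as.numeric(Nile.ts) ~ period)
coef(fit_f)   # intercept = mean of "early"; periodlate = late minus early
```

The same `lm` call handles the factor without any extra work, which is part of why the technique is so widely applicable.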

I know there are other questions. Does anybody have any problems with running R, looking at RStudio, seeing my screen, whatever? I’m going to make my screen smaller so I can keep my eye on the chat window. Let’s do a couple of things. Let’s look at this regression model. It’s good and bad: good in the sense that there’s a statistically significant slope, bad in the sense that the R-squared is not that great. Let’s compare the residuals of the model to the data itself, so let’s start by setting `par(mfrow = ...)` so that two plots share the window.

Then let’s do a histogram. There’s a histogram of the data, and let’s compare it to the residuals. What do the residuals look like? First of all, there they are: the residuals from that model are a list of a hundred numbers, the differences between the actual values and the regression line. Let’s look at a histogram of them, and let’s make sure that we use a comparable scale; the data histogram has an `xlim` of 4 to 14. I mistyped the `hist` call on the residuals and got an “unexpected symbol” error; it should be `fit$residuals`. And the limits for the residuals have to be centered on zero.

The difference between 4 and 14 is 10, so let’s do minus 5 to plus 5, and I’ll run the same thing again. Great. There are my residuals: there’s a histogram of the data and a histogram of the residuals, and they’re both on the same scale. Maybe let’s use `cex = 1.3` so you can read it. Okay, so the two histograms are shown on the same scale. In other words, the first histogram runs from 4 to 14, which is a range of 10, and the residuals are on a scale of minus 5 to plus 5.
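The two histograms can be sketched like this. The stacked layout `c(2, 1)` is an assumption (the layout used in the session is not recoverable), as is the definition of `Nile.ts` as `Nile / 100`.

```r
Nile.ts <- Nile / 100                  # assumed definition of Nile.ts
fit <- lm(Nile.ts ~ time(Nile.ts))
res <- as.numeric(residuals(fit))      # 100 differences: actual minus fitted

op <- par(mfrow = c(2, 1))             # assumed layout: two rows, one column

# Both x-axes span 10 units, so the widths are directly comparable.
hist(as.numeric(Nile.ts), xlim = c(4, 14), main = "Nile.ts")
hist(res, xlim = c(-5, 5), main = "Residuals of fit")

par(op)                                # restore the previous layout

c(sd_data = sd(as.numeric(Nile.ts)), sd_residuals = sd(res))
```

If the model predicted well, the residual histogram would be much narrower than the data histogram; the two standard deviations printed at the end give the same comparison in numbers.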

So they have approximately the same scale. You can see that the axis is labeled in units of 2 and each bar is one unit wide, and it’s the same on both plots. And you can see that the histogram of the residuals, which we’d like to be really narrow, is not that much different from the range of the histogram of the data itself. All right, let’s do one more thing with the Nile data, and that’s a forecast. What you have to do in order to run `forecast` is to load the forecast library, and you probably don’t have it.

So there is a forecast library, and what I suggest is that you load it. In order to do that you have to go up to the menus. I should know this, but I don’t do it much. That’s not under File; where is it? I’ve got to make a note to myself to learn how to do this: I usually load libraries in the R GUI, and I don’t know how to load them in RStudio. Anyway, there’s a forecast library. I’m getting ahead of you, because you probably haven’t loaded that library, but I can run a forecast on the Nile data.

I get a result. What it does is say, okay, you want a forecast, and there’s a technique for running forecasts. If I do `?forecast`, I get the help screen. There’s a whole bunch of help files, and not too much in the way of examples, but it’s a means of doing a forecast. I’m going to run it on the Nile data simply because it shows us that forecasting is not very useful here. Let’s make a plot of that forecast. I could make the plot window bigger, but you probably can’t see that, so I should not do that.
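Loading the package from the console, rather than through the menus, can be sketched as below. This assumes the CRAN package `forecast` is the library meant here and that `Nile.ts` is `Nile / 100`; the `requireNamespace` guard is added so the snippet does not fail on a machine where the package is missing.

```r
# One-time installation, if needed (requires an internet connection):
# install.packages("forecast")

Nile.ts <- Nile / 100                  # assumed definition of Nile.ts

if (requireNamespace("forecast", quietly = TRUE)) {
  library(forecast)
  fc <- forecast(Nile.ts)              # picks a model and forecasts ahead
  print(fc)                            # point forecasts plus prediction intervals
  plot(fc)                             # history, forecast, and shaded intervals
} else {
  message("Package 'forecast' is not installed; see install.packages() above.")
}
```

Typing `library(forecast)` in the console works the same way in RStudio and in the plain R GUI.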

I should just maximize it, and let’s make the lines thicker with `lwd = 3`. Okay, what `forecast` has done is generate a forecast into the future. Going back to the console, it has generated a forecast of the Nile flow from 1971 to 1980, and it’s given us a point forecast; in other words, a specific scalar prediction for each future year. You can see that every one of those forecasts is the same number. The forecast model is based on an econometric time series method called ETS, exponential smoothing.

So the forecast, which is based on the behavior of the data over the past hundred years, says that every year will be the same, but that the uncertainty of those predictions will increase. The lower bound of the 95 percent prediction interval goes from 5.8 down to 5.6, and the upper bound goes from 10.9 up to 11.2. So the spread of those forecasts increases as you look further into the future. Obviously we’re just touching the surface of a whole bunch of complexities. All right, so here’s what I want to do in the last ten minutes.
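The point forecasts and the widening interval can be inspected directly on the forecast object. This is a sketch assuming the CRAN `forecast` package with its default settings and `Nile.ts` defined as `Nile / 100`; the name `width95` is introduced for this example, and the exact numbers may vary with the package version.

```r
Nile.ts <- Nile / 100                  # assumed definition of Nile.ts

if (requireNamespace("forecast", quietly = TRUE)) {
  fc <- forecast::forecast(Nile.ts)

  fc$method                            # which model was selected
  fc$mean                              # point forecasts, one per future year
  fc$level                             # interval levels, by default 80 and 95

  # Second column corresponds to the second entry of fc$level (95 by default).
  lower95 <- fc$lower[, 2]
  upper95 <- fc$upper[, 2]
  width95 <- as.numeric(upper95 - lower95)
  print(width95)                       # grows as the horizon gets longer
}
```

A flat point forecast with a growing interval is the model’s way of saying that its best guess does not change, but its confidence in that guess drops with every additional year.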

Let’s see, did I do everything I wanted to do? Residuals, yes. The other thing I wanted to point out is that you can plot the data in lots of different ways, but I don’t need to do that. So let’s save that script. I go back to the script window and click File, Save. It asks me to choose an encoding; I’m not sure why it’s asking me that, but I’ll take UTF-8. If I exit the window, I can open up that file, and there’s the script again. I’ve saved the file on my desktop. You can’t see my desktop, but you get the idea.

I can open up the file, save it, and so forth. So let’s close that file, and let me use the console window to explore some other ideas in the next few minutes. If I type a question mark (I can also type `help`), I get help, and you can also list the data sets; I should know how to do this, but I don’t. There is a dataset called `ldeaths`. If you type `help(ldeaths)` or `?ldeaths`, you get a help screen that tells us these are monthly deaths from lung diseases in the UK. That’s just to let you know that R can handle other kinds of time series besides yearly ones. If I type `ldeaths`, I get an interesting matrix: it shows me the number of monthly deaths from bronchitis and related lung diseases.
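For reference, the console commands touched on here look roughly like this. The help and listing commands are commented out because they open a separate viewer and are meant for interactive use.

```r
# Interactive-only commands:
# ?ldeaths          # same as help(ldeaths)
# data()            # lists the data sets that ship with R

ldeaths             # prints as a year-by-month table

frequency(ldeaths)  # 12 observations per year, i.e. monthly data
start(ldeaths)      # first observation: January 1974
end(ldeaths)        # last observation: December 1979

plot(ldeaths)       # the seasonal pattern shows up clearly
```

The same `ts` machinery used for the annual Nile series handles this monthly series; only the `frequency` differs.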