before post-processing whisper - lecture 7 apr4.mp4

Oct 25th, 2022
So, the clock says 8:16, so I think it's time to start. I didn't hear the bell ringing; that seems to happen sometimes, not only when the break starts. So, we stopped last week in the moving average models.

We have already seen what this entails. Apart from the notation, one property is that moving average processes are always stationary. That is one property which we have, and then in the ACF and the PACF, so in the dependency structure, we really have that complementary behavior: the ACF breaks off after lag q, the order of the model, which is the behavior autoregressive processes show in the PACF; and the PACF of a moving average model has an exponential decay in the magnitude of the coefficients.

Here in particular, for the MA(1), it is invertible if the coefficient is smaller than one in absolute value. So this is the condition that we have for an MA(1), but there is a general condition which holds for any MA process: the roots of the characteristic polynomial (these can be complex numbers again) must all lie outside the unit circle.
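As a quick illustration of that condition (not part of the lecture), one can compute the roots of the characteristic polynomial in R and check their moduli; the MA coefficients below are made up:

```r
## Sketch: check invertibility of an MA(q) via the roots of its
## characteristic polynomial 1 + b1*z + ... + bq*z^q.
beta  <- c(0.6, -0.3)            # hypothetical MA(2) coefficients
roots <- polyroot(c(1, beta))    # (possibly complex) roots of the polynomial
Mod(roots)                       # their moduli
all(Mod(roots) > 1)              # TRUE means all roots outside the unit circle: invertible
```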
For practical work, we don't really require that ...

For the MA(q) model, the parameters are beta 1 up to beta q, and then finally the innovation variance. Again, this is under the implicit assumption that the innovation term is Gaussian, which is the typical assumption; it is not per se written in the definition of an MA, but when we talk about estimation, it is always an implicit assumption that the innovation is Gaussian.

So, parameter estimation. It starts with a slide that appears here for the first time: there is a very simple method, and the one which is usually applied works with starting values which come from CSS, as we will see, plus model diagnostics; that part is pretty much similar, so it can be seen as a repetition of what we discussed for the autoregressive models.

So they define a linear equation system which is kind of easy to solve, but unfortunately this is not the case for MA models. Here the relation is quite a bit more complicated; it is not given by a linear equation system, and this results in the fact that this is an inefficient estimator of the model coefficients, yeah.
So this is what we can do. Now, this inversion, as mentioned before, is infinite, so it never stops, and we will never have infinitely many observations in a time series.

So basically what we do is we take that inversion down to the first observation in the time series and then we kind of stop it. That leaves a term, beta 1 to some power times E0, which is an unobservable quantity in practice. So what do we do? We assume that this unobservable innovation term at time zero equals zero, its expectation; so we kind of commit a small error.

This will play a very limited role in the estimation of the coefficients, obviously, so we are pretty safe to do so. Of course, if the time series length is limited, or if the time series process is a bit more complicated, the influence of that assumption could be more severe and more important; but in many practical cases it is relatively unimportant. Yes, so the method is actually implemented in the arima function in R.

So you supply a time series, you define the order of the model, and if you request method = "CSS", it will do exactly what is described on this slide. Usually it is not recommended to do so; it is recommended to rely on the default method here, which is "CSS-ML": it uses these conditional sum of squares estimates as starting values for a maximum likelihood estimation.
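A minimal sketch of what such a call can look like in R; `x` stands in for the (stationary) series, and the MA(1) order matches the example that follows:

```r
## Conditional sum of squares only, versus the default CSS start values + ML.
fit_css <- arima(x, order = c(0, 0, 1), method = "CSS")     # pure CSS estimation
fit_ml  <- arima(x, order = c(0, 0, 1), method = "CSS-ML")  # default: CSS starts, then ML
fit_ml
```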
So, for the time series process with mean zero, if it is a non-shifted process, the observations have some covariance matrix which depends on the model coefficients; respectively, if we have the shifted process, what is added is the global mean as the common mean for all the observations.

Yeah, so we can derive the likelihood function, and the parameters can be estimated from this approach; as mentioned, without giving any further details as to how this happens, it is implemented in the arima function, and this is what we rely on. Okay, so, well, let's go for an example. It is honestly quite a bit more difficult to find good examples ...
So I start with fitting the time series straightforwardly. These are the values of the bond, and actually I just look at the differenced values; so this is a ...

We obtain the coefficients: a negative coefficient for beta 1 and its standard error; then we also obtain the intercept, which is the global mean. It is estimated slightly negative, very close to zero, and the standard error is at 0.04. So if we do a confidence interval for the coefficients, then what is special here is that the confidence interval for the intercept does contain the value of zero; so the intercept, the global mean, is not significantly different from zero.
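Such approximate 95% confidence intervals can be computed from the fitted object as estimate plus/minus 1.96 standard errors; a sketch, reusing the hypothetical `fit_ml` object from above:

```r
## Approximate 95% confidence intervals for all estimated coefficients,
## including the intercept (the global mean).
est <- fit_ml$coef
se  <- sqrt(diag(fit_ml$var.coef))
cbind(lower = est - 1.96 * se, upper = est + 1.96 * se)
```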
So if we perform variable selection, so to speak, and try in a stepwise manner to omit all the terms which are non-significant, then we can omit the intercept here. In fact, if we do look at the time series, the mean is indeed close to zero, so also from this perspective it may seem justified. The significance decision is of course ...

So it is well justified to actually omit the global mean in that particular series, and we should modify the model that we fit in the sense that we omit the global mean. This is done by this argument: the include.mean argument has to be set to FALSE. Yeah, it does not change the beta 1 estimate very much; it is a very slight change only, which may also be a good sign. We also notice, because we are given the AIC, that the AIC of this model without the global mean is indeed lower than the one where we have the global mean; so also in that sense it is justified to remove the mean. So we have found the model for that series.
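A sketch of that refit and the AIC comparison, again with the placeholder series `x`:

```r
## Refit the MA(1) without the global mean and compare the AIC values.
fit_mean   <- arima(x, order = c(0, 0, 1))                        # with intercept (default)
fit_nomean <- arima(x, order = c(0, 0, 1), include.mean = FALSE)  # global mean omitted
c(AIC_with_mean = AIC(fit_mean), AIC_without_mean = AIC(fit_nomean))
```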
Now, of course, we also have to verify whether the model fits. The time series of residuals is what we look at; no objections basically against this, of course. Yeah, if you are picky, you could say, well, there are outliers here, a few of them, and this catches my attention, so it is not entirely valid.

So, in terms of the dependency: colloquially spoken, none of the estimates goes beyond the confidence band, so the MA(1) process indeed seems to be able to capture the dependency which there is in this time series. So we seem to be fine. Then, next, we usually also inspect a normal plot of the residuals, and now, yeah:

we do not only notice these outliers (that is, the three positive outliers which I pointed out in the residuals before; they are clearly further out than the normal distribution allows for), but there are generally long tails in these residuals, or in this time series of residuals. The question is how severe that is in this particular case.
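The checks just described can be produced along these lines (a sketch; `fit_nomean` is the hypothetical fit from the earlier snippet):

```r
## Residual diagnostics: time series plot, ACF, and normal plot of the residuals.
res <- residuals(fit_nomean)
plot(res, main = "Residuals")
acf(res, main = "ACF of residuals")
qqnorm(res); qqline(res)   # long tails show up as points leaving the line
```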
So it is not overly disturbing. However, there is one point which is more important: if the distribution is not correct, which is the case here (symmetric but long-tailed), then this affects the standard errors. So we have to be careful when we do confidence intervals, because they rely on the standard errors, which are biased in this case. In fact, our decision to omit the mean of the series has to be questioned again in the light of the residual analysis; however, here in this particular case, we are pretty safe.

So it is quite certain that the omission of the global mean is the right decision, although we do have some doubts as to whether the standard error and the confidence interval are correct or not. But I mean, we are safe here in the decision, so it is most likely fine. Yes? Yes.
So the effect on the standard errors is fundamentally the same; but if we do have volatility clustering, it would just mean that an MA(1) is not the right model to actually fit to that time series of returns. So the concern about this for the results of the MA(1) is a bit less on the technical side, I would say; the results are still ...

We will get there at some point in the future, so this will become more clear. Yeah, to conclude: it is maybe not the best example, because some things do not work.
So there is also the current innovation; again, the innovations are i.i.d. white noise with mean zero, they are true innovations, so independent of anything.

We can write an ARMA(p,q) pretty compactly by using the characteristic polynomials: we have the characteristic polynomial phi for the AR part, which is applied to the process, and we have the characteristic polynomial theta, which is applied to the innovations.

We can write them out; we have the model coefficients in the characteristic polynomials, and they determine the properties of the process. If we do not have roots on or within the unit circle in the phi polynomial, then the process will be stationary; if the roots are all outside of the unit circle in the theta polynomial, then the process is additionally invertible, and that is what we aim for. It is basically what is on this slide here: stationarity is determined by the AR part (we have to, or we can, verify it via the characteristic polynomial), and invertibility is determined by the MA part. Yeah, and under that condition the process will be invertible.
... whether we have to worry about that when we fit models to data.

An ARMA(p,q) in its pure form will also always have mean zero, and if we want to make it a practically useful tool, we have to go to the shifted process again, where we introduce a global mean; by default in the arima function, the shifted process will be considered. In terms of properties, in the dependency structure ...
So we just go for an example here, and I will try to explain what happens. Well, the dependency structure in its characteristics is now a mixture of the properties of an AR and the properties of an MA. So this is what we have; let me try to explain. Here we have an ARMA(2,1), so we have this dependency in the AR part and we have this dependency in the MA part. So what we expect in the ACF and the PACF is, in each, a superposition of the properties of AR and MA; so we expect to have a superposition of exponential decay and the cutoff.

So here what we know is that there should be a cutoff after lag 1 (so be careful, this is lag 0, so this is lag 1), and indeed the magnitude of the autocorrelation coefficients becomes much smaller than the ...

So these two are indeed bigger, a bit bigger in magnitude, than what follows; so once you know it, that cutoff which is superimposed is also visible. How this mixture between cutoff and decay looks obviously depends on how the process is: you can have an ARMA with a very strong AR part and only a marginal MA part. For instance, if you make this coefficient very small, if you make it 0.001, then obviously the contribution of the moving average part becomes almost negligible; still, technically, it is an ARMA(p,q) model, and then of course the properties of ACF and PACF will lean much more toward those of a pure AR process. Vice versa, you can also have an ARMA(p,q) which is much more on the side of an MA, if the coefficients are likewise; or both can be equally strong. So here, everything ...
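That kind of picture can be reproduced by simulating an ARMA(2,1) and inspecting ACF and PACF; the coefficients below are illustrative, not the ones from the lecture slide:

```r
## Simulate an ARMA(2,1) and look at the superposition of decay and cutoff.
set.seed(1)
sim <- arima.sim(model = list(ar = c(0.5, 0.3), ma = 0.7), n = 500)
par(mfrow = c(1, 2))
acf(sim)    # mixture of exponential decay and the MA cutoff
pacf(sim)   # mixture of the AR cutoff and decay
```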
So, I mean, it is just to say that this may very well happen in practice; I mean, once you know the truth, it is always easy.

... which exist in practice: a process which has volatility clustering, for example, is a nonlinear process, and that cannot be approximated by an ARMA(p,q), no matter what kind of orders you choose. So there exist time series in reality where this does not actually help; I mean, it is a nice theoretical result.
So the magnitude of p plus q, the number of coefficients which you fit, which you estimate, is limited by the amount of data which you have access to. So this is a theoretical result; in a theoretically oriented course there is usually quite a bit of focus on that, because one can show how you can ...

But practically it potentially does not help that much. What is important for practical work, however (there is a slide dedicated to it, but it also fits here), is that an ARMA(p,q) usually allows for parsimonious models. In some sense you could say, well, an ARMA(p,q) is also a bit less convenient than a pure AR, because in a pure AR the interpretation is so much more straightforward; in an ARMA(p,q), again due to the MA part, that is not given. However, the advantage of the ARMA(p,q) is usually that we get by with estimating fewer coefficients than if we fitted a pure AR model.

It is also to some extent visible here: here we can go with an ARMA(2,1), which is also the correct process. If we interpreted it differently and went for an AR, which for some reasons is always a bit attractive, we would most likely have to use an AR(4) given ACF and PACF, which is only one coefficient more, but already one coefficient more. So that is very typical: if you want to replace an ARMA(p,q) by a pure AR process, it requires more coefficients to reach the same result. Okay, so it is time for the break; I am aware that the bell is not working correctly, so I had an eye on the clock. Let's continue in 15 minutes.
Okay, let's carry on.

So, basically, it is the third time that we have almost the same slide, so it probably gets a little bit boring, and I can go over it pretty fast. In the fundamentals, we fit an ARMA model; that is not different from pure AR or pure MA. Well, we must only do so to stationary series, that is important. Yeah, we have to guess the orders from the ...

So sometimes these decisions as to what model to use are less linear than this program here suggests; sometimes they also arise from just proceeding and then verifying the results. I mean, that is just how modelling works in practice.
We try relatively simple models and then just pick the best one: best in terms of AIC, which is the easiest decision because it can be automated, or on the basis of the residuals, or of how the coefficients and their confidence intervals look, et cetera. I will get back to this later, but let's first look at an example here, which is about the North Atlantic Oscillation.

I am not a specialist for this, so you can read here, or also on Wikipedia, what the North Atlantic Oscillation is. It has to do with the pressure difference between the typical high which you have over the Azores and the low which is somewhere else, and how big that pressure difference is; I think that is what it is, but as mentioned, I am not a specialist.

So this is the series that we have; it is actually quite a long series, it is monthly data in fact. However, there was no seasonal effect, and the series looked stationary, so there was no trend either. And these are just the ACF and the PACF that we have. What they show is that, well, we have a very clearly significant estimate at lag one, and then there are a few more which go beyond or scratch the confidence bands. And in that situation, well, it looks like a cutoff in both ACF and PACF; it does not look like much of an exponential decay, maybe, if anyway ...
So I think the most natural process to try here is indeed the ARMA(1,1), because we have the cutoff here at lag one and the cutoff there at lag one; it is a simple model, and it is, yeah, maybe the one to go with. Especially also because, if you do an ARMA(1,1), then these ...

So we say this is the cutoff, this the exponential decay; then these are to some extent disturbing, because they should not exist in practice. Conversely, if we do an AR(1), it is kind of the same: then this one, et cetera, would be disturbing, because it should not exist, because there is just a pure cutoff.

So it is kind of the natural choice here. However, before we fit, there is some theory about estimation of the coefficients. It is along the very same lines as what we have seen before for the MA models: we do invert the ARMA(p,q), express it in terms of an AR(infinity), and then we have a ...
So we can derive the likelihood function and then simultaneously estimate the global mean, the model coefficients, and the innovation variance. So this is what happens; I go relatively quickly here, because I have already explained it today. It is really the same as before, and the remarks here are the same as we have heard before.

So the MLE really only exists under the Gaussian distribution; yeah, if we have reasonable deviations from the normality assumption, it tends to still work, with a loss of efficiency and a bias in the standard errors.

So we can, I think, very well live with a model that does not have a global mean. That is kind of the next step which I take: I remove the global mean, because it is insignificant, and I can very well accept, from a practical viewpoint, that it should not exist. I refit the model, and what I notice is that the MA(1) coefficient is not significant either: the confidence interval very clearly contains the value of zero, and so this is the basis for omitting this coefficient as well. A further remark, which I should have mentioned before: the AIC also improves when I remove the mean. When I then remove the MA(1) coefficient, I remain with an AR(1) model, and this coefficient is now much more ...
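That sequence of fits could look as follows in R; `nao` stands in for the series (not the actual object name used in the lecture):

```r
## Stepwise simplification: ARMA(1,1) with mean -> drop the mean -> drop MA(1).
fit1 <- arima(nao, order = c(1, 0, 1))                        # ARMA(1,1) with global mean
fit2 <- arima(nao, order = c(1, 0, 1), include.mean = FALSE)  # global mean removed
fit3 <- arima(nao, order = c(1, 0, 0), include.mean = FALSE)  # MA(1) term removed: AR(1)
sapply(list(fit1, fit2, fit3), AIC)                           # compare the AIC values
```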
So we can also keep kind of the bigger picture, in terms of the residuals, in mind when we take the decision. It is just important to know that, when we go for the residuals, I only show the residuals for phase two, which is the AR(1) here.

It is not very strong either; it is actually quite a low value, it is at 0.08. It is quite a long time series, so the confidence bands are actually relatively small. When we do a Ljung-Box test for the first 24 lags, the null hypothesis of independence is accepted, somehow also indicating that this is maybe not too severe. And maybe the most important argument ...
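The Ljung-Box test over the first 24 lags can be run as below; whether and how to correct the degrees of freedom via `fitdf` for the fitted AR(1) is a choice the transcript does not spell out:

```r
## Ljung-Box test on the AR(1) residuals over the first 24 lags.
## fitdf = 1 accounts for the one estimated AR coefficient (a common choice).
Box.test(residuals(fit3), lag = 24, type = "Ljung-Box", fitdf = 1)
```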
It is convenient, and it can also be useful. The danger is just that you can always apply it to any series; of course the function will always find a best fitting ARMA(p,q) model, but there is no guarantee that this ARMA(p,q) is actually a ...

So this is the danger, and this is why there is this a priori preference for the manual approach, where you inspect things and, yeah, study the series. It is also kind of a vote against using this black box procedure rather than doing some careful modelling of the time series, which I think often yields better results. So that is why I recommend always complementing it with a visual inspection of the time series and these considerations on the dependency plots, as well as the residuals. Yeah, now some details about the information criteria.
So here we have the definition in terms of an ARMA model. The AIC is built on minus two times the log-likelihood function as a goodness-of-fit measure; you want this to be as small as possible, you want the model to fit as well as possible to the data. Bigger models have an advantage over smaller models because they have more flexibility, and this is why there is a penalty. The penalty is given by the number of estimated parameters in the model: it is p plus q, the orders of the model; then we have a term which is one if we use a global mean, or zero if we do not; and then the plus one is for the innovation variance, which we also estimate. Yeah, so this should in some sense also be as low as possible, and in particular the combination of the two, goodness of fit plus model complexity, should be as low as possible; this is what we would call the best suited model.

Now, for small samples, and especially in the auto.arima function, or in the methodology programmed by Hyndman in the forecast package, they mostly use a corrected version of the AIC, the AICc. It is a small-sample correction which is applied; it looks relatively complicated here, but they suggest, or they also show in some work, that it ...

So yeah, maybe if you use that methodology, it just makes sense to accept this and rely on what they use as the default criterion, unless you know better than them, I would say.
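For reference, the usual small-sample correction is AICc = AIC + 2k(k+1)/(n - k - 1), with k the number of estimated parameters and n the sample size; in the forecast package the corrected value is reported alongside the plain AIC. A sketch, with `x` again a placeholder series:

```r
## AIC, AICc and BIC as reported by forecast::Arima on a placeholder series.
library(forecast)
fit <- Arima(x, order = c(1, 0, 1))
c(AIC = fit$aic, AICc = fit$aicc, BIC = fit$bic)
```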
So here I copied the algorithm for this auto.arima function. The usage of that auto.arima function is actually relatively complex; I mean, you can just call it and it will do something, but you can also call it by setting the arguments accordingly, and this already requires quite a bit of insight.

So we have to set the number of differences which are taken before the modelling actually starts; differencing of the series is considered before this even starts.

Whether the global mean is included also depends on the number of differences, but typically it is. The best model of these then becomes the current model, and then some alterations on p and q are done, and the global mean is also excluded.

So I hope this is still the current version, but actually I am not 100% certain that it is; and even if it were, it does not help you in terms of any future ...
Whether this is a violation of stationarity or not: it might be, it might not be, it is not so clear to say. However, if someone says, well, no, I do not accept the assumption of a constant mean here, I can hardly object. So what I do here is apply auto.arima afterwards, or rather, I will apply auto.arima. So yeah, I operate under the assumption that the mean is also constant here and that this is a stationary series. Yeah, we have ACF and PACF, and we can try to guess the order of a model that we would fit here. In terms of a cutoff here in the ACF, not much is visible, so here I think we could argue, well, this is a pure ...

So you are welcome to mention that as well, but I would say a (2,1) is maybe what we would try. Yeah; but now, rather than trying the ARMA(2,1), I use the auto.arima function.
Log transformations, by the way: this is also something which should maybe be mentioned at this point in time. The function, in the meantime at least, can also decide about the transformation, so it can even automatically decide not only on log or not, but also on a Box-Cox transformation, and it can do so automatically. Yes, and you can switch that on; so you could do so, it offers that.
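A sketch of such a call, including the automatic Box-Cox choice just mentioned; the series name `y` is a placeholder, and `lambda = "auto"` is how recent versions of the forecast package expose that switch:

```r
## auto.arima on a placeholder series, letting it choose the orders, the mean,
## and (optionally) a Box-Cox transformation automatically.
library(forecast)
fit_auto <- auto.arima(y, lambda = "auto")   # lambda = "auto" turns on Box-Cox selection
summary(fit_auto)
```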
So this is also what I suggested; however, the MA order of three would imply that the cutoff is after lag three in the ACF, and that is by no means visible in the ACF. So we would never have fitted an ARMA(2,3) had we done the selection by inspecting ACF and PACF; and, as mentioned, it is very typical that this happens when you apply this function.

So the models are most often not identical to what you obtain by this hand selection. Another remark concerns the significance assessments of the coefficients: in fact, most of the coefficients are significant, in the sense that the confidence intervals around the point estimates do not contain the value of zero, except for this one here, the second MA coefficient. However, you do not really have the option in R to remove just that coefficient; this is not something which is done in time series analysis. So if you say the MA order is three in this ARMA model, then there is no option to switch off a single coefficient in between.

So if you want to reduce the model, you can do this on the basis of significance decisions, of course, by hand; but then you always have to eliminate the highest-order coefficients first. Only when the highest-order coefficient in either the AR or the MA part is non-significant would you reduce the model; else you would not do so.
So maybe the assumption of a stationary process to start with is indeed justified, and this is maybe the simpler consideration than debating over whether the mean of this time series process is constant or not. So what we did ultimately was work with a stationary model, and indeed that model seems to work. Well, then things are perhaps fine. Still, the discussion can be worth it in practice, but I just want to show the other side as well: it seems to work, so it could be okay.
So it is inadequate in some sense; but then, on the other hand, it is also not so easy to come up with a process that removes this dependency. You have to increase the model orders quite a bit and estimate many more coefficients, which also brings some disadvantages. So, to some extent, that is how modelling works: you always have this trade-off in the complexity of the model, and if the larger model does not clearly bring advantages, practical advantages, not just removing this dependency but also practical advantages, then one often proceeds with the smaller model. Okay, so that is the end of this.
So let me try to explain. We have two predictors ... So the response at that particular instance is influenced by the value of the predictor series at the same time, and then we have the error, also at the same time. In fact, this is a regression problem, a pure regression problem; and if, but only if, the time series of errors is independent, then we can use OLS regression methodology to deal with that problem and be all fine. Indeed, I assume, I am pretty certain, that if you have done regression before, you have actually worked on data which indeed were a time series regression, and this basically goes unnoticed. So these are just all ...

So we do the standard residual plots in R: the Tukey-Anscombe plot, the normal plot, scale-location, the leverage plot, and that is kind of it.

Now, this is a statement for actually doing so: when you perform regression with time series, or more precisely, when you perform regression with data which have this time series structure or which have a sequential nature, then verifying the independence assumption for the error terms separately makes sense, because it often turns out to be a ...
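A minimal sketch of that extra check for a time series regression; variable and data frame names here are made up:

```r
## Ordinary regression fit plus a separate check of the error correlation:
## the residual ACF should look like white noise if OLS inference is to be trusted.
fit_ols <- lm(y ~ x1 + x2, data = dat)
plot(residuals(fit_ols), type = "l")   # residuals in time order
acf(residuals(fit_ols))                # correlation structure of the errors
```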
So if the independence assumption for the errors is violated, if we do have correlation in the errors, then the point estimates and also the fitted values are still unbiased; however, the inference results are invalid.

This is what is often more practically relevant than the loss in efficiency. And there are generalized least squares procedures, so GLS rather than OLS, which solve the issue: they can take the correlation, or autocorrelation, of the errors into account when estimating regression models, and this is what we are going to discuss.
So what we have is the time series about global warming; more generally spoken, it is global temperature anomalies, and the global temperature anomaly is just the deviation of the yearly ...

Well, the data, I think also more recent data, can be found on this website; yeah, that is just the dataset I worked with, if you want to exactly reproduce or recreate my results.

So we want to perform a time series decomposition, where on one hand we have a trend plus a remainder. Now the question is what the trend could be; we just try with a linear trend, because that seems to fit pretty well, and then of course we may be interested in the trend ...
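A sketch of that decomposition with a straight-line trend; `anomaly` is a placeholder ts object for the yearly anomalies:

```r
## Trend plus remainder with a linear trend fitted over calendar time.
tt        <- as.numeric(time(anomaly))   # the time axis of the series
trend_fit <- lm(anomaly ~ tt)            # linear trend
plot(anomaly); abline(trend_fit, col = 2)
remainder <- residuals(trend_fit)        # what is left after removing the trend
```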
So yeah, we will consider that next week. Have a good week, and see you next Monday to continue this discussion.