Comments on RÉSONAANCES: Gunpowder Plot: foiled

2016-01-30T03:31:00.771+01:00

An arxiv entry from Bradley J Kavanagh, testing the ATLAS excess with various background models: http://arxiv.org/abs/1601.07330
He gets consistent local significances close to the ATLAS value, with all parametrizations, including the one from Davis et al.

2016-01-25T21:10:23.550+01:00

It could help in this particular case, but there are searches where it would be annoying to have all those bins. Consider this plot for example: https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/CONFNOTES/ATLAS-CONF-2015-043/fig_01f.png
Would it help to have tens of long black lines in the plot?

2016-01-25T13:28:46.322+01:00

I agree this would help to guide the eye

2016-01-25T12:30:49.432+01:00

Concerning zero-event bins, wouldn't it be appropriate to plot one-sided error bars there (going up to 1.8 or so)?
That would help avoid this confusion with bin counting.
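For reference, the "1.8 or so" is the mean for which seeing zero events is a ~16% downward fluctuation: P(X=0; \lambda) = e^{-\lambda}, so e^{-\lambda} = 0.16 gives \lambda = -ln(0.16). A minimal Python sketch, assuming the one-sided 1 sigma tail probability 0.1587 as the convention; the function name is hypothetical:

```python
import math

# One-sided 1-sigma tail probability (~16%); an assumed convention.
TAIL = 0.1587

def zero_count_upper_limit(tail: float = TAIL) -> float:
    """Mean lambda for which observing 0 events has probability `tail`.

    P(X = 0; lambda) = exp(-lambda), so exp(-lambda) = tail
    gives lambda = -ln(tail).
    """
    return -math.log(tail)

print(f"{zero_count_upper_limit():.2f}")      # 1.84
print(f"{zero_count_upper_limit(0.16):.2f}")  # 1.83
```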

2016-01-25T10:17:21.499+01:00

As of today, I think the tweet was premature. The background fitting procedure is not described carefully in the ATLAS conf note, which caused some misinterpretation by theorists. Once this is fixed, there is no big discrepancy between ATLAS and theorists' estimates.

2016-01-25T05:23:23.192+01:00

"Partly changed my mind about http://arxiv.org/abs/1601.03153 While v1 has serious errors, valid concerns about ATLAS diphoton background are raised"

Hey Jester, how about an update on that in the comments section?

2016-01-23T18:16:52.818+01:00

Enough already with perfect (snow...) storms :-)

2016-01-23T17:47:25.881+01:00

There will be a separate post next week. Brace for a perfect storm :)

2016-01-23T16:28:25.766+01:00

Jester,

Could you elaborate on your latest thoughts regarding Jonathan's paper and the updated ATLAS curves?

Thanks!

2016-01-20T13:56:45.998+01:00

Probably there would not be a significant preference for either hypothesis

2016-01-20T12:02:37.576+01:00

If we take only the data points that are below 1 TeV, then I wonder what happens to the significance of the excess if we fit them using:

a) The ATLAS function;

b) The function from Jonathan's paper.

2016-01-19T12:48:22.277+01:00

@mfb I don't think the ordering rule is FC, actually; it seems to be a symmetric two-tailed (central) interval.

For n=1 observed, we see an interval of about 0.2 to 3.2 (see e.g. the right-most point on the plot).

Using the table I linked above

https://onlinecourses.science.psu.edu/stat414/sites/onlinecourses.science.psu.edu.stat414/files/lesson12/SmallPoisson01.gif

which gives the cumulative Poisson probability p(X<=x; \lambda), we find an upper limit on \lambda at x=1 and p=0.32/2=0.16, at \lambda \approx 3.2, and a lower limit at x=0 and p=1-0.32/2=0.84, at \lambda \approx 0.2. This agrees well with the numbers from the plot.

See slide 20 of vuko.home.cern.ch/vuko/teaching/stat09/CL.pdf
for the relevant formulas.
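The table lookup can be reproduced by evaluating the cumulative Poisson probability directly. In this sketch `poisson_cdf` is a hypothetical helper (not a library function), and the \lambda grid is chosen to bracket the values read off the table:

```python
import math

def poisson_cdf(x: int, lam: float) -> float:
    """Cumulative Poisson probability P(X <= x; lam), summed term by term."""
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for k in range(1, x + 1):
        term *= lam / k     # P(X = k) = P(X = k - 1) * lam / k
        total += term
    return total

# Upper limit for n = 1: look for P(X <= 1; lam) = 0.16
for lam in (3.1, 3.2, 3.3, 3.4):
    print(f"P(X<=1; {lam}) = {poisson_cdf(1, lam):.3f}")

# Lower limit for n = 1: look for P(X <= 0; lam) = 0.84
for lam in (0.1, 0.2, 0.3):
    print(f"P(X<=0; {lam}) = {poisson_cdf(0, lam):.3f}")
```

The printed values bracket 0.16 between \lambda = 3.2 and 3.3, and 0.84 between \lambda = 0.1 and 0.2. That is consistent with the ~3.2 and ~0.2 read off the table; solving the equations exactly gives roughly 3.30 and 0.17.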

Summarizing
===========

Quite a helpful thread. If I may summarise, we found two ostensibly big problems with Davis et al. (in addition to the issues in the original post):

* Misinterpreted error bars, which show confidence intervals from Poisson statistics (68% symmetric two-tailed). These were possibly interpreted as 1\sigma errors in Gaussian likelihoods. That is fine for large n by the CLT, but we aren't in that regime.

* Omitted empty bins, leaving only 27 data points rather than 36 (9 bins are empty). This occurred at least for the BIC, and possibly for the LLR (unconfirmed). I find this quite alarming.
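To see why dropping empty bins matters in a Poisson likelihood: a bin with n = 0 and prediction \mu contributes 2\mu to -2\ln(L/L_sat), so it penalises a prediction of events where none are seen. The toy sketch below uses made-up counts and two made-up background shapes (not the ATLAS data or either paper's function) and compares the deviance with and without the empty bins:

```python
import math

def poisson_deviance(observed, expected, skip_empty=False):
    """-2 ln(L / L_saturated) for independent Poisson bins.

    A bin with n = 0 contributes 2 * mu. skip_empty=True mimics
    dropping those bins from the fit.
    """
    total = 0.0
    for n, mu in zip(observed, expected):
        if n == 0:
            if skip_empty:
                continue
            total += 2.0 * mu
        else:
            total += 2.0 * (mu - n + n * math.log(n / mu))
    return total

# Hypothetical toy tail: observed counts and two candidate background shapes.
observed = [3, 1, 0, 1, 0, 0]
steep = [2.5, 1.5, 0.9, 0.5, 0.3, 0.2]  # falls quickly
flat = [2.5, 2.0, 1.8, 1.6, 1.5, 1.4]   # overshoots the tail

for name, mu in (("steep", steep), ("flat", flat)):
    full = poisson_deviance(observed, mu)
    partial = poisson_deviance(observed, mu, skip_empty=True)
    print(f"{name}: all bins {full:.2f}, non-empty only {partial:.2f}")
```

In this toy, including the empty bins separates the two shapes by about 7 units of -2\ln L; dropping them shrinks the gap to about 0.3, so the overshooting shape goes almost unpenalised.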

Jonathan, are these big issues? How do they impact your quoted significances? Do they impact your conclusions?

2016-01-19T02:10:36.915+01:00

@Mario: mainly accelerator issues. More radiation damage than expected, more heating of elements during injection and operation, magnets not as trained as expected, ...

@Jester: It is an application of the Feldman-Cousins method ( http://arxiv.org/abs/physics/9711021 ).

@Jonathan: Okay thanks, your comments here clarified that you do not understand the ATLAS error bars, or statistics with low event numbers in general.

2016-01-19T00:29:52.502+01:00

sure

2016-01-18T23:34:48.001+01:00

"they are not plotting the variance Sqrt[N]"

I guess you mean

"they are not plotting the square root of the variance (Sqrt[N])"?

2016-01-18T23:20:01.004+01:00

I agree re the error bars - they show a two-tail 68% confidence interval for the expected number of events from Poisson statistics.

Regarding your example:

"for example the events with one event, have an upper error bar going up to 3.2, while it would reach only 2 with Poisson errors"

With n=1 event observed, the two-tailed 68% upper limit on the expected number of events is 3.2. See where p=0.32/2=0.16 and x=1 meet in this look-up table:

https://onlinecourses.science.psu.edu/stat414/sites/onlinecourses.science.psu.edu.stat414/files/lesson12/SmallPoisson01.gif

it's at ~3.2. Or calculate it explicitly.

2016-01-18T21:59:33.144+01:00

Jonathan, the ATLAS error bars are just Poissonian statistical errors. More precisely, they are not plotting the variance Sqrt[N] (thank God!), but instead they show a sort of 1 sigma confidence interval. The upper limit of the error bar is the mean value for which the Poissonian probability of observing N or fewer events is ~16%, and the lower limit of the error bar is the mean value for which the probability of observing N or more events is ~16%. For large N this reduces to the usual Sqrt[N] error bars. I guess this procedure has a clever-sounding name; maybe someone who knows statistical theory better than me could comment.
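The construction just described matches what is usually called the central (Garwood) Poisson interval. Whether ATLAS uses exactly this ordering, rather than for example Feldman-Cousins as suggested elsewhere in the thread, is not confirmed here. A sketch that solves both conditions by bisection, assuming a tail probability of 0.1587 for the "~16%"; all names are hypothetical:

```python
import math
from typing import Tuple

TAIL = 0.1587  # probability in each tail (~16%): a central ~68% interval

def poisson_cdf(x: int, lam: float) -> float:
    """P(X <= x; lam) for a Poisson variable, summed term by term."""
    term = math.exp(-lam)
    total = term
    for k in range(1, x + 1):
        term *= lam / k
        total += term
    return total

def bisect(func, lo: float, hi: float, iters: int = 100) -> float:
    """Root of func on [lo, hi]; func(lo) and func(hi) must differ in sign."""
    lo_positive = func(lo) > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (func(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def central_interval(n: int, tail: float = TAIL) -> Tuple[float, float]:
    """Central Poisson interval on the mean, given n observed events."""
    hi = n + 10.0 * math.sqrt(n + 1) + 10.0  # comfortably above the upper edge
    # Upper edge: P(X <= n; lam) = tail
    upper = bisect(lambda lam: poisson_cdf(n, lam) - tail, 0.0, hi)
    # Lower edge: P(X >= n; lam) = tail, i.e. 1 - P(X <= n - 1; lam) = tail
    lower = 0.0 if n == 0 else bisect(
        lambda lam: 1.0 - poisson_cdf(n - 1, lam) - tail, 0.0, hi)
    return lower, upper

for n in (0, 1, 4, 25, 100):
    low, up = central_interval(n)
    s = math.sqrt(n)
    print(f"N={n:3d}: [{low:7.2f}, {up:7.2f}]   "
          f"N-sqrt(N), N+sqrt(N): [{n - s:7.2f}, {n + s:7.2f}]")
```

For N = 1 this gives roughly [0.17, 3.30], compared with [0, 2] from N ± Sqrt[N]. By N = 100 the half-width is within about 5% of Sqrt[N], though the interval is still slightly asymmetric.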

I'm not sure what you mean by "use the error bars on the ATLAS plot". Do you use their measured values and error bars to define a Gaussian chi^2? This is OK for high-N bins, but not for low-statistics ones. What do you do for the bins where there are zero events and errors are not displayed? Do you ignore them, as suggested by dhrou? I don't think that "any fit through discrete points like this would have some tension" once the statistical procedure is correctly defined. I think the correct procedure is to define the usual Poisson likelihood for all 36 bins. With this procedure, the ATLAS curve leads to only 1 sigma tension at the tail, and your green curve to even less than that.
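The difference between a Gaussian chi^2 built from Sqrt[N] errors and the Poisson likelihood shows up clearly on low-count bins. In this sketch the counts and predictions are invented for illustration and the function names are hypothetical. The Gaussian version must skip n = 0 bins because its variance would be zero; the Poisson deviance uses every bin:

```python
import math

def neyman_chi2(observed, expected):
    """Gaussian chi^2 with sigma^2 = n (the observed count).

    The variance vanishes for n = 0, so those bins are skipped.
    """
    return sum((n - mu) ** 2 / n for n, mu in zip(observed, expected) if n > 0)

def poisson_deviance(observed, expected):
    """-2 ln(L / L_saturated) for Poisson bins; every bin contributes."""
    total = 0.0
    for n, mu in zip(observed, expected):
        log_term = n * math.log(n / mu) if n > 0 else 0.0
        total += 2.0 * (mu - n + log_term)
    return total

# Invented low-count bins and predictions, for illustration only.
observed = [5, 2, 1, 0, 0, 1, 0]
expected = [4.0, 2.5, 1.5, 1.0, 0.7, 0.5, 0.4]

used = sum(1 for n in observed if n > 0)
print(f"Gaussian chi^2 (sqrt(N) errors): {neyman_chi2(observed, expected):.2f}"
      f" using {used} of {len(observed)} bins")
print(f"Poisson -2 ln(L/L_sat): {poisson_deviance(observed, expected):.2f}"
      f" using all {len(observed)} bins")
```

With these toy numbers the Gaussian chi^2 uses 4 of the 7 bins and comes out below 1. The Poisson deviance also sees the 2.1 predicted events in bins with none observed, and comes out at about 5.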

I agree that the background fitting is not very carefully described in the ATLAS note, and that they should provide more details in the paper. However, I think there is no hint in the data (and no theoretical motivation) for a more complicated background shape. Based on your explanations, it seems to me that your conclusions are due to ignoring zero-event bins in your fit.

2016-01-18T21:57:02.173+01:00

Jonathan,

Can you elaborate on the ambiguity in the definition of the BIC? To me, it's quite clear that n=36 is correct. Why would it be correct to remove data from the chi-squared-like term in the BIC (-2\ln L) and reduce n in the term that penalizes parameters (k\ln n)?

Please can you clarify whether this is what you did, and if so, why? What are the values of -2\ln L and BIC with all 36 data points?

In the LLR test-statistic, q_\sigma, do you consider all 36 data points or only your smaller set of 27?

2016-01-18T21:06:33.358+01:00

Jester,
please forgive what must be a stupid layman's question; I'm trying to understand the conundrum. Because we cannot estimate the background reliably from theory (and we must compare the observations against it to see whether anything is there that the theory does not account for), we model it empirically. We take a curve with several parameters, fix the parameters by fitting it to the regions where we know there's nothing, and see what it yields (i.e. extrapolate) in the regions where we don't know for sure. Taken naively, this approach raises two immediate questions:
1. For the "fitting" regions, do we know for sure that nothing is there?
2. For the curve, shouldn't there be something about it that still derives from theory or general principles, e.g. no sudden twists? Otherwise, couldn't we find a parametrized curve to fit pretty much any observed data?
Thanks!

2016-01-18T19:29:59.200+01:00

ohh, someone just got owned!

2016-01-18T18:45:58.982+01:00

Jonathan,

Your point makes a lot of sense, as a matter of principle. If the significance is over-sensitive to the details of the processing algorithms, the signal is likely to be inconclusive.

The hope is that the uncertainty will start to clear up by the end of the year, although this may not be a sure bet at this point. More data in uncharted territory may also boost the "noise".

2016-01-18T17:44:24.376+01:00

Ah yes I see your reasoning. However the low-energy bins have uncertainties which are at least twice the size of the Poisson ones (for example the events with one event, have an upper error bar going up to 3.2, while it would reach only 2 with Poisson errors). So I think you should use the error bars on the ATLAS plot instead, in which case you get something closer to 1.8 sigma tension.

Obviously some tension remains, but I don't think this is entirely a surprise, especially since the points are discrete here. Indeed any fit through discrete points like this would have some tension.

Also, to respond to dhrou: we also tried the BIC with n = 36 and reached a similar conclusion. In fact we originally used that number, but it's not clear from the definition of the BIC which one to use. Rest assured that we spent several weeks just on the numerics for this paper, and that we have done a lot of tests on our results, including cross-checking them using two completely different pieces of code.

In any case it is up to you to decide what conclusions you take from our paper. Personally I do not trust a result which varies so much just with a small change in the empirical background. However this is of course my own view.

Jonathan

2016-01-18T11:22:31.168+01:00

I guess there is a basic misunderstanding in this paper: it ignores the fact that the empty bins (in the plot shown by Jester) are significant.
Bottom right of page 4 says "BIC = −2lnL+klnn, where k is the number of parameters in the model and n = 27 is the number of data-points." n=27 is indeed the number of points visible on the plot; however, one should also consider the empty bins with zero events. By eye, there are actually 36 bins, so 36 data points. This is why the proposed function clearly overshoots the data at high mass, as you, Jester, spotted by quoting the expected and observed integrals.
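For concreteness, the quoted BIC can be written as a one-line function. The helper name and the Δk = 1 example are hypothetical. This sketch shows that moving n from 27 to 36 changes the penalty per parameter only from ln 27 ≈ 3.30 to ln 36 ≈ 3.58. The larger effect is whether the -2 ln L term itself includes the nine empty bins.

```python
import math

def bic(neg2_log_l: float, k: int, n: int) -> float:
    """BIC = -2 ln L + k ln n, with k free parameters and n data points."""
    return neg2_log_l + k * math.log(n)

# Penalty per parameter for the two choices of n discussed in the thread.
for n in (27, 36):
    print(f"n = {n}: ln n = {math.log(n):.2f}")

# Extra penalty for a model with dk more parameters when n = 36 instead of 27.
dk = 1  # hypothetical difference in parameter count
print(f"extra penalty: {dk * (math.log(36) - math.log(27)):.3f}")
```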

2016-01-18T10:52:24.312+01:00

Hi Jonathan, thanks for the comments. It would indeed be great to sort out the details of the background fitting in ATLAS.
For our discussion, let's start with the tail. The number I quoted is purely statistical error. Integrating your best fit curve from 790 to 1590 GeV I'm getting the prediction of 33 events (up to digitization error). The number of observed events in that part of the tail is 17. The Poissonian probability of 33 fluctuating down to 17 is 0.16%, which corresponds to 3.2 sigma. My claim is that your best fit screws up the tail more than it improves the 750 GeV region.
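The tail estimate can be checked from the quoted numbers (33 events expected, 17 observed). This is a minimal standard-library sketch; the sigma conversion uses `statistics.NormalDist`. The comment does not say whether a one- or two-sided convention is meant, so both are printed:

```python
import math
from statistics import NormalDist

def poisson_cdf(x: int, lam: float) -> float:
    """P(X <= x; lam) for a Poisson variable, summed term by term."""
    term = math.exp(-lam)
    total = term
    for k in range(1, x + 1):
        term *= lam / k
        total += term
    return total

expected, observed = 33.0, 17  # numbers quoted in the comment above
p = poisson_cdf(observed, expected)  # P(X <= 17; 33)
z_one_sided = NormalDist().inv_cdf(1 - p)
z_two_sided = NormalDist().inv_cdf(1 - p / 2)
print(f"p = {p:.4f}, z one-sided = {z_one_sided:.2f}, "
      f"z two-sided = {z_two_sided:.2f}")
```

This gives p ≈ 0.17%, which is about 2.9 sigma one-sided and about 3.1 sigma two-sided. The quoted 0.16% and 3.2 sigma appear consistent with the two-sided reading, up to digitization error.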

2016-01-18T10:26:16.273+01:00

Hi Jester, thanks for the interest!

What ATLAS plots as their background and what they actually use is a bit confusing. If you look at the note, they say explicitly that they set k = 0 in their background fit. However, when we fit their function to the data we got what we showed in figure 1, i.e. the background with k = 1 looks like the plot in the ATLAS note.

So it depends on whether you believe their plot or the text of the note. I have a feeling they just accidentally plotted the fit with the k = 1 component, but who knows.

Also remember the function we picked is arbitrary, just as the ATLAS one is. We could have picked one which didn't have such a steep tail. However, I do not think the best-fit form of our function overshoots the tail, since it still falls within the error bars. In any case, this is reflected in the quality of the fit and in the likelihoods we plot in figure 2. What error bars did you use for your claim of 3 sigma tension? By eye this seems wrong.

So our point is a more general one than just picking a new function. We were trying to show that a result which depends so sensitively on the choice of empirical background function should not be trusted. We were not necessarily suggesting that we had invented a better function for the background. For example this would not have happened for the Higgs search, since the background is well constrained both above and below the resonance.

Also, if you would like to email me the parameter ranges for nuisance parameters you used in your profile likelihood fit I would be happy to help understand why you could not reproduce our significances.

Thanks,
Jonathan Davis