Module upsilon_analysis¶
Analysis of upsilon resonances in dimuon events in CMS Open Data.

upsilon_analysis.
build_dataframe
(input_file, no_quality=False, y_min=0, y_max=1.2, pt_min=10, pt_max=100)[source]¶ Opens the given file in an
RDataFrame
.Parameters:  input_file (str) – The file from which to take the Events TTree.
 no_quality (bool, optional) – Wether to skip muon quality cuts (see readme).
 y_min (float, optional) – Minimum abs(y) for the dimuon system.
 y_max (float, optional) – Maximum abs(y) for the dimuon system.
 pt_min (float, optional) – Minimum pt in GeV/c for the dimuon system.
 pt_max (float, optional) – Maximum pt in GeV/c for the dimuon system.
Return type: ROOT.RDataFrame
Makes a
ROOT.RDataFrame
with the given input, applies cuts common to all analysis tasks by callingFilter
and returns the result. Nothing will be run as only lazy action are requested.The cuts performed (actually “booked”) are:
 selection of only events with exactly two opposite charge muons;
 quality cuts (see readme) unless
no_quality
isTrue
;  invariant mass cut between 8.5 and 11.5 GeV/c^2;
 dimuon system rapidity and transverse momentum cuts defined by
y_min
,y_max
,pt_min
,pt_max
.
In the process, the columns
dimuon_mass
,dimuon_pt
anddimuon_y
are created and populated.Warning
This function uses
ROOT.gInterpreter.Declare
to define some global C++ functions, namely:float m12(RVec<float>, RVec<float>, RVec<float>, RVec<float>)
float pt12(RVec<float>, RVec<float>, RVec<float>, RVec<float>)
float y12(RVec<float>, RVec<float>, RVec<float>, RVec<float>)
If you define something with this names in the global namespace an error will arise, but
ROOT
may actually not raise any exception in Python.If the function is run more than once, the C++ functions are not redefined, so multiple calls to this function will work without issues.

upsilon_analysis.
book_histograms
(df, mass_bins=100, y_min=0, y_max=1.2, y_bins=2, pt_min=10, pt_max=100, pt_bin_width=2, y_bins_list=None, pt_bins_list=None)[source]¶ Pepares mass histograms to be made by the
RDataFrame
.Parameters:  df (ROOT.RDataFrame) – The
RDataFrame
to work on.  mass_bins (int, optional) – The number of bins for the mass histograms.
 y_min (float, optional) – Minimum for abs(y) binning.
 y_max (float, optional) – Maximum for abs(y) binning.
 y_bins (int, optional) – Number of abs(y) bins.
 pt_min (float, optional) – Minimum, in GeV/c, for pt binning.
 pt_max (float, optional) – Maximum, in GeV/c, for pt binning.
 pt_bin_width (float, optional) – Width, in GeV/c, of the pt bins (see
utils.pt_bin_edges
).  y_bins_list (
Iterable[tuple[float, float]]
) – If provided,y_min
,y_max
andy_bins
are ignored and the tuples in this iterable are used as bins.  pt_bins_list (
Iterable[tuple[float, float]]
) – If provided,pt_min
,pt_max
andpt_bin_width
are ignored and the tuples in this iterable are used as bins.
Return type: dict[tuple[float, float], dict[tuple[float, float], TH1D]]
Builds a dictionary whose keys are tuples
(y_low, y_high)
, i.e. the edges of a rapidity bin, and the values are dictionaries whose keys are tuples(pt_low, pt_high)
, i.e. the edges of a transverse momentum bin, and the values are histograms of the invariant mass distribution within that rapidity and transverse momentum bin. In other words this function returns an object like{(y_low_1, y_high_1): {(pt_low_1, pt_high_1): TH1D, ...}, ...}
The histograms are actually
RResultPtr<TH1D>
, this means they are “booked” using theHisto1D
method of the givenRDataFrame df
, but are not actually built because that is a lazy action.The bins are chosen by the parameters; beside those, a pt bin that includes all pt values is present for each y bin, and a y bin that includes all y values is also present.
 df (ROOT.RDataFrame) – The

upsilon_analysis.
fit_histograms
(histos, output_dir=None, vv=False)[source]¶ Automatic mass histograms fitting.
Parameters:  histos (dict) – Dictionary of dictionaries of histograms (see below).
 output_dir (str) – Output directory for the PDF with the histograms;
if not given or
None
no PDF is produced.  vv (bool) – Very verbose mode.
Return type: dict[tuple[float, float], dict[tuple[float, float], utils.FitResults]]
Fits the histograms
histos
returned bybook_histograms
, saves the plots to “mass_histos.pdf” and returns the results of the fits in a dictionary of dictionaries similar tohistos
.The fit function used is the sum of three Gaussian functions for the resonances plus a linear parametrization for the background.
The results of each fit are returned as a
utils.FitResults
.

upsilon_analysis.
build_cross_section_hist
(fit_results, luminosity=1, efficiency=1)[source]¶ Builds a dσ/dpt vs pt histogram from a collection of fit results.
Parameters:  fit_results (
dict[tuple[float, float], int]
) – Dictionary of the occurrences in each bin.  luminosity (float, optional) – The total integrated luminosity for calculating real cross sections (the units must be added manually to the axis label).
 efficiency (
float
ordict[tuple[float, float], float]
, optional) – The detection for calculating real cross sections. If a scalar is given, it will be used for all bins; otherwise it must be adict
likefit_results
with an efficiency value for each bin.
Raises: RuntimeError – If the bins are invalid or overlap.
Return type: ROOT.TH1F
The argument
fit_results
should be a dictionary whose keys are the pt bin limits, as a tuple, and whose values are the occurrences of the resonance in that pt bin. In other words, it should be like{(pt_low_1, pt_high_1): n_occ_1, ... }
The results will be a
TH1F
whose bins are given by the keys offit_results
, and whose bins’ contents are the respective numbers of occurrences divided by the bin width. This will be proportional to dσ/dpt.If
luminosity
and/orefficiency
are given, each bin will be divided by them. This way one can get a real cross section measure. Remember to change the y axis label to include the correct units.If the given bins overlap, a
RuntimeError
will be raised.Note
If the bins are noncontiguos (i.e. there is a “hole”), a bin with zero content will be made (this is important if you intend to fit the histogram).
Note
The acceptance may, and should, be included in the efficiency.
 fit_results (

upsilon_analysis.
build_cross_section_graph
(fit_results, luminosity=1, efficiency=1)[source]¶ Builds a dσ/dpt vs pt graph from a collection of fit results.
Parameters:  fit_results (
dict[tuple[float, float], int]
) – Dictionary of the occurrences in each bin.  luminosity (float, optional) – The total integrated luminosity for calculating real cross sections (the units must be added manually to the axis label).
 efficiency (
float
ordict[tuple[float, float], float]
, optional) – The detection for calculating real cross sections. If a scalar is given, it will be used for all bins; otherwise it must be adict
likefit_results
with an efficiency value for each bin.
Raises: RuntimeError – If the bins are invalid or overlap.
Return type: ROOT.TGraphErrors
The argument
fit_results
should be a dictionary whose keys are the pt bin limits, as a tuple, and whose values are the occurrences of the resonance in that pt bin. In other words, it should be like{(pt_low_1, pt_high_1): n_occ_1, ... }
The results will be a
TGraphErrors
where the x values are defined by the centers of the pt bins, the x errors are defined by the pt bins’ width, the y values are the numbers of occurrences divided by the pt bin width and the y error is estimated as the square root of the number of occurrences divided by the pt bin width. This way the y values will be proportional to dσ/dpt.If
luminosity
and/orefficiency
are given, each y value will be divided by them. This way one can get a real cross section measure. Remember to change the y axis label to include the correct units.If the given bins overlap, a
RuntimeError
will be raised.Note
The acceptance may, and should, be included in the efficiency.
 fit_results (
Note that everything available here is defined in one of the submodules, mostly
core
.