Module upsilon_analysis

Analysis of upsilon resonances in dimuon events in CMS Open Data.

upsilon_analysis.build_dataframe(input_file, no_quality=False, y_min=0, y_max=1.2, pt_min=10, pt_max=100)[source]

Opens the given file in an RDataFrame.

Parameters:
  • input_file (str) – The file from which to take the Events TTree.
  • no_quality (bool, optional) – Wether to skip muon quality cuts (see readme).
  • y_min (float, optional) – Minimum abs(y) for the dimuon system.
  • y_max (float, optional) – Maximum abs(y) for the dimuon system.
  • pt_min (float, optional) – Minimum pt in GeV/c for the dimuon system.
  • pt_max (float, optional) – Maximum pt in GeV/c for the dimuon system.
Return type:

ROOT.RDataFrame

Makes a ROOT.RDataFrame with the given input, applies cuts common to all analysis tasks by calling Filter and returns the result. Nothing will be run as only lazy action are requested.

The cuts performed (actually “booked”) are:

  • selection of only events with exactly two opposite charge muons;
  • quality cuts (see readme) unless no_quality is True;
  • invariant mass cut between 8.5 and 11.5 GeV/c^2;
  • dimuon system rapidity and transverse momentum cuts defined by y_min, y_max, pt_min, pt_max.

In the process, the columns dimuon_mass, dimuon_pt and dimuon_y are created and populated.

Warning

This function uses ROOT.gInterpreter.Declare to define some global C++ functions, namely:

  • float m12(RVec<float>, RVec<float>, RVec<float>, RVec<float>)
  • float pt12(RVec<float>, RVec<float>, RVec<float>, RVec<float>)
  • float y12(RVec<float>, RVec<float>, RVec<float>, RVec<float>)

If you define something with this names in the global namespace an error will arise, but ROOT may actually not raise any exception in Python.

If the function is run more than once, the C++ functions are not redefined, so multiple calls to this function will work without issues.

upsilon_analysis.book_histograms(df, mass_bins=100, y_min=0, y_max=1.2, y_bins=2, pt_min=10, pt_max=100, pt_bin_width=2, y_bins_list=None, pt_bins_list=None)[source]

Pepares mass histograms to be made by the RDataFrame.

Parameters:
  • df (ROOT.RDataFrame) – The RDataFrame to work on.
  • mass_bins (int, optional) – The number of bins for the mass histograms.
  • y_min (float, optional) – Minimum for abs(y) binning.
  • y_max (float, optional) – Maximum for abs(y) binning.
  • y_bins (int, optional) – Number of abs(y) bins.
  • pt_min (float, optional) – Minimum, in GeV/c, for pt binning.
  • pt_max (float, optional) – Maximum, in GeV/c, for pt binning.
  • pt_bin_width (float, optional) – Width, in GeV/c, of the pt bins (see utils.pt_bin_edges).
  • y_bins_list (Iterable[tuple[float, float]]) – If provided, y_min, y_max and y_bins are ignored and the tuples in this iterable are used as bins.
  • pt_bins_list (Iterable[tuple[float, float]]) – If provided, pt_min, pt_max and pt_bin_width are ignored and the tuples in this iterable are used as bins.
Return type:

dict[tuple[float, float], dict[tuple[float, float], TH1D]]

Builds a dictionary whose keys are tuples (y_low, y_high), i.e. the edges of a rapidity bin, and the values are dictionaries whose keys are tuples (pt_low, pt_high), i.e. the edges of a transverse momentum bin, and the values are histograms of the invariant mass distribution within that rapidity and transverse momentum bin. In other words this function returns an object like

{(y_low_1, y_high_1): {(pt_low_1, pt_high_1): TH1D, ...}, ...}

The histograms are actually RResultPtr<TH1D>, this means they are “booked” using the Histo1D method of the given RDataFrame df, but are not actually built because that is a lazy action.

The bins are chosen by the parameters; beside those, a pt bin that includes all pt values is present for each y bin, and a y bin that includes all y values is also present.

upsilon_analysis.fit_histograms(histos, output_dir=None, vv=False)[source]

Automatic mass histograms fitting.

Parameters:
  • histos (dict) – Dictionary of dictionaries of histograms (see below).
  • output_dir (str) – Output directory for the PDF with the histograms; if not given or None no PDF is produced.
  • vv (bool) – Very verbose mode.
Return type:

dict[tuple[float, float], dict[tuple[float, float], utils.FitResults]]

Fits the histograms histos returned by book_histograms, saves the plots to “mass_histos.pdf” and returns the results of the fits in a dictionary of dictionaries similar to histos.

The fit function used is the sum of three Gaussian functions for the resonances plus a linear parametrization for the background.

The results of each fit are returned as a utils.FitResults.

upsilon_analysis.build_cross_section_hist(fit_results, luminosity=1, efficiency=1)[source]

Builds a dσ/dpt vs pt histogram from a collection of fit results.

Parameters:
  • fit_results (dict[tuple[float, float], int]) – Dictionary of the occurrences in each bin.
  • luminosity (float, optional) – The total integrated luminosity for calculating real cross sections (the units must be added manually to the axis label).
  • efficiency (float or dict[tuple[float, float], float], optional) – The detection for calculating real cross sections. If a scalar is given, it will be used for all bins; otherwise it must be a dict like fit_results with an efficiency value for each bin.
Raises:

RuntimeError – If the bins are invalid or overlap.

Return type:

ROOT.TH1F

The argument fit_results should be a dictionary whose keys are the pt bin limits, as a tuple, and whose values are the occurrences of the resonance in that pt bin. In other words, it should be like

{(pt_low_1, pt_high_1): n_occ_1, ... }

The results will be a TH1F whose bins are given by the keys of fit_results, and whose bins’ contents are the respective numbers of occurrences divided by the bin width. This will be proportional to dσ/dpt.

If luminosity and/or efficiency are given, each bin will be divided by them. This way one can get a real cross section measure. Remember to change the y axis label to include the correct units.

If the given bins overlap, a RuntimeError will be raised.

Note

If the bins are non-contiguos (i.e. there is a “hole”), a bin with zero content will be made (this is important if you intend to fit the histogram).

Note

The acceptance may, and should, be included in the efficiency.

upsilon_analysis.build_cross_section_graph(fit_results, luminosity=1, efficiency=1)[source]

Builds a dσ/dpt vs pt graph from a collection of fit results.

Parameters:
  • fit_results (dict[tuple[float, float], int]) – Dictionary of the occurrences in each bin.
  • luminosity (float, optional) – The total integrated luminosity for calculating real cross sections (the units must be added manually to the axis label).
  • efficiency (float or dict[tuple[float, float], float], optional) – The detection for calculating real cross sections. If a scalar is given, it will be used for all bins; otherwise it must be a dict like fit_results with an efficiency value for each bin.
Raises:

RuntimeError – If the bins are invalid or overlap.

Return type:

ROOT.TGraphErrors

The argument fit_results should be a dictionary whose keys are the pt bin limits, as a tuple, and whose values are the occurrences of the resonance in that pt bin. In other words, it should be like

{(pt_low_1, pt_high_1): n_occ_1, ... }

The results will be a TGraphErrors where the x values are defined by the centers of the pt bins, the x errors are defined by the pt bins’ width, the y values are the numbers of occurrences divided by the pt bin width and the y error is estimated as the square root of the number of occurrences divided by the pt bin width. This way the y values will be proportional to dσ/dpt.

If luminosity and/or efficiency are given, each y value will be divided by them. This way one can get a real cross section measure. Remember to change the y axis label to include the correct units.

If the given bins overlap, a RuntimeError will be raised.

Note

The acceptance may, and should, be included in the efficiency.

Note that everything available here is defined in one of the submodules, mostly core.