Find precedents of cells with formulas in Google Sheets using Apps Script — Part 1

Enumerate a range in Google Sheets using Google Apps Script and use that to find precedents of cells that contain a formula.

Find precedents of cells with formulas in Google Sheets using Apps Script — Part 1
Find precedents of cells with formulas in Google Sheets using Apps Script — Part 1

Backstory

With all this talk about Universal Analytics going away from Google Analytics, i remembered the heat map feature they had, where they showed where visitors interacted the most on a website — gave me a cool idea on if we could determine something similar in Google Sheets, whereby we could identify which cells were being relied upon the most and used by other cells (i.e. precedents).

precedents — cells that are referred to by another cell, function or as part of a formula (not to be confused with dependents).

While Microsoft Excel already has a way to display the relationships between formulas and cells by helping users trace precedents & dependents, Google Sheets doesn't have an in-built way to do that — neither does the api or the built-in apps script functions 😬

Approach

Fortunately, Apps Script provides a very helpful and convenient way to quickly find cells with formulas and get them either in the A1Notation format or R1C1 — that's right, i'm talking about getFormulasR1C1.

Here's an example: say if you have a sheet which contains formulas in the following order —

sheet with formulas
sheet with formulas

The output of the typical getFormulas function would look like so —

output of getFormulas function

This may look overtly complex in the begging but the structure (not the actual values itself) already provides almost everything we need — almost 😉

The crown jewel of this solution is really the way to enumerate a range — what this means if given an input range (not just a cell reference) say: A1:A5, how to get all the cells associated with it (i.e. A1, A2, A3, A4 & A5).
This should also handle cases like —

  • when the range references are not from the same sheet
  • when the ranges are open-bound (say: A:A)

Also, here's what the output from the very-needed getFormulasR1C1 looks like –

output of getFormulasR1C1 function

As you might've guessed (or maybe, not), this still has an odd problem — the row and column indexes have negative values 😬 and what that means is that the cells being referred to are relative to the cell where they reside (translation: the cell that holds another cell thinks that the entire world revolves around it and it alone) and yet, they still help.

Once the ranges have been enumerated, its really a matter of mapping the precedent cell to the cell containing the formula along with handling all the cases as mentioned above.

Code

you can find the entire codebase from my GitHub repository here or make a copy of this spreadsheet (also copies the Apps Script file associated with it ✌🏽), i'm also sharing it below for quick reference —

const saveToDriveAsJson = false;

const ss = SpreadsheetApp.getActiveSpreadsheet();
const activeSheet = ss.getActiveSheet();

function getPrecedents() {
  const rawCellData = scanCurrentSheet();
  let result = {};
  rawCellData.forEach(cellData => {
    let rawRangesR1C1 = cellData.rangesR1C1;
    rawRangesR1C1.forEach(range => {
      let enumeratedData = enumerateRange(range, cellData.rowIndex, cellData.columnIndex);
      result[enumeratedData.cell] = Object.keys(result).includes(enumeratedData.cell)
        ? [...new Set(result[enumeratedData.cell].concat(enumeratedData.precedents))]
        : enumeratedData.precedents;
    });
    let rawCellR1C1 = cellData.cellsR1C1;
    rawCellR1C1.forEach(cell => {
      let cellA1Notation = convertCellsFromR1C1toA1Notation(cell, cellData.rowIndex, cellData.columnIndex);
      result[cellA1Notation.cell] = Object.keys(result).includes(cellA1Notation.cell)
        ? [...new Set(result[cellA1Notation.cell].concat(cellA1Notation.precedents))]
        : cellA1Notation.precedents;
    });
  });
  console.log(JSON.stringify(result, null, 2));
  saveToDriveAsJson ? DriveApp.createFile(`${ss.getName()} — ${activeSheet.getSheetName()} — graph.json`, JSON.stringify(result, null, 2), MimeType.PLAIN_TEXT) : null;
}

function scanCurrentSheet() {
  const formulasR1C1 = activeSheet.getDataRange().getFormulasR1C1();
  let result = [];
  formulasR1C1.forEach((row, rowIndex) => {
    row.forEach((cell, columnIndex) => {
      let dict = {};
      if (cell) {
        dict["formula"] = cell;
        dict["rowIndex"] = rowIndex;
        dict["columnIndex"] = columnIndex;

        let formattedCell = JSON.parse(JSON.stringify(cell))
          .replace(/''/gmi, "'")
          .replace(/\\/gmi, "\\")

        dict["rangesR1C1"] = [];
        let rangesRegExPattern = `(?:[R|C]\\[-?\\d+\\]){0,2}:(?:[R|C]\\[-?\\d+\\]){0,2}`;
        try {
          cell.match(/!?(?:[R|C]\[-?\d+\]){0,2}:(?:[R|C]\[-?\d+\]){0,2}/gmi).filter(data => !data.includes("!") && data !== ":" ? dict["rangesR1C1"].push(data) : null);
        } catch (e) { }
        try {
          cell.match(/\b\w+!(?:[R|C]\[-?\d+\]){0,2}:(?:[R|C]\[-?\d+\]){0,2}/gmi).forEach(data => dict["rangesR1C1"].push(data));
        } catch (e) { }
        try {
          let moreRangeReferences = getMatchingRangeCellRef(formattedCell, rangesRegExPattern);
          moreRangeReferences.length > 0 ? moreRangeReferences.forEach(data => dict["rangesR1C1"].push(JSON.parse(JSON.stringify(data)))) : null;
        } catch (e) { }

        dict["cellsR1C1"] = [];
        let cellsRegExPattern = `(?<!:)R\\[\\-?\\d+\\]C\\[\\-?\\d+\\](?!:)`;
        try {
          cell.match(/!?(?<!:)R\[\-?\d+\]C\[\-?\d+\](?!:)/gmi).filter(data => !data.includes("!") ? dict["cellsR1C1"].push(data) : null);
        } catch (e) { }
        try {
          cell.match(/\b\w+!(?<!:)R\[\-?\d+\]C\[\-?\d+\](?!:)/gmi).forEach(data => dict["cellsR1C1"].push(data));
        } catch (e) { }
        try {
          let moreCellReferences = getMatchingRangeCellRef(formattedCell, cellsRegExPattern);
          moreCellReferences.length > 0 ? moreCellReferences.forEach(data => dict["cellsR1C1"].push(JSON.parse(JSON.stringify(data)))) : null;
        } catch (e) { }

        result.push(dict);
      }
    });
  });
  return result;
}

function getMatchingRangeCellRef(cell, regexPattern) {
  const sheets = ss.getSheets();
  let result = [];
  sheets.forEach(sheet => {
    let sheetNameFormat = sheet.getSheetName()
      .replace(/\\/gmi, "\\\\")
      .replace(/\//gmi, "\\\/")
      .replace(/\|/gmi, "\\\|")
      .replace(/\./gmi, "\\\.")
      .replace(/\+/gmi, "\\\+")
      .replace(/\*/gmi, "\\\*")
      .replace(/\?/gmi, "\\\?")
      .replace(/\^/gmi, "\\\^")
      .replace(/\$/gmi, "\\\$")
      .replace(/\(/gmi, "\\\(")
      .replace(/\)/gmi, "\\\)")
      .replace(/\[/gmi, "\\\[")
      .replace(/\]/gmi, "\\\]")
      .replace(/\{/gmi, "\\\{")
      .replace(/\}/gmi, "\\\}")
    let finalRegExPattern = new RegExp(`'${sheetNameFormat}'!${regexPattern}`, "gmi");
    let matchedReferences = cell.match(finalRegExPattern);
    matchedReferences?.forEach(data => result.push(data));
  });
  return result;
}

function enumerateRange(range, rowIndex, columnIndex) {
  let enumerated = [];
  const lastRow = activeSheet.getLastRow();
  const lastColumn = activeSheet.getLastColumn();
  const isDifferentSheet = range.includes("!");
  const rangeData = isDifferentSheet ? range.split("!") : null;
  range = isDifferentSheet ? rangeData[1] : range;
  let [startCell, endCell] = range.split(":");
  startCell = startCell.includes("R") && startCell.includes("C") ? startCell :
    (!startCell.includes("R") ? `R[${0 - rowIndex}]${startCell}` :
      (!startCell.includes("C") ? `${startCell}C[${(0 - columnIndex)}]` :
        startCell
      )
    );
  endCell = endCell.includes("R") && endCell.includes("C") ? endCell :
    (!endCell.includes("R") ? `R[${(lastRow - 1) - rowIndex}]${endCell}` :
      (!endCell.includes("C") ? `${endCell}C[${(lastColumn - 1) - columnIndex}]` :
        endCell
      )
    );
  const [, startRowIndex, startColumnIndex] = /R\[(-?\d+)\]C\[(-?\d+)\]/gmi.exec(startCell);
  const [, endRowIndex, endColumnIndex] = /R\[(-?\d+)\]C\[(-?\d+)\]/gmi.exec(endCell);
  const corrected = {
    startRowIndex: +startRowIndex + +rowIndex,
    startColumnIndex: +startColumnIndex + +columnIndex,
    endRowIndex: +endRowIndex + +rowIndex,
    endColumnIndex: +endColumnIndex + +columnIndex
  }
  for (let j = corrected.startColumnIndex; j <= corrected.endColumnIndex; j++) {
    for (let i = corrected.startRowIndex; i <= corrected.endRowIndex; i++) {
      let a1Notation = activeSheet.getRange(`R[${i}]C[${j}]`).getA1Notation();
      enumerated.push(isDifferentSheet ? `${rangeData[0]}!${a1Notation}` : a1Notation);
    }
  }
  const cell = activeSheet.getRange(`R[${rowIndex}]C[${columnIndex}]`).getA1Notation();
  return {
    cell: cell,
    precedents: [...new Set(enumerated)]
  };
}

function convertCellsFromR1C1toA1Notation(cellR1C1Reference, rowIndex, columnIndex) {
  let enumerated = [];
  const isDifferentSheet = cellR1C1Reference.includes("!");
  const cellReferenceData = isDifferentSheet ? cellR1C1Reference.split("!") : null;
  cellR1C1Reference = isDifferentSheet ? cellReferenceData[1] : cellR1C1Reference;

  const [, startRowIndex, startColumnIndex] = /R\[(-?\d+)\]C\[(-?\d+)\]/gmi.exec(cellR1C1Reference);
  const corrected = {
    startRowIndex: +startRowIndex + +rowIndex,
    startColumnIndex: +startColumnIndex + +columnIndex,
  }

  const a1Notation = ss.getRange(`R[${corrected.startRowIndex}]C[${corrected.startColumnIndex}]`).getA1Notation();
  enumerated.push(isDifferentSheet ? `${cellReferenceData[0]}!${a1Notation}` : a1Notation);
  const cell = activeSheet.getRange(`R[${rowIndex}]C[${columnIndex}]`).getA1Notation();
  return {
    cell: cell,
    precedents: [...new Set(enumerated)]
  };
}

Architecture

  • RegEx plays a big role in this solution because of how the sheet names are returned from the getFormulasR1C1 function (ex: \'Sheet - 1\'!A1) and because all the characters that's allowed to be used when naming a sheet too 😕
  • The enumerateRange function takes in the raw range data provided by the getFormulasR1C1 function along with the row and column index from which they get picked — this way, we can readjust the coordinates to reflect the actual cells we need

Bonus

I've also added the functionality to save the json output to a Drive file as some of these mappings could get too big to be displayed in the execution logs. Feel free to toggle the saveToDriveAsJson function accordingly 😇

What next

  • part 2 would mostly focus on also finding dependents of a cell
  • part 3 might accompany a way to visualize this data so its a lot more usable