python_libre_pythonista_ext

Ability to keep same cell value (cancel code calculation)

Open Vanerio opened this issue 10 months ago • 5 comments

Hi! I think being able to cancel the calculation of the current cell and keep its current value would be really useful in some cases. It is true that using an event or a button with an assigned macro should work, but LibrePythonista is more convenient and comfortable.

Let's take the following case as an example: you want to load a JSON file, but you don't want to reload it more than once. I tried setting a global variable and using sys.exit() to avoid reloading it, but that doesn't work, because the global variable USERS_JSON_LOADED is not preserved between calculations.

import json
import sys

# pd and lp_log are provided by LibrePythonista in the cell's namespace.
if 'USERS_JSON_LOADED' not in globals():
    _users_json_path = r"C:\Users.json"
    with open(_users_json_path, 'r') as json_file:
        _users_json = json.load(json_file)
        _users_df = pd.DataFrame(_users_json)
        globals()['USERS_JSON_LOADED'] = True
        lp_log.info('Users.json loaded!')
else:
    lp_log.info('Users.json already loaded!')
    sys.exit()

_users_df

For that reason I propose a method to skip recalculating a cell after its first calculation. Maybe a keep_current_value() function, or something similar.

The benefits would be:

  • Avoiding the reloading of files or other heavy calculations when not necessary.
  • Preventing the redrawing of unchanged cell/range content.

Maybe I'm overlooking something that is already possible? Or maybe this is not technically possible?

Thank you very much!

Vanerio avatar Jan 30 '25 09:01 Vanerio

This is a tricky subject. I am working on refactoring the engine that handles all the Python code for the sheets. I am not anticipating a lot of change to functionality.

Currently the values are recalculated with the sheet, which is what you are pointing out. I am not sure if adding the ability to skip cells during recalculation is possible at this point. There is a certain amount of caching taking place when parts of the sheet are being recalculated, but how that works will likely change. Maybe after the refactor, ways to accomplish this more simply will become clear.

Some options you can currently use:

lru_cache is a good option for hashable values. A Pandas DataFrame is not hashable.

You can't directly use functools.lru_cache to cache Pandas DataFrames effectively. lru_cache caches a function's return values keyed on its arguments, and those arguments must be hashable. A DataFrame is mutable and unhashable, so it cannot be used as (or as part of) a cache key, and handing the same mutable cached object to every caller can cause its own surprises.
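
A minimal illustration of the limitation (a sketch; it assumes pd is available in the cell, as it is elsewhere in this thread):

from functools import lru_cache

@lru_cache(maxsize=None)
def summarize(df):
    # lru_cache builds its key from the function arguments, which must be hashable
    return df.describe()

# summarize(pd.DataFrame({"a": [1, 2]}))  # TypeError: unhashable type: 'DataFrame'

# Even with hashable arguments (e.g. a file path string), the cached return value
# is a single shared DataFrame object; mutating it in one cell mutates what every
# later call receives.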

In the sheet code below you can see code for A1 and B1. Cell A1 contains the imports for the sheet and a fib() function to demonstrate lru_cache. Cell B1 calls the fib() function and displays the results as table data, demonstrating how lru_cache can be used.

There are two other examples of caching a Pandas DataFrame.

  1. A DataFrame can be pickled. Cell A3 contains a load_user_data() function that uses PickleCache. Once the contents of the JSON file are read and loaded into a DataFrame, the DataFrame is pickled and saved into the system tmp/my_cache_dir directory. Every call after that loads the DataFrame from the pickled file in that directory. A hash of the file path is used as the cache key, so if the file path changes it becomes a different cached object.

  2. Use a singleton class that extends LRUCache. Using the APSO extension I embedded a module named quick_load into the document (its contents are shown below). In cell E3 I then used the importer_doc_script context manager to import the script from the document (see also uno_helper.importer). The load_user_data_ql_cache() function in E3 gets the singleton cache instance, ql_cache = quick_load.QuickCache(), and uses it to cache and load the DataFrame.

Depending on the caching solution and your needs, it is important to be mindful of the scope of the cache. When caching items specific to a document and/or sheet, it is a good idea to include the current document and/or sheet in the cache key.

In each cell a global named CURRENT_CELL_OBJ is available. This is a CellObj instance and can be used to get the current sheet name and/or index. Lo can be used to access the current document and get its runtime_uid.

Example of building a key that is unique to the document and sheet:

doc = Lo.current_doc
key = f"{doc.runtime_uid}_{CURRENT_CELL_OBJ.sheet_name}_my_key"

Sheet Python code for example cell_cache.ods

# Start Dump
# Source code for doc: vnd.sun.star.tdoc:/2/
# Code for Cell: A1 of sheet index: 0
import json
import hashlib
from pathlib import Path
from functools import lru_cache
from ooodev.utils.cache.file_cache import PickleCache
from ooodev.uno_helper.importer import importer_doc_script

pk_cache = PickleCache(tmp_dir="my_cache_dir")

@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)
"init"



# Code for Cell: B1 of sheet index: 0
fibs = [[fib(n) for n in range(16)]]
lp_log.info(fibs)
lp_log.info(fib.cache_info())
fibs

# Code for Cell: A3 of sheet index: 0
def load_user_data(fnm: str):
    try:
        key = hashlib.md5(fnm.encode('utf-8')).hexdigest() + ".pkl"
        result = pk_cache.get(key)
        if result is not None:
            lp_log.info("load_user_data() Returning Dataframe from cache")
            return result
        lp_log.info("load_user_data Entered")
        lp_log.info("fnm: %s", fnm)
        with open(fnm, 'r') as json_file:
            _users_json = json.load(json_file)
            result = pd.DataFrame(_users_json)
            pk_cache.put(key, result)
            return result
    except Exception:
        lp_log.exception("Error")


_users_df = load_user_data(str(Path.home() / "Documents" / "tmp" / "Users.json"))
_users_df


# Code for Cell: E3 of sheet index: 0
with importer_doc_script():
    import quick_load



def load_user_data_ql_cache(fnm: str):
    try:
        key = hashlib.md5(fnm.encode('utf-8')).hexdigest()
        ql_cache = quick_load.QuickCache()
        result = ql_cache.get(key)
        if result is not None:
            lp_log.info("load_user_data_ql_cache() Returning Dataframe from cache")
            return result
        lp_log.info("load_user_data Entered")
        lp_log.info("fnm: %s", fnm)
        with open(fnm, 'r') as json_file:
            _users_json = json.load(json_file)
            result = pd.DataFrame(_users_json)
            ql_cache.put(key, result)
            return result
    except Exception:
        lp_log.exception("Error")


_users_df2 = load_user_data_ql_cache(str(Path.home() / "Documents" / "tmp" / "Users.json"))
_users_df2

# End Dump

Embedded quick_load module. See the LRUCache class.

# coding: utf-8
from __future__ import unicode_literals
from ooodev.utils.cache import LRUCache

class QuickCache(LRUCache):
    _instance = None

    def __new__(cls, *args, **kwargs):
        if not cls._instance:
            cls._instance = super(QuickCache, cls).__new__(cls, *args, **kwargs)
        return cls._instance

    def __init__(self):
        if getattr(self, "_is_init", False):
            return
        LRUCache.__init__(self, 50)
        self._is_init = True
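
A quick check of the singleton behaviour (a sketch; it assumes the quick_load module above has been imported, as in cell E3):

a = quick_load.QuickCache()
b = quick_load.QuickCache()
assert a is b                      # every call returns the same cache instance
a.put("users_df", "cached value")
print(b.get("users_df"))           # "cached value" -- visible through any reference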

Test JSON data

[
  {
    "name": "John Doe",
    "phone": "555-123-4567",
    "email": "[email protected]"
  },
  {
    "name": "Jane Smith",
    "phone": "555-987-6543",
    "email": "[email protected]"
  },
  {
    "name": "David Lee",
    "phone": "555-111-2222",
    "email": "[email protected]"
  },
  {
    "name": "Sarah Jones",
    "phone": "555-333-4444",
    "email": "[email protected]"
  },
  {
    "name": "Michael Brown",
    "phone": "555-555-6666",
    "email": "[email protected]"
  },
  {
    "name": "Emily Davis",
    "phone": "555-777-8888",
    "email": "[email protected]"
  },
  {
    "name": "Robert Wilson",
    "phone": "555-999-0000",
    "email": "[email protected]"
  },
  {
    "name": "Ashley Garcia",
    "phone": "555-222-3333",
    "email": "[email protected]"
  },
  {
    "name": "William Rodriguez",
    "phone": "555-444-5555",
    "email": "[email protected]"
  },
  {
    "name": "Elizabeth Martinez",
    "phone": "555-666-7777",
    "email": "[email protected]"
  }
]

Amourspirit avatar Jan 30 '25 18:01 Amourspirit

You did an amazing job. It works. It seems really overcomplicated to me right now, though. It also has its downsides: the PickleCache method assumes the .json file is never modified, so the cache file has to be removed manually to pick up file changes. And QuickCache requires the APSO workaround.

I think that, since the cached content is still returned (the cell code is still processed), the cells must still be rewritten. So one of the potential benefits I mentioned (not having to redraw) is not available right now (although it is fast).

I thought there could be an easier way to accomplish this task. However, I totally understand that it may be technically impossible to make it easier.

For now, I think it is much, much simpler to load files as in my example from an event or a button (even if that is not as convenient as with LibrePythonista).

I hope the refactoring can make this simpler somehow (maybe not, and that is OK). Anyway, thank you very much for your help, your time, and your examples, which I will fall back on for now if necessary.

Vanerio avatar Jan 31 '25 15:01 Vanerio

Updated caching option: as of version 0.8.1, a FileChangeAwareCache can be used.

Here is the code involved in cell_cache2.ods

# Start Dump
# Source code for doc: vnd.sun.star.tdoc:/2/
# Code for Cell: A1 of sheet index: 0
import json
from typing import Any
from pathlib import Path
from ooodev.utils.cache.singleton import FileChangeAwareCache

def get_json_data(json_file: str | Path) -> Any:
    # use a separate name for the file handle so the json_file path is not shadowed
    with open(json_file, 'r', encoding="utf-8") as f:
        data = json.load(f)
    return data

def get_cache_data(cache: FileChangeAwareCache, json_file: str | Path) -> Any:
    data = cache[json_file]
    if data is None:
        json_data = get_json_data(json_file)
        data = pd.DataFrame(json_data)
        cache[json_file] = data
    return data

cache = FileChangeAwareCache(tmp_dir="cell_cache2", key="cell_cache2")
json_file = str(Path.home() / "Documents" / "tmp" / "Users.json")

"init"

# Code for Cell: C1 of sheet index: 0
_users_df = get_cache_data(cache, json_file)

# End Dump

When the file that is cached in FileChangeAwareCache changes, the cache is invalidated.

There is a small bug in this example currently: after the JSON file is modified, the sheet must be refreshed twice (Shift+Ctrl+F9). This is a bug in LibrePythonista and is expected to be ironed out when the refactoring is done.

Amourspirit avatar Feb 05 '25 20:02 Amourspirit

Thank you for your amazing work!! Of course I still see a problem with that (for the usage I have in mind, as written in my first post): if the file is edited while the document is closed, the changes to the JSON file will never be noticed, and the cache has to be removed manually. This is a bit inconvenient in terms of productivity and simplicity. If the JSON file is expected to be edited from the ODS project then I suppose it will work fine, but that is not what I was trying to do, and it is something to consider.

Apart from that: I can't use the experimental editor in the latest LibreOffice release; it simply does not open (LibrePythonista v0.9.0).

Thank you. I really appreciate your work!!

P.S.: sorry for the late response

Vanerio avatar Feb 11 '25 18:02 Vanerio

I was looking for something similar, but instead of a flag to stop recalculating, I would like to be able to simply replace the cell value.

I understand this would just delete the formula; I'm OK with that.

This may be a separate feature request, but if we could write to other cells instead of only reading them, it would be another way to achieve similar "caching" by just using another cell.
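
A rough sketch of that idea using the raw UNO API is below (hypothetical and untested inside a LibrePythonista cell; it assumes Lo.current_doc exposes the underlying UNO document via .component, as in ooodev, and side-effect writes during recalculation may well conflict with the engine, which is presumably why this needs a feature request):

# Hypothetical sketch: push a value into another cell via the raw UNO API.
doc = Lo.current_doc                        # ooodev document wrapper (see earlier comments)
sheet = doc.component.Sheets.getByIndex(0)  # underlying UNO spreadsheet, sheet index 0
cell = sheet.getCellByPosition(1, 2)        # column B, row 3 (both zero-based)
cell.setValue(42.0)                         # numeric write; use setString() for text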

Godlander avatar Mar 22 '25 18:03 Godlander