Ability to keep same cell value (cancel code calculation)
Hi! I think being able to cancel the calculation of the current cell and keep its current value would be really useful in some cases. It is true that using an event or a button with an assigned macro should work, but LibrePythonista is more convenient and comfortable.
Let's take the following case as an example:
You want to load a JSON file, but you don't want to reload it more than once. I tried setting a global variable and using sys.exit() to avoid reloading it. But it doesn't work, because the global variable USERS_JSON_LOADED is not preserved between calculations.
import json
import sys
if 'USERS_JSON_LOADED' not in globals():
    _users_json_path = r"C:\Users.json"
    with open(_users_json_path, 'r') as json_file:
        _users_json = json.load(json_file)
    _users_df = pd.DataFrame(_users_json)
    globals()['USERS_JSON_LOADED'] = True
    lp_log.info('Users.json loaded!')
else:
    lp_log.info('Users.json already loaded!')
    sys.exit()
_users_df
For that reason I propose a method to avoid recalculating the cell after the first calculation. Maybe a keep_current_value() or something similar.
The benefits would be:
- Avoiding the reloading of files or other heavy calculations when not necessary.
- Preventing the redrawing of unchanged cell/range content.
Maybe I'm overlooking something that is already possible? Or maybe this is not technically possible?
Thank you very much!
This is a tricky subject. I am working on refactoring the engine that handles all the Python code for the sheets. I am not anticipating a lot of change to functionality.
Currently the values are recalculated with the sheet, which is what you are pointing out. I am not sure if adding the ability to skip cells during recalculation is possible at this point. There is a certain amount of caching taking place when parts of the sheet are recalculated, but how that works will likely change. Maybe after the refactor, simpler ways to accomplish this will become clear.
Some options you can currently use:
lru_cache is a good option for hashable values, but a Pandas DataFrame is not hashable.
You can't directly use functools.lru_cache to cache Pandas DataFrames effectively.
lru_cache caches a function's return values keyed by a hash of its arguments, so every argument must be hashable. Mutable objects such as DataFrames are unhashable and cannot be part of the cache key, and because a cached return value is shared between callers, mutating a cached DataFrame would silently corrupt the cache.
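A minimal illustration of both points, using only the standard library (no DataFrame is needed; any unhashable argument such as a list shows the same failure):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def square(n):
    return n * n

square(4)                        # computed on the first call
square(4)                        # served from the cache on the second
print(square.cache_info().hits)  # 1

@lru_cache(maxsize=None)
def first(items):
    return items[0]

try:
    first([1, 2, 3])             # a list is mutable and unhashable
except TypeError:
    print("unhashable arguments raise TypeError")
```

Hashable arguments (ints, strings, tuples) work fine as cache keys; anything mutable does not.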
In the sheet code below you can see code for A1 and B1.
Cell A1 contains the imports for the sheet and a fib() function to demonstrate lru_cache. Cell B1 calls the fib() function and displays the results as table data, demonstrating how lru_cache can be used.
There are two other examples of caching a Pandas DataFrame:
- A DataFrame can be pickled. Cell A3 contains a load_user_data() function. It uses PickleCache. Once the contents of the JSON are read in and loaded into a DataFrame, the DataFrame is pickled and saved into the system tmp/my_cache_dir directory. Each call after that loads the DataFrame from the pickled file in the tmp/my_cache_dir directory. A hash of the file path is used as the key for the cache, so if the file path changes it will be a different cached object.
- Use a singleton class that implements LRUCache. Using the APSO extension I embedded a module named quick_load into the document. You can see the contents below. In cell E3 I then used the importer_doc_script context manager to import the script from the document. See also uno_helper.importer. The load_user_data_ql_cache() function in E3 gets the singleton cache instance ql_cache = quick_load.QuickCache() and uses it to cache and load the DataFrame.
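The first option (pickling values to disk, keyed by an md5 hash of the source path) can be sketched generically without ooodev. PickleCacheSketch below is a hypothetical stand-in that only mirrors the concept; ooodev's PickleCache has its own API and directory layout:

```python
import hashlib
import pickle
from pathlib import Path

class PickleCacheSketch:
    """Hypothetical sketch of the PickleCache idea: values are pickled to
    files in a cache directory, keyed by an md5 hash of the source path."""

    def __init__(self, cache_dir):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _key_path(self, src_path: str) -> Path:
        # A different source path hashes to a different cache file.
        key = hashlib.md5(src_path.encode("utf-8")).hexdigest() + ".pkl"
        return self.cache_dir / key

    def get(self, src_path: str):
        p = self._key_path(src_path)
        if p.exists():
            with open(p, "rb") as f:
                return pickle.load(f)  # load the previously pickled value
        return None

    def put(self, src_path: str, value) -> None:
        with open(self._key_path(src_path), "wb") as f:
            pickle.dump(value, f)      # pickle the value to disk
```

Because the pickle file persists on disk, the cache survives as long as the file does, which is also why a stale file must be removed by hand if the source changes.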
Depending on the caching solution and your needs, it is important to be mindful of the scope of the cache. When caching items specific to a document and/or sheet, it is a good idea to include the current document and/or sheet in the cache key.
In each cell a global named CURRENT_CELL_OBJ is available. This is a CellObj instance and can be used to get the current sheet name and/or index. Lo can be used to get access to the current document and its runtime_uid.
An example of building a key that is unique to the document and sheet:
doc = Lo.current_doc
key = f"{doc.runtime_uid}_{CURRENT_CELL_OBJ.sheet_name}_my_key"
Sheet Python code for example cell_cache.ods
# Start Dump
# Source code for doc: vnd.sun.star.tdoc:/2/
# Code for Cell: A1 of sheet index: 0
import json
import hashlib
from pathlib import Path
from functools import lru_cache
from ooodev.utils.cache.file_cache import PickleCache
from ooodev.uno_helper.importer import importer_doc_script
pk_cache = PickleCache(tmp_dir="my_cache_dir")
@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)
"init"
# Code for Cell: B1 of sheet index: 0
fibs = [[fib(n) for n in range(16)]]
lp_log.info(fibs)
lp_log.info(fib.cache_info())
fibs
# Code for Cell: A3 of sheet index: 0
def load_user_data(fnm: str):
    try:
        key = hashlib.md5(fnm.encode('utf-8')).hexdigest() + ".pkl"
        result = pk_cache.get(key)
        if result is not None:
            lp_log.info("load_user_data() Returning Dataframe from cache")
            return result
        lp_log.info("load_user_data Entered")
        lp_log.info("fnm: %s", fnm)
        with open(fnm, 'r') as json_file:
            _users_json = json.load(json_file)
        result = pd.DataFrame(_users_json)
        pk_cache.put(key, result)
        return result
    except Exception:
        lp_log.exception("Error")
_users_df = load_user_data(str(Path.home() / "Documents" / "tmp" / "Users.json"))
_users_df
# Code for Cell: E3 of sheet index: 0
with importer_doc_script():
    import quick_load

def load_user_data_ql_cache(fnm: str):
    try:
        key = hashlib.md5(fnm.encode('utf-8')).hexdigest()
        ql_cache = quick_load.QuickCache()
        result = ql_cache.get(key)
        if result is not None:
            lp_log.info("load_user_data_ql_cache() Returning Dataframe from cache")
            return result
        lp_log.info("load_user_data_ql_cache Entered")
        lp_log.info("fnm: %s", fnm)
        with open(fnm, 'r') as json_file:
            _users_json = json.load(json_file)
        result = pd.DataFrame(_users_json)
        ql_cache.put(key, result)
        return result
    except Exception:
        lp_log.exception("Error")
_users_df2 = load_user_data_ql_cache(str(Path.home() / "Documents" / "tmp" / "Users.json"))
_users_df2
# End Dump
Embedded quick_load module. See class LRUCache.
# coding: utf-8
from __future__ import unicode_literals
from ooodev.utils.cache import LRUCache
class QuickCache(LRUCache):
    _instance = None

    def __new__(cls, *args, **kwargs):
        if not cls._instance:
            cls._instance = super(QuickCache, cls).__new__(cls, *args, **kwargs)
        return cls._instance

    def __init__(self):
        if getattr(self, "_is_init", False):
            return
        LRUCache.__init__(self, 50)
        self._is_init = True
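The singleton pattern above can be sketched without ooodev's LRUCache: every construction returns the same object, so values cached during one cell calculation are still there on the next recalculation. QuickCacheSketch below is a hypothetical stand-in (a plain dict subclass), not part of ooodev:

```python
class QuickCacheSketch(dict):
    """Minimal stand-in for the embedded QuickCache: a process-wide
    singleton dict, so every cell that constructs it shares one cache."""
    _instance = None

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls, *args, **kwargs)
        return cls._instance

a = QuickCacheSketch()
b = QuickCacheSketch()
a["users_df"] = "cached value"
print(a is b)         # True: both names refer to the one instance
print(b["users_df"])  # cached value
```

This only works for as long as the Python process lives; unlike the pickle approach, nothing is persisted to disk.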
Test Json data
[
{
"name": "John Doe",
"phone": "555-123-4567",
"email": "[email protected]"
},
{
"name": "Jane Smith",
"phone": "555-987-6543",
"email": "[email protected]"
},
{
"name": "David Lee",
"phone": "555-111-2222",
"email": "[email protected]"
},
{
"name": "Sarah Jones",
"phone": "555-333-4444",
"email": "[email protected]"
},
{
"name": "Michael Brown",
"phone": "555-555-6666",
"email": "[email protected]"
},
{
"name": "Emily Davis",
"phone": "555-777-8888",
"email": "[email protected]"
},
{
"name": "Robert Wilson",
"phone": "555-999-0000",
"email": "[email protected]"
},
{
"name": "Ashley Garcia",
"phone": "555-222-3333",
"email": "[email protected]"
},
{
"name": "William Rodriguez",
"phone": "555-444-5555",
"email": "[email protected]"
},
{
"name": "Elizabeth Martinez",
"phone": "555-666-7777",
"email": "[email protected]"
}
]
You did an amazing job. It works. It seems really overcomplicated to me right now, though. It also has its downsides: the PickleCache method assumes the .json file is never modified, so the cache file must be manually removed in order to pick up file changes. And QuickCache requires the APSO workaround.
I think that, since the cached content is returned (the cell code is still being processed), the cells must still be rewritten. So one of the potential benefits I mentioned (not having to redraw) is not available right now (although it is fast).
I thought there could be an easier way to accomplish this task. However, I totally understand that technically it may be impossible to make it easier.
For now, I think it is much, much simpler to load files as in my example from an event or button (even if that is not as convenient as with LibrePythonista).
I hope the refactoring can make this simpler somehow (maybe not, and that is OK). Anyway, thank you very much for your help, your time, and your examples, which I will resort to for now if necessary.
Updated Caching Option.
As of version 0.8.1 a FileChangeAwareCache can be used.
Here is the code involved in cell_cache2.ods
# Start Dump
# Source code for doc: vnd.sun.star.tdoc:/2/
# Code for Cell: A1 of sheet index: 0
import json
from typing import Any
from pathlib import Path
from ooodev.utils.cache.singleton import FileChangeAwareCache
def get_json_data(json_file: str | Path) -> Any:
    with open(json_file, 'r', encoding="utf-8") as f:
        data = json.load(f)
    return data

def get_cache_data(cache: FileChangeAwareCache, json_file: str | Path) -> Any:
    data = cache[json_file]
    if data is None:
        json_data = get_json_data(json_file)
        data = pd.DataFrame(json_data)
        cache[json_file] = data
    return data
cache = FileChangeAwareCache(tmp_dir="cell_cache2", key="cell_cache2")
json_file = str(Path.home() / "Documents" / "tmp" / "Users.json")
"init"
# Code for Cell: C1 of sheet index: 0
_users_df = get_cache_data(cache, json_file)
# End Dump
When the file that is cached in FileChangeAwareCache changes, the cache is invalidated.
There is a small bug in this example currently: after the JSON file is modified, the sheet must be refreshed twice (Shift+Ctrl+F9). This is a bug in LibrePythonista and is expected to be ironed out when the refactoring is done.
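The invalidation idea can be sketched generically: store each value alongside the file's modification time and drop the entry when the mtime changes. MTimeCache below is a hypothetical illustration of the principle; FileChangeAwareCache's actual implementation may differ:

```python
import os

class MTimeCache:
    """Hypothetical sketch of file-change-aware caching: each cached value
    is stored with the file's modification time, and a changed mtime
    invalidates the entry on the next lookup."""

    def __init__(self):
        self._entries = {}  # path -> (mtime, value)

    def get(self, path):
        entry = self._entries.get(path)
        if entry is None:
            return None
        mtime, value = entry
        if os.path.getmtime(path) != mtime:
            del self._entries[path]  # file changed: drop the stale entry
            return None
        return value

    def put(self, path, value):
        self._entries[path] = (os.path.getmtime(path), value)
```

Because the check happens at lookup time, a file edited while the process was not running is still detected, as long as the stored mtime persists somewhere across sessions.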
Thank you for your amazing work!! Of course I find a problem with that (for the usage I have in mind, as written in my first post): if you edit the file while the document is closed, changes in the JSON file will never be noticed, and a manual cache removal must be performed. This is a bit inconvenient in terms of productivity and simplicity. If the JSON file is expected to be edited from the ods project then I suppose it will work fine, but that is not what I was trying to do and it is something to consider.
Apart from that: I can't use the experimental editor in the latest LibreOffice release; it simply does not open. (LibrePythonista v0.9.0)
Thank you. I really appreciate your work!!
P.S.: sorry for the late response
I was looking for something similar, but instead of a flag to stop recalculating, I would like to be able to simply replace the cell value.
I understand this will just delete the formula, I'm ok with that.
This may be a separate feature request, but if we could write to other cells instead of only reading them, it would be another way to achieve similar "caching" by just using another cell.