litstudy

Manipulate and save a DocumentSet object after loading.

Open rjavierch opened this issue 1 year ago • 4 comments

Hello, I am wondering whether it is possible to manipulate (as with a pandas table) and save a loaded DocumentSet, for example one loaded from a .bib or ieee_csv file. Is it also possible to manipulate and save the data after a refinement step (for example, using refine_scopus)?

Thank you!

rjavierch avatar Sep 11 '23 05:09 rjavierch

Hi

I am wondering whether it is possible to manipulate (as with a pandas table)

Manipulating the documents themselves is not possible. You can, however, manipulate a DocumentSet, which contains a list of documents, by, for example, calculating the intersection, union, or difference between sets (see DocumentSet).

and save a loaded DocumentSet such as .bib, ieee_csv.

Saving a document set is not possible, but it is a highly requested feature. There are open issues for saving a document set as a BibTeX file or RIS file:

  • #12
  • #13

If you are interested in looking into these, we welcome all relevant pull requests!

stijnh avatar Sep 12 '23 11:09 stijnh
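
To illustrate what the set operations mentioned above (intersection, union, difference) compute, here is a minimal sketch in plain Python. It does not use litstudy at all: `Doc`, `union`, `intersection`, and `difference` are hypothetical stand-ins written for this example, and matching documents by DOI is an assumption. For the real method names, check the DocumentSet documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for a document record; not litstudy's Document."""
    doi: str
    title: str


def union(a, b):
    """Documents in a or b, deduplicated by DOI, keeping first-seen order."""
    seen = {}
    for doc in list(a) + list(b):
        seen.setdefault(doc.doi, doc)
    return list(seen.values())


def intersection(a, b):
    """Documents of a whose DOI also appears in b."""
    dois_b = {doc.doi for doc in b}
    return [doc for doc in a if doc.doi in dois_b]


def difference(a, b):
    """Documents of a whose DOI does not appear in b."""
    dois_b = {doc.doi for doc in b}
    return [doc for doc in a if doc.doi not in dois_b]


set_a = [Doc("10.1/a", "Paper A"), Doc("10.1/b", "Paper B")]
set_b = [Doc("10.1/b", "Paper B"), Doc("10.1/c", "Paper C")]

print([d.title for d in union(set_a, set_b)])
print([d.title for d in intersection(set_a, set_b)])
print([d.title for d in difference(set_a, set_b)])
```

In this sketch the individual records stay untouched; only the membership of each collection changes, which matches the point made above that sets can be manipulated while documents cannot.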

I was looking for this as well. A possible workaround might be to just export the document set to a CSV file and reimport it later if needed. Or is there any other way to avoid losing my progress every time I shut down my machine? I mean, there must be a database saved somewhere, or is all this data sitting in memory?

FlashFFF avatar Oct 19 '23 14:10 FlashFFF
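
The CSV round trip suggested above could look roughly like the following sketch, which uses only Python's standard `csv` module. It does not call litstudy: the column names in `FIELDS`, the helper names, and the sample records are all hypothetical, and how to get the rows out of a DocumentSet in the first place is left open.

```python
import csv
import os
import tempfile

FIELDS = ["doi", "title", "year"]  # hypothetical columns chosen for this example


def save_records(path, records):
    """Write a list of dicts to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(records)


def load_records(path):
    """Read the CSV file back into a list of dicts (all values are strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


records = [
    {"doi": "10.1/a", "title": "Paper A", "year": 2020},
    {"doi": "10.1/b", "title": "Paper B, revisited", "year": 2021},
]

path = os.path.join(tempfile.mkdtemp(), "documents.csv")
save_records(path, records)
restored = load_records(path)
print(restored)
```

One caveat of this approach is that CSV stores everything as text, so numeric fields such as the year come back as strings and nested data (author lists, references) needs its own encoding.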

Which fields are fetched from the API during a refine step? Are they all the fields of the class litstudy.types.Document?

FlashFFF avatar Oct 19 '23 14:10 FlashFFF

I was looking for this as well. A possible workaround might be to just export the document set to a CSV file and reimport it later if needed. Or is there any other way to avoid losing my progress every time I shut down my machine? I mean, there must be a database saved somewhere, or is all this data sitting in memory?

Alternatively, you could pickle the document set, which takes less space than a CSV file. After that, you can reload it whenever you would like to perform further analysis on the set. Just use these code snippets:

to save:

    import pickle

    # "data" is the loaded (or refined) document set
    with open("data.pickle", "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

to load:

    import pickle

    with open("data.pickle", "rb") as f:
        data = pickle.load(f)

okoknik avatar Oct 26 '23 09:10 okoknik
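
Building on the pickle snippets above, the save and load steps can be combined into a small cache helper so that an expensive load or refine step only runs when no saved file exists yet. This is a sketch under assumptions: `load_or_build` and `expensive_build` are hypothetical names, and the list of dicts stands in for a real document set. Keep in mind that pickle files should only be loaded from sources you trust, and that a pickle may fail to load after the library that defines the pickled classes has been upgraded.

```python
import os
import pickle
import tempfile


def load_or_build(path, build):
    """Return the object cached at `path`, or call `build()` and cache the result."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    data = build()
    with open(path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    return data


calls = []  # records how often the expensive step actually runs


def expensive_build():
    # Placeholder for loading and refining a document set.
    calls.append(1)
    return [{"doi": "10.1/a", "title": "Paper A"}]


cache_path = os.path.join(tempfile.mkdtemp(), "data.pickle")
first = load_or_build(cache_path, expensive_build)   # builds and writes the cache
second = load_or_build(cache_path, expensive_build)  # reads the cache, no rebuild
print(len(calls), first == second)
```

To force a rebuild, for example after changing the search query, delete the cache file before calling the helper again.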