cobratoolbox icon indicating copy to clipboard operation
cobratoolbox copied to clipboard

Adding src and test contents for IgemRNA functionality

Open BigDataInSilicoBiologyGroup opened this issue 2 years ago • 6 comments

We made several of the requested changes, however, we were not able to integrate the use of .csv data files for tests, since we make the use of multiple data sheets in our code and permit to load only one transcriptomics data file through the GUI (IgemRNA.m). Is there any other solution that would allow us to maintain the use of several data sheets?

Also we would be happy to work on the requested improvements described in pt 7. for further improvements, since it will be more time consuming.

Regarding pt 8, we had separated different thresholding approaches (GT1 - Global threshold 1, LT1 - Local threshold 1, LT2 - Local threshold) for determining gene activity levels into separate functions, but we could improve the code by joining them together and using optional parameters as part additional refinements.

We would be happy to revise and consider any future proposals regarding our contribution! Thank You for your time and insights!

I hereby confirm that I have:

  • [x] Tested my code on my own machine
  • [x] Followed the guidelines in the Contributing Guide
  • [ ] Selected develop as a target branch (top left drop-down menu)

(Note: You may replace [ ] with [X] to check the box)

@BigDataInSilicoBiologyGroup

ithiele avatar Aug 16 '22 22:08 ithiele

Why are there multiple copies of almost the same file? e.g. findGenesAboveThreshold* Please replace binary files with csv, or move binary files to COBRA.binary/tests/IgemRNA otherwise people will complain it takes too long to install the cobra toolbox. test/verifiedTests/analysis/testCompareExpression/testData_transcriptomics.xlsx No filenames should have spaces, otherwise linux users will complain. test/verifiedTests/dataIntegration/testMinimumRequirements/Results post-optimization/Minimum requirements/SRR8994359_WT.xls

rmtfleming avatar Sep 15 '22 22:09 rmtfleming

Hi, Dr R.M. Fleming!

Thank You, our team @BigDataInSilicoBiologyGroup https://github.com/BigDataInSilicoBiologyGroup appreciates the emphasis and clarification on the remaining weak spots and the detailed strategies of improving our cobratoolbox contribution! We have worked on combining the multiple similar thresholding and gene expression level determining scripts (e.g. findGenesAboveThreshold*) into one function with optional parameters returning both gene lists - those that have expression above and below global and/or local threshold/-s (based on the applied thresholding approach) which we will include in our next commit. Our team has also strategized and found a solution for replacing .xls data files for tests with .csv files by introducing two new helper functions - one for flattening multi-sheet .xls data into one table (flattenExcelSheetDataToTable.m) and the other for writing that data table into .csv. But we wanted to make sure if src\base\io\utilities would be an appropriate location for those new functions, and also we want to verify if the use of .mat model files for our tests is alright.

Thank you and best regards, Kristina Grausa

On Fri, Sep 16, 2022 at 1:20 AM Ronan M.T. Fleming @.***> wrote:

Why are there multiple copies of almost the same file? e.g. findGenesAboveThreshold* Please replace binary files with csv, or move binary files to COBRA.binary/tests/IgemRNA otherwise people will complain it takes too long to install the cobra toolbox.

test/verifiedTests/analysis/testCompareExpression/testData_transcriptomics.xlsx No filenames should have spaces, otherwise linux users will complain. test/verifiedTests/dataIntegration/testMinimumRequirements/Results post-optimization/Minimum requirements/SRR8994359_WT.xls

— Reply to this email directly, view it on GitHub https://github.com/opencobra/cobratoolbox/pull/1991#issuecomment-1248693584, or unsubscribe https://github.com/notifications/unsubscribe-auth/AU6JQ6FPJGKOIF4765DEVHLV6OOKXANCNFSM5WFKMIKQ . You are receiving this because you were mentioned.Message ID: @.***>

How big is the data that you are handling here? I get the impression that you have a lot of helper functions which read/write to folders that may (or may not, which can cause errors) exist, depending on where you call those functions. In general, Disk I/O is a relatively slow process, and if we are not talking about gigabytes of input data here it might be worth allowing users to pre-load data and only extract the information they want in your functions. E.g. you can put a lot of the data into a struct and access the fields of that struct instead of writing data to files. And if you want to allow checkpoints the same struct can have a field "Checkpointing" along with some indication on where you are in your code, in order to pick up again (if that's even necessary). I've seen code that took 100 times longer than necessary mainly due to excessive I/O operations. It also allows a much easier integration of the functionality into existing code if it is not necessary to first prepare lots of input files, and allows reusing functionality outside of your exact workflow.

tpfau avatar Sep 23 '22 05:09 tpfau

Dear Kristina, thanks a lot for your efforts to streamline your contribution so that there is less duplication and greater reuse of code. This will help to keep the maintenance costs manageable into the future. Thanks also for eliminating binary files (e.g. xls), this will help to keep the cobra toolbox manageable in size into the future. Any binary files that cannot be removed, please put them into COBRA.binary/analysis/IgemRNA Regards, Ronan

On Fri, 23 Sept 2022 at 05:27, Kristīna Grausa @.***> wrote:

Hi, Dr R.M. Fleming!

Thank You, our team @BigDataInSilicoBiologyGroup https://github.com/BigDataInSilicoBiologyGroup appreciates the emphasis and clarification on the remaining weak spots and the detailed strategies of improving our cobratoolbox contribution! We have worked on combining the multiple similar thresholding and gene expression level determining scripts (e.g. findGenesAboveThreshold*) into one function with optional parameters returning both gene lists - those that have expression above and below global and/or local threshold/-s (based on the applied thresholding approach) which we will include in our next commit. Our team has also strategized and found a solution for replacing .xls data files for tests with .csv files by introducing two new helper functions - one for flattening multi-sheet .xls data into one table (flattenExcelSheetDataToTable.m) and the other for writing that data table into .csv. But we wanted to make sure if src\base\io\utilities would be an appropriate location for those new functions, and also we want to verify if the use of .mat model files for our tests is alright.

Thank you and best regards, Kristina Grausa

On Fri, Sep 16, 2022 at 1:20 AM Ronan M.T. Fleming < @.***> wrote:

Why are there multiple copies of almost the same file? e.g. findGenesAboveThreshold* Please replace binary files with csv, or move binary files to COBRA.binary/tests/IgemRNA otherwise people will complain it takes too long to install the cobra toolbox.

test/verifiedTests/analysis/testCompareExpression/testData_transcriptomics.xlsx No filenames should have spaces, otherwise linux users will complain. test/verifiedTests/dataIntegration/testMinimumRequirements/Results post-optimization/Minimum requirements/SRR8994359_WT.xls

— Reply to this email directly, view it on GitHub https://github.com/opencobra/cobratoolbox/pull/1991#issuecomment-1248693584, or unsubscribe https://github.com/notifications/unsubscribe-auth/AU6JQ6FPJGKOIF4765DEVHLV6OOKXANCNFSM5WFKMIKQ . You are receiving this because you were mentioned.Message ID: @.***>

--

Mr. Ronan MT Fleming B.V.M.S. Dip. Math. Ph.D.

Associate Professor, School of Medicine, National University of Ireland, Galway. & Assistant Professor, Division of Systems Biomedicine and Pharmacology, Leiden Academic Centre for Drug Research, Faculty of Science, Leiden University. https://www.universiteitleiden.nl/en/staffmembers/ronan-fleming

Peer-reviewed publications: https://goo.gl/FZPG23 Mobile: +353 852 109 806 Skype: ronan.fleming

(This message is confidential and may contain privileged information. It is intended for the named recipient only. If you receive it in error please notify me and permanently delete the original message and any copies.)

So can I use .mat model files for testing purposes? I will revise the most recent changes and put them in order for a commit, making sure there aren't any binary files left as soon as I return from a business trip.

Best regards, Kristina

On Fri, Sep 23, 2022 at 11:37 AM Ronan M.T. Fleming < @.***> wrote:

Dear Kristina, thanks a lot for your efforts to streamline your contribution so that there is less duplication and greater reuse of code. This will help to keep the maintenance costs manageable into the future. Thanks also for eliminating binary files (e.g. xls), this will help to keep the cobra toolbox manageable in size into the future. Any binary files that cannot be removed, please put them into COBRA.binary/analysis/IgemRNA Regards, Ronan

On Fri, 23 Sept 2022 at 05:27, Kristīna Grausa @.***> wrote:

Hi, Dr R.M. Fleming!

Thank You, our team @BigDataInSilicoBiologyGroup https://github.com/BigDataInSilicoBiologyGroup appreciates the emphasis and clarification on the remaining weak spots and the detailed strategies of improving our cobratoolbox contribution! We have worked on combining the multiple similar thresholding and gene expression level determining scripts (e.g. findGenesAboveThreshold*) into one function with optional parameters returning both gene lists - those that have expression above and below global and/or local threshold/-s (based on the applied thresholding approach) which we will include in our next commit. Our team has also strategized and found a solution for replacing .xls data files for tests with .csv files by introducing two new helper functions - one for flattening multi-sheet .xls data into one table (flattenExcelSheetDataToTable.m) and the other for writing that data table into .csv. But we wanted to make sure if src\base\io\utilities would be an appropriate location for those new functions, and also we want to verify if the use of .mat model files for our tests is alright.

Thank you and best regards, Kristina Grausa

On Fri, Sep 16, 2022 at 1:20 AM Ronan M.T. Fleming < @.***> wrote:

Why are there multiple copies of almost the same file? e.g. findGenesAboveThreshold* Please replace binary files with csv, or move binary files to COBRA.binary/tests/IgemRNA otherwise people will complain it takes too long to install the cobra toolbox.

test/verifiedTests/analysis/testCompareExpression/testData_transcriptomics.xlsx No filenames should have spaces, otherwise linux users will complain. test/verifiedTests/dataIntegration/testMinimumRequirements/Results post-optimization/Minimum requirements/SRR8994359_WT.xls

— Reply to this email directly, view it on GitHub https://github.com/opencobra/cobratoolbox/pull/1991#issuecomment-1248693584, or unsubscribe https://github.com/notifications/unsubscribe-auth/AU6JQ6FPJGKOIF4765DEVHLV6OOKXANCNFSM5WFKMIKQ . You are receiving this because you were mentioned.Message ID: @.***>

--

Mr. Ronan MT Fleming B.V.M.S. Dip. Math. Ph.D.


Associate Professor, School of Medicine, National University of Ireland, Galway. & Assistant Professor, Division of Systems Biomedicine and Pharmacology, Leiden Academic Centre for Drug Research, Faculty of Science, Leiden University. https://www.universiteitleiden.nl/en/staffmembers/ronan-fleming


Peer-reviewed publications: https://goo.gl/FZPG23 Mobile: +353 852 109 806 Skype: ronan.fleming


(This message is confidential and may contain privileged information. It is intended for the named recipient only. If you receive it in error please notify me and permanently delete the original message and any copies.)