
parse_gctx: don't sort returned values

Open dllahr opened this issue 6 years ago • 6 comments

Hi @oena @levlitichev

I was thinking about opening a pull request that modifies parse_gctx so that it no longer returns the dataframes sorted by index/column. The reason: if the metadata comes back in the order it appears in the file, you can pick the entries you are interested in, work out their positional indices, and then use the ridx/cidx options to load just those, which is much faster.

Also, the sort could be made an option. What do you think?

dllahr avatar Mar 30 '18 22:03 dllahr

Hi @dllahr! Not sure I totally follow. Do you mean just for the metadata only options? Otherwise the IDs are subsetted before hyperslab selection occurs.

oena avatar Apr 02 '18 20:04 oena

Sorry, no, I mean that right now when you get the metadata back (and I think when you get everything back) all of the IDs have been sorted. The use case I ran into was:

  1. got just the row metadata back
  2. identified the overlap between the genes I wanted and those that were present
  3. identified the indices of the genes I wanted in the row metadata
  4. attempted to load using the ridx option, got a completely different set of genes
  5. realized that the row metadata had been sorted, rather than returned as it appears in the file
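
The mismatch in the steps above can be reproduced with plain pandas, without touching a GCTX file. This is only a minimal sketch: the IDs, the `symbol` column, and the variable names are made up for illustration, and the list `file_order_ids` stands in for the on-disk row order.

```python
import pandas as pd

# Hypothetical row IDs in the order they are stored in the file.
file_order_ids = ["g30", "g10", "g40", "g20"]

# Stand-in for row metadata that comes back sorted by index.
row_meta_sorted = pd.DataFrame(
    {"symbol": ["C", "A", "D", "B"]}, index=file_order_ids
).sort_index()

wanted = ["g10", "g40"]

# Positions looked up in the sorted metadata...
sorted_positions = [row_meta_sorted.index.get_loc(g) for g in wanted]

# ...are not the positions of the same IDs in the file.
file_positions = [file_order_ids.index(g) for g in wanted]

# Treating the sorted positions as file indices selects different rows.
picked_by_mistake = [file_order_ids[i] for i in sorted_positions]

print(sorted_positions)   # positions in the sorted metadata
print(file_positions)     # positions in file order
print(picked_by_mistake)  # rows actually selected by the wrong indices
```

If the metadata were returned in file order, `sorted_positions` and `file_positions` would coincide, and the positions could be used for index-based loading directly.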

dllahr avatar Apr 09 '18 03:04 dllahr

Ok, gotcha. That does seem like a useful thing to do. Maybe we can start with having it as an option and see how things go?
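
One way the option could look, as a rough sketch only: `finalize_frame` and its `sort` flag are hypothetical names, not existing cmapPy functions or parameters. Defaulting `sort` to `True` is an assumption meant to keep the current behavior unless a caller opts out.

```python
import pandas as pd


def finalize_frame(df: pd.DataFrame, sort: bool = True) -> pd.DataFrame:
    """Hypothetical helper, not part of cmapPy.

    Returns the frame sorted by row and column labels when sort is True,
    otherwise returns it in its original (file) order.
    """
    if sort:
        return df.sort_index(axis=0).sort_index(axis=1)
    return df


# Toy frame whose labels are deliberately out of sorted order.
frame = pd.DataFrame(
    [[1, 2], [3, 4]], index=["r2", "r1"], columns=["c2", "c1"]
)

sorted_frame = finalize_frame(frame)                # sorted labels
unsorted_frame = finalize_frame(frame, sort=False)  # original order kept
```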

oena avatar Apr 09 '18 16:04 oena

@oena Has this been taken up? I was thinking of working on this.

saksham219 avatar Jun 01 '19 12:06 saksham219

@dllahr did you follow up on this? No worries if not, just checking

oena avatar Jun 03 '19 14:06 oena

Sorry I'm late replying, I did not get to doing anything with this.

ghost avatar Jun 10 '19 18:06 ghost