paga

zebrafish example bug

Open AltynaiA opened this issue 6 years ago • 14 comments

I tried to run the zebrafish notebook and received this error message:

>>> var = pd.read_csv('./data/gene_names.txt', index_col=0, header=None)
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-8-3cb7686bdda8> in <module>
----> 1 var = pd.read_csv('./data/gene_names.txt', index_col=0, header=None)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, doublequote, delim_whitespace, low_memory, memory_map, float_precision)
    676                     skip_blank_lines=skip_blank_lines)
    677 
--> 678         return _read(filepath_or_buffer, kwds)
    679 
    680     parser_f.__name__ = name

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    438 
    439     # Create the parser.
--> 440     parser = TextFileReader(filepath_or_buffer, **kwds)
    441 
    442     if chunksize or iterator:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in __init__(self, f, engine, **kwds)
    785             self.options['has_index_names'] = kwds['has_index_names']
    786 
--> 787         self._make_engine(self.engine)
    788 
    789     def close(self):

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in _make_engine(self, engine)
   1012     def _make_engine(self, engine='c'):
   1013         if engine == 'c':
-> 1014             self._engine = CParserWrapper(self.f, **self.options)
   1015         else:
   1016             if engine == 'python':

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in __init__(self, src, **kwds)
   1706         kwds['usecols'] = self.usecols
   1707 
-> 1708         self._reader = parsers.TextReader(src, **kwds)
   1709 
   1710         passed_names = self.names is None

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: File b'./data/gene_names.txt' does not exist
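The traceback only says that pandas could not find `./data/gene_names.txt` relative to the notebook's working directory. Below is a minimal stdlib sketch of a pre-flight check that fails earlier and with a more actionable message. The helper name `check_data_files` is hypothetical, and `gene_names.txt` is the only required file known from the traceback.

```python
from pathlib import Path


def check_data_files(data_dir, required):
    """Return the paths of all required files, or raise listing every missing one."""
    data_dir = Path(data_dir)
    missing = [name for name in required if not (data_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(
            f"missing in {data_dir.resolve()}: {', '.join(missing)}. "
            "The zebrafish input files are not part of the repository; "
            "download them into this directory before running the notebook."
        )
    return [data_dir / name for name in required]


# Only gene_names.txt is known from the traceback; extend the list as needed.
REQUIRED = ["gene_names.txt"]
```

If you call `check_data_files("./data", REQUIRED)` at the top of the notebook, it stops with a single message that lists everything missing. Without the check, the notebook fails on the first `pd.read_csv` call.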

AltynaiA, Jan 24 '19 13:01

Hi Altynai! The zebrafish notebook is the only one for which there is no public data at this stage; all other notebooks can be executed. The data was sent out by email from the lab. @calebweinreb, are you or Dan (he is probably not on GitHub, right?) planning to upload the files you shared at the time somewhere? Would you mind if I uploaded them somewhere so that the notebook becomes executable?

falexwolf, Jan 24 '19 20:01

Hi Alex,

As far as I know, you are free to upload the files and host them however you want, since the paper is published. Let me know if you need us to send the files again. I am pretty sure we have also uploaded them somewhere; maybe Dan can comment on that.

-- Caleb

calebweinreb, Jan 24 '19 20:01

Hi Caleb & Alex,

Yes, Caleb is correct. The files are publicly available in multiple formats from the locations linked below. I am not sure whether a more specific format/version would be needed for seamless integration with Alex's PAGA notebook; please let me know if I can help with that. I am also happy to upload any such additional files to our Kleintools web portal, which could then easily be linked from your GitHub.

Best,
Dan

GEO: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE112294

Kleintools web portal: http://www.tinyurl.com/scZfish2018

Other MATLAB and h5ad versions of the dataset:
https://kleintools.hms.harvard.edu/paper_websites/wagner_zebrafish_timecourse2018/WagnerScience2018.mat
https://kleintools.hms.harvard.edu/paper_websites/wagner_zebrafish_timecourse2018/WagnerScience2018.h5ad

calebweinreb, Jan 24 '19 20:01

Ah, nice! h5ad even. This should make it trivial to include a line like

import scanpy as sc
adata = sc.read('WagnerScience2018', backup_url='https://kleintools.hms.harvard.edu/paper_websites/wagner_zebrafish_timecourse2018/WagnerScience2018.h5ad')

right? No cache=True necessary, since it’s h5ad anyway, and using an extensionless key means it saves it into the sc.settings.writedir.
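The idea behind `backup_url` is a download-if-missing cache: the file is fetched once, and later runs reuse the local copy. The sketch below illustrates that idea using only the standard library. It is not scanpy's implementation. `cached_download` is a hypothetical name, and `fetch` is injectable so the function can be exercised without network access.

```python
import urllib.request
from pathlib import Path

URL = ("https://kleintools.hms.harvard.edu/paper_websites/"
       "wagner_zebrafish_timecourse2018/WagnerScience2018.h5ad")


def cached_download(url, dest, fetch=urllib.request.urlretrieve):
    """Download url to dest unless dest already exists; return the local path."""
    dest = Path(dest)
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        # Download to a temporary name first so an interrupted transfer
        # does not leave a truncated file that looks like a valid cache.
        tmp = dest.with_name(dest.name + ".part")
        fetch(url, str(tmp))
        tmp.replace(dest)
    return dest
```

For example, `anndata.read_h5ad(cached_download(URL, "data/WagnerScience2018.h5ad"))` would read the cached copy on every run after the first. This assumes the URL still serves the file.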

flying-sheep, Jan 25 '19 08:01

Awesome, @calebweinreb & Dan, thanks for the quick answer! And so cool that you uploaded an .h5ad version of the dataset! 😄

However, as the uploaded h5ad doesn't contain the single-cell graph, it is not enough to run the PAGA notebook here. What Caleb sent me around May consisted of these files: [image not preserved]. The uploaded h5ad only contains the following: [image not preserved].

What is missing is essentially the graph that underlies the figures in your paper and the SPRING-based exploration. Having that graph saved me, and saves anyone trying to reproduce the notebook, from redoing all the preprocessing and graph computation. It would be really cool if the graph were publicly available. For this, one could either upload a compressed version of the files (as shared via email at the time) or upload an AnnData object that contains the graph.

The only problem I see is that the current https://kleintools.hms.harvard.edu/paper_websites/wagner_zebrafish_timecourse2018/WagnerScience2018.h5ad seems to contain unfiltered data with 63530 cells, whereas the graph that Caleb sent out contains 53181 cells. Would you like me to send you the AnnData that contains the graph? You could upload it as WagnerScience2018_processed.h5ad or something similar. Then I could polish the notebook and make something similar to this for the zebrafish. 😉
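In practice, "an AnnData object that contains the graph" means storing an n_cells × n_cells sparse connectivity matrix alongside the counts. The sketch below builds one with SciPy, assuming the graph is available as a list of (i, j) cell-index pairs. Both the edge-list format and the helper name `edges_to_connectivities` are assumptions; they do not describe the format of the files shared by email.

```python
import numpy as np
from scipy import sparse


def edges_to_connectivities(edges, n_cells):
    """Build a symmetric, unweighted n_cells x n_cells CSR matrix from (i, j) pairs."""
    edges = np.asarray(edges, dtype=np.int64).reshape(-1, 2)
    rows, cols = edges[:, 0], edges[:, 1]
    data = np.ones(len(edges), dtype=np.float32)
    # tocsr() sums duplicate (i, j) entries.
    graph = sparse.coo_matrix((data, (rows, cols)), shape=(n_cells, n_cells)).tocsr()
    # Symmetrize: keep an edge if it appears in either direction.
    graph = graph.maximum(graph.T)
    # Clip summed duplicates back to an unweighted edge of weight 1.
    graph.data[:] = 1.0
    return graph
```

With recent anndata/scanpy versions, the result would typically go into `adata.obsp["connectivities"]`; older versions kept it under `adata.uns["neighbors"]`. In either case, `n_cells` must equal `adata.n_obs`. That is why a 53181-cell graph cannot be attached to the 63530-cell h5ad without first subsetting the h5ad to the same cells.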

falexwolf, Jan 25 '19 11:01

If you don't want me to send you the AnnData containing the graph, I'm also happy to upload the data somewhere else. But the canonical location would be your web page, I guess. 🙂

falexwolf, Jan 25 '19 11:01

Hello, I've been following the thread and was also interested in reproducing the PAGA notebook with the zebrafish dataset but still couldn't find any of the additional files needed to run the notebook. Any information on how to obtain them would be greatly appreciated. Thanks in advance.

psahai10, Mar 15 '19 21:03

How can I get the additional files needed to run the zebrafish notebook successfully? Could anyone help with this? Thanks in advance!

ChengTao2017, Mar 18 '19 06:03

Sorry for the late response on this.

I uploaded the files that @calebweinreb sent out via email here: https://drive.google.com/file/d/1V2xA9P1nTaO9qWPj8LiAR-uQpqg0VMBv/view?usp=sharing

I'll also make a note on the notebook.

falexwolf, Mar 19 '19 10:03

Here is the note on the notebook: https://github.com/theislab/paga/commit/b5dfcff9c87e2f71ef29e40060faca6c02117e89

falexwolf, Mar 19 '19 10:03

Hi, I tried to replicate zebrafish.ipynb but encountered some errors:

  1. In chunk 8, I cannot write the AnnData object to a file.
     [screenshot of the error not preserved]
  2. If I skip the error in chunk 8 and run UMAP in chunk 9, there is still an error.
     [screenshot of the error not preserved]
  3. In chunks 13 and 17, sc.utils.merge_groups does not exist in scanpy.api.

I also tried to use the AnnData object WagnerScience2018.h5ad from https://kleintools.hms.harvard.edu/paper_websites/wagner_zebrafish_timecourse2018/mainpage.html directly, but it has a different number of cells than the object in the notebook.

Could you please help me to figure this out?

Thanks

kangbw702, Apr 27 '20 01:04

I'm having the same issue as @kangbw702.

davidfstein, Jun 03 '20 19:06

Joining in late:

1 - I would very much appreciate it if someone could re-upload the cell-cell connectivity file.

2 - The difference in cell numbers is probably due to some filtering. The original dataset contains the wild-type experiment (36749 cells) and the TraceSeq experiment (another ~30k cells, all from 24 hpf). To remove those, we used the library ID: adata = adata[adata.obs.library_id.str.startswith('DEW0'), :]

3 - Regarding the missing sc.utils.merge_groups function: once you have the mapping d, you can use adata.obs['cluster_coarse'] = adata.obs['clusters'].map(d). (Note that in WagnerScience2018.h5ad the clusters column is actually called ClusterName, so you'll need to replace 'clusters' with it in the tutorial.)
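Points 2 and 3 are plain pandas operations on `adata.obs`, so you can try them on a small DataFrame first. This sketch rests on some assumptions. The library IDs and cluster names are invented. The coarse → fine layout of `groups` is a guess at how such a grouping might be written; `Series.map` needs the reverse, fine → coarse.

```python
import pandas as pd

# Toy stand-in for adata.obs. Library IDs and cluster names are invented;
# only the 'DEW0' prefix and the 'ClusterName' column come from the comment above.
obs = pd.DataFrame(
    {
        "library_id": ["DEW001", "DEW002", "TRACE01", "DEW003"],
        "ClusterName": ["neural_A", "epidermis", "neural_B", "notochord"],
    },
    index=[f"cell{i}" for i in range(4)],
)

# Point 2: keep only cells from libraries whose ID starts with 'DEW0'.
obs = obs[obs["library_id"].str.startswith("DEW0")].copy()

# Point 3: a hypothetical coarse -> fine grouping, inverted into the
# fine -> coarse dict that Series.map expects.
groups = {"neural": ["neural_A", "neural_B"], "epidermal": ["epidermis"]}
d = {fine: coarse for coarse, fines in groups.items() for fine in fines}
obs["cluster_coarse"] = obs["ClusterName"].map(d)

# Clusters absent from d become NaN; count them before trusting the result.
n_unmapped = int(obs["cluster_coarse"].isna().sum())
```

On the real object, `adata[mask, :]` returns a view. Calling `.copy()` on it avoids warnings when the subset is modified later. If the notebook's `d` is already fine → coarse, skip the inversion step.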

yotamcons, Jan 03 '23 15:01

@falexwolf

Regarding your earlier comment with the Google Drive link: sorry, I'm coming in too late, and the link you shared is no longer available. Could you share it again? Thanks!

ZoeyYDKY, Feb 06 '24 09:02