paga
zebrafish example bug
I tried to run the zebrafish notebook and received this error message:
>>> var = pd.read_csv('./data/gene_names.txt', index_col=0, header=None)
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-8-3cb7686bdda8> in <module>
----> 1 var = pd.read_csv('./data/gene_names.txt', index_col=0, header=None)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, doublequote, delim_whitespace, low_memory, memory_map, float_precision)
676 skip_blank_lines=skip_blank_lines)
677
--> 678 return _read(filepath_or_buffer, kwds)
679
680 parser_f.__name__ = name
C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
438
439 # Create the parser.
--> 440 parser = TextFileReader(filepath_or_buffer, **kwds)
441
442 if chunksize or iterator:
C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in __init__(self, f, engine, **kwds)
785 self.options['has_index_names'] = kwds['has_index_names']
786
--> 787 self._make_engine(self.engine)
788
789 def close(self):
C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in _make_engine(self, engine)
1012 def _make_engine(self, engine='c'):
1013 if engine == 'c':
-> 1014 self._engine = CParserWrapper(self.f, **self.options)
1015 else:
1016 if engine == 'python':
C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in __init__(self, src, **kwds)
1706 kwds['usecols'] = self.usecols
1707
-> 1708 self._reader = parsers.TextReader(src, **kwds)
1709
1710 passed_names = self.names is None
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()
FileNotFoundError: File b'./data/gene_names.txt' does not exist
Hi Altynai! The zebrafish notebook is the only one where there is no public data at this stage - all other notebooks can be executed. The data has been sent out by email from the lab. @calebweinreb, are you or Dan (he is probably not on GitHub, right?) planning to upload the files you shared at the time somewhere? Would you mind if I uploaded them somewhere so that the notebook becomes executable?
Hi Alex,
As far as I know, you are free to upload the files and host them however you want since the paper is published. Let me know if you need us to send the files again. I am pretty sure we have also uploaded them somewhere - maybe Dan can comment on that.
-- Caleb
Hi Caleb & Alex, Yes, Caleb is correct. The files are publicly available in multiple formats from the locations linked below. I am not sure if a more specific format/version would be needed for seamless integration with Alex's PAGA notebook. Please let me know if I can help with that. I am also happy to upload any additional files of this kind to our Kleintools web portal, which could then be easily linked from your GitHub. Best, Dan
GEO: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE112294
Kleintools web portal: http://www.tinyurl.com/scZfish2018
Other MATLAB and h5ad versions of the dataset:
https://kleintools.hms.harvard.edu/paper_websites/wagner_zebrafish_timecourse2018/WagnerScience2018.mat
https://kleintools.hms.harvard.edu/paper_websites/wagner_zebrafish_timecourse2018/WagnerScience2018.h5ad
Ah, nice! h5ad even. This should make it trivial to include a line like
adata = sc.read('WagnerScience2018', backup_url='https://kleintools.hms.harvard.edu/paper_websites/wagner_zebrafish_timecourse2018/WagnerScience2018.h5ad')
right? No cache=True necessary, since it’s h5ad anyway, and using an extensionless key means it saves it into the sc.settings.writedir.
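For anyone trying this later, a minimal sketch of that pattern - the key and backup_url are taken from above, everything else is just illustrative:
import scanpy as sc

# With an extensionless key, scanpy resolves the file under sc.settings.writedir
# and downloads it from backup_url on the first call if it isn't there yet.
adata = sc.read(
    'WagnerScience2018',
    backup_url='https://kleintools.hms.harvard.edu/paper_websites/'
               'wagner_zebrafish_timecourse2018/WagnerScience2018.h5ad',
)
print(adata)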
Awesome, @calebweinreb & Dan, thanks for the quick answer! And so cool that you uploaded an .h5ad version of the dataset! :smile:
As the uploaded h5ad doesn't contain the single-cell graph, however, the material is not enough to run the PAGA notebook here. What Caleb sent me around May consisted of these files:
The uploaded h5ad only contains the following

What's missing is essentially the graph that underlies the figures in your paper and the SPRING-based exploration; it saved me (and saves the people trying to reproduce the notebook) from doing all the preprocessing and graph computation. It would be really cool if the graph were publicly available. For this, one could either upload a compressed version of the files (as shared via email at the time), or upload an AnnData object that contains the graph. The only problem I see is that the current https://kleintools.hms.harvard.edu/paper_websites/wagner_zebrafish_timecourse2018/WagnerScience2018.h5ad seems to contain unfiltered data with 63530 cells, whereas the graph that Caleb sent out covers 53181 cells. I don't know whether you'd want me to send you this AnnData, which contains the graph. You could upload it as WagnerScience2018_processed.h5ad or something similar. And then I could polish the notebook and make something similar to this for the zebrafish. :wink:
If you don't want me to send you the AnnData containing the graph, I'm also happy to upload the data somewhere else. But the canonical location would be your web page, I guess. 🙂
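In the meantime, for anyone who only has the public h5ad and not the precomputed graph, a rough sketch of recomputing a neighbor graph with current scanpy could look like the following. The steps and parameters are illustrative, not the pipeline behind the paper figures, and the cluster column name ('ClusterName') is the one in the uploaded h5ad:
import scanpy as sc

adata = sc.read('WagnerScience2018.h5ad')

# Illustrative preprocessing; not the pipeline used for the published figures.
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=2000, subset=True)
sc.pp.pca(adata, n_comps=50)
sc.pp.neighbors(adata, n_neighbors=15)

# PAGA on the published cluster annotation (the column must be categorical).
adata.obs['ClusterName'] = adata.obs['ClusterName'].astype('category')
sc.tl.paga(adata, groups='ClusterName')
sc.pl.paga(adata)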
Hello, I've been following the thread and was also interested in reproducing the PAGA notebook with the zebrafish dataset but still couldn't find any of the additional files needed to run the notebook. Any information on how to obtain them would be greatly appreciated. Thanks in advance.
How can I get the additional files so that I can run the zebrafish notebook successfully? Could anyone help with this? Thanks in advance!
Sorry for the late response on this.
I uploaded the files that @calebweinreb sent out via email here: https://drive.google.com/file/d/1V2xA9P1nTaO9qWPj8LiAR-uQpqg0VMBv/view?usp=sharing
I'll also add a note to the notebook.
Here is the note in the notebook: https://github.com/theislab/paga/commit/b5dfcff9c87e2f71ef29e40060faca6c02117e89
Hi, I tried to replicate zebrafish.ipynb but ran into some errors.
- In chunk 8, I cannot write the anndata to file.
- If I skip the error in chunk 8 and run UMAP in chunk 9, there is still an error.
- In chunks 13 and 17, there is no sc.utils.merge_groups in scanpy.api.
I also tried to use the AnnData object WagnerScience2018.h5ad from https://kleintools.hms.harvard.edu/paper_websites/wagner_zebrafish_timecourse2018/mainpage.html directly, but it has a different number of cells than the object in the notebook.
Could you please help me figure this out?
Thanks
I'm having the same issue as @kangbw702.
Joining in late:
1 - I would very much appreciate it if someone could re-upload the cell-cell connectivity file.
2 - The difference in cell numbers is probably due to some filtering. In the original dataset there is the wildtype experiment (36749 cells) and the TraceSeq experiment (another ~30k cells), all from 24hpf. To remove them we used the library ID:
adata = adata[adata.obs.library_id.str.startswith('DEW0'), :]
3 - Regarding the missing sc.utils.merge_groups function, after you get the mapping d you can use:
adata.obs['cluster_coarse'] = adata.obs['clusters'].map(d)
(Note that in WagnerScience2018.h5ad the clusters column is actually called ClusterName, so you'll need to replace that in the tutorial; see the sketch below.)
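Putting points 2 and 3 together, a small sketch of the workaround - the coarse group names in d are made up here, the real mapping d is defined in the tutorial:
# Keep only the wildtype timecourse libraries (point 2).
adata = adata[adata.obs.library_id.str.startswith('DEW0'), :].copy()

# Hypothetical fine-to-coarse mapping; use the dictionary d from the tutorial.
d = {
    '24hpf-optic cup': 'eye',
    '24hpf-lens': 'eye',
    '24hpf-erythroid': 'blood',
}

# Replacement for the removed sc.utils.merge_groups (point 3);
# clusters missing from d end up as NaN, so check the result.
adata.obs['cluster_coarse'] = adata.obs['ClusterName'].map(d).astype('category')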
@falexwolf
Sorry, I'm coming to this too late and the link you shared is no longer available. Could you share it again? Thanks!