Invalid biotype error when loading effects from TCGA-BLCA cohort

Open jburos opened this issue 8 years ago • 8 comments

I get the following error when trying to load variant effects from the TCGA-BLCA cohort.

E.g., if I were to use missense_snv_count instead of snv_count in this notebook.

This looks to me like a varcode issue, but noting it here since it does impact cohorts usability.

Here is the full traceback:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-24-4d7b83c3ab2e> in <module>()
----> 1 extra_cols, df = blca_cohort2.as_dataframe(on=[cohorts.functions.snv_count, missense_snv_count])
      2 print(extra_cols)

/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/cohorts/cohort.pyc in as_dataframe(self, on, col, join_with, join_how, rename_cols, keep_paren_contents, **kwargs)
    335             for i, elem in enumerate(on):
    336                 col = elem.__name__ if not is_lambda(elem) else "column_%d" % i
--> 337                 col, df = apply_func(on=elem, col=col, df=df)
    338                 cols.append(col)
    339 

/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/cohorts/cohort.pyc in apply_func(on, col, df)
    311             else:
    312                 func = lambda row: on(row=row, cohort=self, **kwargs)
--> 313             df[col] = df.apply(func, axis=1)
    314             return (col, df)
    315 

/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/pandas/core/frame.pyc in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   4059                     if reduce is None:
   4060                         reduce = True
-> 4061                     return self._apply_standard(f, axis, reduce=reduce)
   4062             else:
   4063                 return self._apply_broadcast(f, axis)

/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/pandas/core/frame.pyc in _apply_standard(self, func, axis, ignore_failures, reduce)
   4155             try:
   4156                 for i, v in enumerate(series_gen):
-> 4157                     results[i] = func(v)
   4158                     keys.append(v.name)
   4159             except Exception as e:

/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/cohorts/cohort.pyc in <lambda>(row)
    310                 func = lambda row: on(row=row, **kwargs)
    311             else:
--> 312                 func = lambda row: on(row=row, cohort=self, **kwargs)
    313             df[col] = df.apply(func, axis=1)
    314             return (col, df)

/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/cohorts/functions.pyc in missense_snv_count(row, cohort, filter_fn, normalized_per_mb, **kwargs)
     70         patients=[cohort.patient_from_id(patient_id)],
     71         filter_fn=missense_filter_fn,
---> 72         **kwargs)
     73     if patient_id in patient_missense_effects:
     74         count = len(patient_missense_effects[patient_id])

/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/cohorts/cohort.pyc in load_effects(self, patients, only_nonsynonymous, filter_fn)
    615         for patient in self.iter_patients(patients):
    616             effects = self._load_single_patient_effects(
--> 617                 patient, only_nonsynonymous, filter_fn)
    618             if effects is not None:
    619                 patient_effects[patient.id] = effects

/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/cohorts/cohort.pyc in _load_single_patient_effects(self, patient, only_nonsynonymous, filter_fn)
    639                                   filter_fn=filter_fn)
    640 
--> 641         effects = variants.effects()
    642 
    643         # Always take the top priority effect per variant so we end up with a single

/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/varcode/variant_collection.pyc in effects(self, raise_on_error)
     73             effect
     74             for variant in self
---> 75             for effect in variant.effects(raise_on_error=raise_on_error)
     76         ])
     77 

/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/varcode/variant.pyc in effects(self, raise_on_error)
    405         # group transcripts by their gene ID
    406         transcripts_grouped_by_gene = groupby_field(
--> 407             self.transcripts,
    408             'gene_id')
    409 

/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/memoized_property.pyc in fget_memoized(self)
     38     def fget_memoized(self):
     39         if not hasattr(self, attr_name):
---> 40             setattr(self, attr_name, fget(self))
     41         return getattr(self, attr_name)
     42 

/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/varcode/variant.pyc in transcripts(self)
    324     def transcripts(self):
    325         return self.ensembl.transcripts_at_locus(
--> 326             self.contig, self.start, self.end)
    327 
    328     @property

/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/pyensembl/genome.pyc in transcripts_at_locus(self, contig, position, end, strand)
    464         return [
    465             self.transcript_by_id(transcript_id)
--> 466             for transcript_id in transcript_ids
    467         ]
    468 

/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/pyensembl/genome.pyc in transcript_by_id(self, transcript_id)
    830                 gene_id=gene_id,
    831                 genome=self,
--> 832                 require_valid_biotype=("transcript_biotype" in field_names))
    833 
    834         return self._transcripts[transcript_id]

/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/pyensembl/transcript.pyc in __init__(self, transcript_id, transcript_name, contig, start, end, strand, biotype, gene_id, genome, require_valid_biotype)
     51             raise ValueError(
     52                 "Invalid biotype '%s' for transcript with ID=%s, name=%s" % (
---> 53                     biotype, transcript_id, transcript_name))
     54 
     55     def __str__(self):

ValueError: (u"Invalid biotype '3prime_overlapping_ncRNA' for transcript with ID=ENST00000505579, name=AC009487.5-001", u'occurred at index 0')
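
For anyone reading the traceback: the last frame raises when `require_valid_biotype` is set and the transcript's biotype is not one the library recognizes. A minimal sketch of that kind of allow-list check follows. It is not pyensembl's actual code; `KNOWN_BIOTYPES` and `check_biotype` are hypothetical names, and the set contents are illustrative only.

```python
# Hypothetical allow-list for illustration; NOT the list pyensembl ships with.
KNOWN_BIOTYPES = {"protein_coding", "lincRNA", "pseudogene"}


def check_biotype(biotype, transcript_id, transcript_name,
                  require_valid_biotype=True):
    """Return the biotype, raising ValueError if it is not in the allow-list."""
    if require_valid_biotype and biotype not in KNOWN_BIOTYPES:
        raise ValueError(
            "Invalid biotype '%s' for transcript with ID=%s, name=%s" % (
                biotype, transcript_id, transcript_name))
    return biotype
```

With a check like this, any annotation that carries a biotype name missing from the list fails at transcript construction, which is why the error surfaces on the first row that `df.apply` touches.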

jburos avatar Sep 27 '16 16:09 jburos

I've seen this as well with ensembl85 in varcode. @iskandr mentioned it is fixed in the latest pyensembl, but we need to upgrade to the new varcode and pyensembl interfaces in cohorts. I've done that in #143, so hopefully this will go away after that.
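
A quick way to confirm which releases are actually installed before and after upgrading is sketched below. It assumes Python 3.8+ (for `importlib.metadata`), which differs from the Python 2.7 environment in the traceback; `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Report the three packages discussed in this thread.
    for name, version in installed_versions(
            ["cohorts", "varcode", "pyensembl"]).items():
        print(name, version or "not installed")
```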

arahuja avatar Sep 27 '16 16:09 arahuja

@jburos is this fixed now?

tavinathanson avatar Sep 28 '16 20:09 tavinathanson

It's been running - not confirmed yet

jburos avatar Sep 28 '16 20:09 jburos

Here's truncated output, having removed the cache:

/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/serializable/helpers.pyc in to_dict(obj)
    188     raise ValueError(
    189         "Cannot convert %s : %s to dictionary" % (
--> 190             obj, type(obj)))
    191 
    192 @return_primitive

ValueError: ("Cannot convert SMFEPDLDHIDDGPL : <class 'Bio.Seq.Seq'> to dictionary", u'occurred at index 0')

Looks like a new bug to me. Should we mark this as closed?

jburos avatar Sep 28 '16 20:09 jburos

I think @tavinathanson fixed this in Varcode.

iskandr avatar Sep 28 '16 21:09 iskandr

@jburos yup, should be fixed with the latest varcode.

tavinathanson avatar Sep 28 '16 21:09 tavinathanson

thx. I updated varcode & now I see the following error --

(posting a screenshot since the output is too large & crashes my computer if I try to select it):

[Screenshot: 2016-09-28 at 11:45:20 PM]

I'm running this again & piping output to a text file, but wanted to note this here in the meantime.

.. that being said, LMK if I should close & create a new issue elsewhere. thanks for your help!

jburos avatar Sep 29 '16 03:09 jburos

@jburos that .pyc code looks old; can you delete all the .pycs and see if that helps? Are you using 0.4.0 there?
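
One possible way to clear the stale bytecode is sketched below; `remove_pyc_files` is a hypothetical helper, and the directory to pass in (for example the `site-packages/cohorts` folder shown in the traceback) is an assumption about your environment.

```python
import os


def remove_pyc_files(root):
    """Delete every .pyc file under root and return the removed paths, sorted."""
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".pyc"):
                path = os.path.join(dirpath, name)
                os.remove(path)
                removed.append(path)
    return sorted(removed)
```

Python regenerates the bytecode from the `.py` sources on the next import, so deleting these files is safe as long as the sources are present.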

tavinathanson avatar Sep 29 '16 13:09 tavinathanson