
Python-3.x support

Open · kmario23 opened this issue 7 years ago · 14 comments

Hi all, I'd like to know whether you have plans to port the codebase to Python 3. Since most people have switched to Python 3, it would be nice to have Python 3 support so that other projects that depend on coco-caption (e.g. ImageCaptioning PyTorch) can also be implemented in Python 3.

Thanks!

kmario23, Jan 19 '18

I have ported it to Python 3, but the METEOR metric doesn't work. Have a look: coco-caption

xiadingZ, Jan 20 '18
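One likely reason a straight Python 3 port of the METEOR wrapper breaks (an assumption; the comment above does not say what failed) is that `subprocess` pipes are binary by default in Python 3, so writing a `str` to them raises `TypeError`. The sketch below reproduces that difference with in-memory streams instead of a real Java process; `write_line` is a hypothetical helper, not part of coco-caption.

```python
import io


def write_line(stream, text):
    """Write one line to `stream`; return None on success or the error type name."""
    try:
        stream.write(text + "\n")
        return None
    except TypeError as exc:
        return type(exc).__name__


# A binary stream behaves like a default Python 3 subprocess pipe:
# it only accepts bytes, so writing a str fails.
binary_pipe = io.BytesIO()
print(write_line(binary_pipe, "SCORE ||| a cat ||| a dog"))  # TypeError

# Wrapping it in a text layer (roughly what universal_newlines=True does
# for Popen pipes) lets a str be written; it is encoded as UTF-8.
text_pipe = io.TextIOWrapper(io.BytesIO(), encoding="utf-8", newline="\n")
print(write_line(text_pipe, "SCORE ||| a cat ||| a dog"))  # None
text_pipe.flush()
print(text_pipe.buffer.getvalue())  # b'SCORE ||| a cat ||| a dog\n'
```

This is why the Python 3 versions of meteor.py in this thread open the METEOR process with `universal_newlines=True`.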

I have implemented Python 3 support for the evaluation metrics. Have a look at my comment here: https://github.com/ruotianluo/ImageCaptioning.pytorch/issues/36#issuecomment-363442083

I am using my version of the eval tools together with the pycocotools from here: https://github.com/cocodataset/cocoapi

salaniz, Feb 06 '18

I have created a fork that is Python 3 compatible and also uses the new Word Mover's Distance metric. It would be nice to merge it into this repository.

https://github.com/mtanti/coco-caption

mtanti, May 27 '18

I just modified the code to support Python 3, with support for Chinese: https://github.com/entalent/coco-caption-py3/blob/master/README.md It was created in a hurry, so there might be bugs.

entalent, Aug 29 '18

What's the status on this? :)

rubencart, Feb 19 '19

@rubencart They said "We are currently focusing on more of the object detection / segmentation challenges, and have decided to leave the captioning leaderboard open but not make additional updates to it."

mtanti, Feb 19 '19

Another pure Python 3.x fork (no Python 2 support), with a few small bugs fixed as well: https://github.com/ozancaglayan/coco-caption

ozancaglayan, Aug 01 '19

Thanks for your contribution. Based on @mtanti's implementation, I modified two places to support METEOR evaluation in both py2 and py3.

  1. It seems that the code
        score_line = ' ||| '.join(('SCORE', ' ||| '.join(reference_list), hypothesis_str))
        self.meteor_p.stdin.write(score_line+'\n')

does not work in py2, so I changed it to

        if sys.version_info[0] == 2:  # python2
            score_line = ' ||| '.join(('SCORE', ' ||| '.join(reference_list), hypothesis_str)).encode('utf-8').strip()
            self.meteor_p.stdin.write(str(score_line+b'\n'))
        else:  # assume python3+
            score_line = ' ||| '.join(('SCORE', ' ||| '.join(reference_list), hypothesis_str)).strip()
            self.meteor_p.stdin.write(score_line+'\n')
  2. Add a check in compute_score:
            # A prediction can consist entirely of punctuation
            # (see the definition of PUNCTUATIONS in pycocoevalcap/tokenizer/ptbtokenizer.py);
            # it then becomes [''] after tokenization, i.e. res[i][0] == '',
            # and self._stat fails on that input
            if len(res[i][0]) == 0:
                res[i][0] = 'a'
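The two changes above can be isolated into small helpers that run without Java, which makes them easy to unit-test. This is a sketch, not HYPJUDY's code: `build_score_line` and `guard_empty` are hypothetical names, the version check uses the "test for 2, otherwise assume 3+" ordering, and `guard_empty` returns a copy instead of mutating `res` in place as `compute_score` does.

```python
import sys


def build_score_line(hypothesis, references):
    """Build the METEOR 'SCORE ||| ref1 ||| ... ||| hyp' request line."""
    # Remove the field separator from the hypothesis, as _stat does.
    hypothesis = hypothesis.replace('|||', '').replace('  ', ' ')
    line = ' ||| '.join(('SCORE', ' ||| '.join(references), hypothesis)).strip()
    if sys.version_info[0] == 2:  # Python 2: send UTF-8 encoded bytes
        return line.encode('utf-8') + b'\n'
    return line + '\n'  # Python 3+: the text-mode pipe accepts str


def guard_empty(res, placeholder='a'):
    """Replace empty single-caption predictions with a placeholder token.

    An all-punctuation caption becomes '' after PTB tokenization, which
    (per the comment in compute_score) makes _stat fail.
    """
    return {img_id: [caps[0] if caps[0] else placeholder]
            for img_id, caps in res.items()}


print(build_score_line('a ||| dog', ['a cat', 'the dog']), end='')
# SCORE ||| a cat ||| the dog ||| a dog
print(guard_empty({1: [''], 2: ['a cat']}))  # {1: ['a'], 2: ['a cat']}
```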

The complete code of meteor.py is as follows:

#!/usr/bin/env python

# Python wrapper for METEOR implementation, by Xinlei Chen
# Acknowledge Michael Denkowski for the generous discussion and help
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import sys
import subprocess
import threading

# Assumes meteor-1.5.jar is in the same directory as meteor.py.  Change as needed.
METEOR_JAR = 'meteor-1.5.jar'

class Meteor:

    def __init__(self):
        # Copy the environment so the parent process's os.environ is not modified
        self.env = os.environ.copy()
        self.env['LC_ALL'] = 'en_US.UTF-8'
        self.meteor_cmd = ['java', '-jar', '-Xmx2G', METEOR_JAR,
                '-', '-', '-stdio', '-l', 'en', '-norm']
        # Text mode (universal_newlines=True) plus line buffering (bufsize=1):
        # each request is sent to METEOR as soon as it ends with '\n'
        self.meteor_p = subprocess.Popen(self.meteor_cmd,
                cwd=os.path.dirname(os.path.abspath(__file__)),
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                env=self.env, universal_newlines=True, bufsize=1)
        # Used to guarantee thread safety
        self.lock = threading.Lock()

    def compute_score(self, gts, res):
        assert gts.keys() == res.keys()
        imgIds = sorted(list(gts.keys()))
        scores = []

        eval_line = 'EVAL'
        # 'with' releases the lock even if METEOR returns an unparseable line
        with self.lock:
            for i in imgIds:
                assert len(res[i]) == 1
                # A prediction can consist entirely of punctuation
                # (see the definition of PUNCTUATIONS in pycocoevalcap/tokenizer/ptbtokenizer.py);
                # it then becomes [''] after tokenization, i.e. res[i][0] == '',
                # and self._stat fails on that input, so substitute a dummy word
                if len(res[i][0]) == 0:
                    res[i][0] = 'a'
                stat = self._stat(res[i][0], gts[i])
                eval_line += ' ||| {}'.format(stat)

            # Send to METEOR
            self.meteor_p.stdin.write(eval_line + '\n')

            # Collect segment scores: one line per image
            for _ in imgIds:
                score = float(self.meteor_p.stdout.readline().strip())
                scores.append(score)

            # Final (corpus-level) score
            final_score = float(self.meteor_p.stdout.readline().strip())

        return final_score, scores

    def method(self):
        return "METEOR"

    def _stat(self, hypothesis_str, reference_list):
        # SCORE ||| reference 1 words ||| reference n words ||| hypothesis words
        hypothesis_str = hypothesis_str.replace('|||', '').replace('  ', ' ')
        if sys.version_info[0] == 2:  # python2
            score_line = ' ||| '.join(('SCORE', ' ||| '.join(reference_list), hypothesis_str)).encode('utf-8').strip()
            self.meteor_p.stdin.write(str(score_line+b'\n'))
        else:  # assume python3+
            score_line = ' ||| '.join(('SCORE', ' ||| '.join(reference_list), hypothesis_str)).strip()
            self.meteor_p.stdin.write(score_line+'\n')
        return self.meteor_p.stdout.readline().strip()

    def __del__(self):
        with self.lock:
            self.meteor_p.stdin.close()
            self.meteor_p.kill()
            self.meteor_p.wait()

HYPJUDY, Oct 08 '19
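Judging from compute_score, the wrapper speaks a simple line protocol with the METEOR jar: one SCORE request per image returns a statistics line, then a single `EVAL ||| stats1 ||| stats2 ...` request returns one segment score per image followed by the corpus-level score. The sketch below reproduces only the Python side of that exchange against a list of fake output lines, so it runs without Java. `build_eval_line` and `read_scores` are hypothetical helpers, and the statistics strings and scores are made-up placeholders, not real METEOR output.

```python
def build_eval_line(stats):
    """Join per-image statistics strings into a single EVAL request."""
    return 'EVAL' + ''.join(' ||| {}'.format(s) for s in stats) + '\n'


def read_scores(readline, n_segments):
    """Read n segment scores, then the final corpus score, one per line."""
    scores = [float(readline().strip()) for _ in range(n_segments)]
    final_score = float(readline().strip())
    return final_score, scores


# Fake METEOR output: two segment scores followed by the final score.
fake_stdout = iter(['0.25\n', '0.5\n', '0.375\n'])
final, segs = read_scores(lambda: next(fake_stdout), 2)
print(build_eval_line(['1 2 3', '4 5 6']), end='')  # EVAL ||| 1 2 3 ||| 4 5 6
print(final, segs)  # 0.375 [0.25, 0.5]
```

Passing `readline` as a callable keeps the parsing logic separate from the subprocess, so the same function could be given `self.meteor_p.stdout.readline` in the real wrapper.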

Your code assumes that there will only ever be Python versions 2 and 3. Don't assume that if the version is not 3 then it is 2. Instead, check whether it is 2, and if not, assume that the Python 3 code will keep working in future versions as well. So switch your if/else around to 'if sys.version_info[0] == 2: ... else: ...'.

(Email reply to the comment above by Yupan Huang, Tue, 8 Oct 2019, 09:42, on tylin/coco-caption issue #27.)

Thanks for your contribution. Based on @mtanti https://github.com/mtanti 's implementation, I modified two places to support meteor evalution for both py2 and py3.

  1. It seems that the code of

    score_line = ' ||| '.join(('SCORE', ' ||| '.join(reference_list), hypothesis_str)) self.meteor_p.stdin.write(score_line+'\n')

cannot support py2 and I changed it to

    if sys.version_info[0] == 3:  # python3
        score_line = ' ||| '.join(('SCORE', ' ||| '.join(reference_list), hypothesis_str)).strip()
        self.meteor_p.stdin.write(score_line+'\n')
    else:  # python2
        score_line = ' ||| '.join(('SCORE', ' ||| '.join(reference_list), hypothesis_str)).encode('utf-8').strip()
        self.meteor_p.stdin.write(str(score_line+b'\n'))
  1. Add a judgement in compute_score

      # There's a situation that the prediction is all puctuations
      # see definition of PUNCTUATIONS in pycocoevalcap/tokenizer/ptbtokenizer.py
      # then the prediction will become [''] after tokenization
      # which means res[i][0] == '' and self._stat will failed with this input
      if len(res[i][0]) == 0:
          res[i][0] = 'a'
    

The complete code of meteor.py is as following

#!/usr/bin/env python

Python wrapper for METEOR implementation, by Xinlei Chen# Acknowledge Michael Denkowski for the generous discussion and help from future import absolute_importfrom future import divisionfrom future import print_function

import osimport sysimport subprocessimport threading

Assumes meteor-1.5.jar is in the same directory as meteor.py. Change as needed.METEOR_JAR = 'meteor-1.5.jar'# print METEOR_JAR

class Meteor:

def __init__(self):
    self.env = os.environ
    self.env['LC_ALL'] = 'en_US.UTF_8'
    self.meteor_cmd = ['java', '-jar', '-Xmx2G', METEOR_JAR,
            '-', '-', '-stdio', '-l', 'en', '-norm']
    self.meteor_p = subprocess.Popen(self.meteor_cmd,
            cwd=os.path.dirname(os.path.abspath(__file__)),
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            env=self.env, universal_newlines=True, bufsize=1)
    # Used to guarantee thread safety
    self.lock = threading.Lock()

def compute_score(self, gts, res):
    assert(gts.keys() == res.keys())
    imgIds = sorted(list(gts.keys()))
    scores = []

    eval_line = 'EVAL'
    self.lock.acquire()
    for i in imgIds:
        assert(len(res[i]) == 1)
        # There's a situation that the prediction is all puctuations
        # see definition of PUNCTUATIONS in pycocoevalcap/tokenizer/ptbtokenizer.py
        # then the prediction will become [''] after tokenization
        # which means res[i][0] == '' and self._stat will failed with this input
        if len(res[i][0]) == 0:
            res[i][0] = 'a'
        stat = self._stat(res[i][0], gts[i])
        eval_line += ' ||| {}'.format(stat)

    # Send to METEOR
    self.meteor_p.stdin.write(eval_line + '\n')

    # Collect segment scores
    for i in range(len(imgIds)):
        score = float(self.meteor_p.stdout.readline().strip())
        scores.append(score)

    # Final score
    final_score = float(self.meteor_p.stdout.readline().strip())
    self.lock.release()

    return final_score, scores

def method(self):
    return "METEOR"

def _stat(self, hypothesis_str, reference_list):
    # SCORE ||| reference 1 words ||| reference n words ||| hypothesis words
    hypothesis_str = hypothesis_str.replace('|||', '').replace('  ', ' ')
    if sys.version_info[0] == 3:  # python3
        score_line = ' ||| '.join(('SCORE', ' ||| '.join(reference_list), hypothesis_str)).strip()
        self.meteor_p.stdin.write(score_line+'\n')
    else:  # python2
        score_line = ' ||| '.join(('SCORE', ' ||| '.join(reference_list), hypothesis_str)).encode('utf-8').strip()
        self.meteor_p.stdin.write(str(score_line+b'\n'))
    return self.meteor_p.stdout.readline().strip()

def __del__(self):
    self.lock.acquire()
    self.meteor_p.stdin.close()
    self.meteor_p.kill()
    self.meteor_p.wait()
    self.lock.release()

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tylin/coco-caption/issues/27?email_source=notifications&email_token=ABLFWDZA7EXTKJ5V6TN75SDQNQ2YDA5CNFSM4EMTXEC2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEATHB5Y#issuecomment-539390199, or mute the thread https://github.com/notifications/unsubscribe-auth/ABLFWD4E2MNIUXJV3RLSCVDQNQ2YDANCNFSM4EMTXECQ .

mtanti, Oct 08 '19
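mtanti's point can be checked without installing several interpreters by passing the version tuple in explicitly. In this sketch (`pick_branch` and `pick_branch_fragile` are hypothetical helpers), only major version 2 takes the legacy path, so a hypothetical future major version falls through to the Python 3 code rather than the Python 2 code.

```python
import sys


def pick_branch(version_info=sys.version_info):
    """Test for 2 explicitly; treat every other major version as 3+."""
    if version_info[0] == 2:
        return 'py2'
    return 'py3+'


def pick_branch_fragile(version_info=sys.version_info):
    """The ordering mtanti warns against: anything that is not 3 is treated as 2."""
    if version_info[0] == 3:
        return 'py3+'
    return 'py2'


for v in [(2, 7), (3, 12), (4, 0)]:
    print(v, pick_branch(v), pick_branch_fragile(v))
# (2, 7) py2 py2
# (3, 12) py3+ py3+
# (4, 0) py3+ py2
```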

Python 2 reaches end-of-life next year. Why do you still bother supporting it?

ozancaglayan, Oct 08 '19

Thanks @mtanti for pointing it out! I've modified the code. @ozancaglayan Since I use code from some repositories that originally supported only Python 2, I am migrating to Python 3 and switching between the two versions to compare performance.

HYPJUDY, Oct 08 '19


Thanks @HYPJUDY, your solution helped me solve the problem of proc.stdout.readline() hanging!

MarcusNerva, Apr 01 '20
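A hanging `readline()` is consistent with the usual pipe deadlock (the comment does not say what caused it): if the request line is still sitting in Python's write buffer, the child never receives it and the parent blocks forever waiting for a reply. The sketch below uses a tiny Python echo child as a hypothetical stand-in for the METEOR jar to show the pattern the wrapper relies on: text mode plus `bufsize=1` (line buffering), newline-terminated requests, and an explicit `flush()` as an extra safeguard.

```python
import subprocess
import sys

# Hypothetical stand-in for the METEOR jar: reads one request per line and
# answers with the number of '|||'-separated fields, flushing each reply.
CHILD = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(str(len(line.strip().split(' ||| '))) + '\\n')\n"
    "    sys.stdout.flush()\n"
)


def round_trip(lines):
    """Send each line to the child and collect its one-line replies."""
    proc = subprocess.Popen(
        [sys.executable, '-c', CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text mode, so str can be written
        bufsize=1,                # line-buffered (valid only in text mode)
    )
    replies = []
    try:
        for line in lines:
            proc.stdin.write(line + '\n')  # the trailing newline matters
            proc.stdin.flush()             # harmless if already line-buffered
            replies.append(proc.stdout.readline().strip())
    finally:
        proc.stdin.close()  # EOF ends the child's loop
        proc.wait()
        proc.stdout.close()
    return replies


print(round_trip(['SCORE ||| a cat ||| a dog', 'EVAL ||| 1 2 3']))  # ['3', '2']
```

The child also has to flush its own output; the real METEOR jar is assumed to do so in `-stdio` mode, since the wrapper in this thread works without any extra handling on that side.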

I just stumbled across this: our https://github.com/Maluuba/nlg-eval supports Python 3.

kracwarlock, Dec 23 '20

It has been three years since I first commented, and a lot has changed in the meantime. I'm now working with a much more elegant toolkit, facebookresearch/vizseq, which supports visualization, extends to multiple modalities (video, audio), and includes more recent embedding-based metrics.

kmario23, Dec 25 '20