Pengfei Liu issues

Results: 42 issues of Pengfei Liu

Update README.md

Add customized feature function

This PR aims to make feature functions customizable, either through build-in or build-out definitions. For example,

##### (1) Build-out

```python
loader = get_loader_class(TaskType.text_classification).from_datalab(
    dataset=DatalabLoaderOption(
        "sst2",
        custom_features={
            "long_text_50": {
                "dtype": "string",...
```

Create a glossary of the technical terms (or ubiquitous language) in ExplainaBoard

(Suggested by @odashi.) The current documentation offers little beyond how-to guides, which makes it hard for other developers to understand how the software is developed.

New Features for KGExplainaBoard

- [x] Generalize evaluation metric: Hit@k (k is a dynamic parameter)
  * a simple way to do this: we can support a variety of metric variants, such as hit@1, hit@2,...
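A minimal sketch of a Hit@k metric with `k` as a runtime parameter, assuming each test sample comes with the 1-based rank of its gold entity. The names `hits_at_k` and `hits_metric_variants` are hypothetical and not taken from KGExplainaBoard.

```python
from typing import Dict, Iterable, Sequence


def hits_at_k(true_ranks: Sequence[int], k: int) -> float:
    """Return the fraction of samples whose gold entity is ranked within the top k.

    Assumes ranks are 1-based, so rank 1 is the best possible prediction.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    if not true_ranks:
        return 0.0
    return sum(1 for rank in true_ranks if rank <= k) / len(true_ranks)


def hits_metric_variants(
    true_ranks: Sequence[int], ks: Iterable[int] = (1, 2, 3, 5, 10)
) -> Dict[str, float]:
    """Compute several variants (hit@1, hit@2, ...) from one parameterized function."""
    return {f"hit@{k}": hits_at_k(true_ranks, k) for k in ks}
```

Calling `hits_metric_variants(ranks, ks=(1, 2))` returns a dictionary keyed by `hit@1` and `hit@2`, so a new variant only needs a new value of `k` rather than a new metric class.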

new-functionality

Evaluation for Gaokao Benchmark

- [x] multiple-choice -> accuracy
- [ ] conditional generation-based qa (hint) -> accuracy
- [ ] grammar error correction -> recall?
- [ ] conditional text generation -> human...

Unittest errors due to file downloading

The following error has occurred several times; maybe we can move the file to S3.

```
ConnectionError: Couldn't reach http://www.phontron.com/download/conala-corpus-v1.1.zip (ConnectionError(MaxRetryError("HTTPConnectionPool(host='www.phontron.com', port=80): Max retries exceeded with url: /download/conala-corpus-v1.1.zip (Caused by...
```

Introducing a unified & general interface for class serialization

It would be good to consider introducing a unified, general interface for `from_dict`.

Some existing implementations and discussion:

* https://github.com/neulab/ExplainaBoard/blob/aaf91a57f4a5c143a6f6c2aec33a81a3e51690b8/explainaboard/utils/serialization.py#L4
* https://github.com/neulab/ExplainaBoard/pull/301
* https://github.com/neulab/ExplainaBoard/blob/aaf91a57f4a5c143a6f6c2aec33a81a3e51690b8/explainaboard/feature.py#L43
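One possible shape for such an interface is sketched below as a mixin for dataclasses. The names `Serializable` and `FeatureSpec` are hypothetical and do not come from the ExplainaBoard code base. Silently ignoring unknown keys is an assumption made for illustration.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, fields
from typing import Any, TypeVar

T = TypeVar("T", bound="Serializable")


class Serializable:
    """Hypothetical mixin giving dataclasses a uniform to_dict/from_dict pair."""

    def to_dict(self) -> dict[str, Any]:
        # asdict() recursively converts nested dataclasses to plain dicts.
        return asdict(self)

    @classmethod
    def from_dict(cls: type[T], data: dict[str, Any]) -> T:
        # Keep only keys that match declared fields; extra keys are dropped.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


@dataclass
class FeatureSpec(Serializable):
    """Illustrative class only; field names are made up for the example."""

    name: str
    dtype: str = "string"
```

This sketch does not rebuild nested dataclasses in `from_dict`. A real interface would need a rule for nested and polymorphic types.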

Mark for discussion

- [x] introduce the concept of subdataset?
- [ ] compression strategy of `dataset._stat`?
- [ ] different cache methods for `dataset._stat`?

More Comprehensive Test Suites

This is a draft doc for tracking the design of deeper tests for ExplainaBoard, so that hidden bugs can be caught when we do major refactoring. Generally, from the...

Inconsistent number of test samples between test data and system predictions in KG task

The number of test samples in https://github.com/PhaelIshall/KGExplainaBoard/tree/main/FB15K-237 is 20466, while the number of test samples in https://github.com/PhaelIshall/KGExplainaBoard/tree/main/user-defined-ranks is 20441. This inconsistency results in some errors when we introduce training set...
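A mismatch like this can be caught early with a length check before any analysis runs. The sketch below assumes the test data and the system predictions are both loaded as in-memory sequences. `check_sample_counts` is a hypothetical helper, not an existing ExplainaBoard function.

```python
from typing import Sequence


def check_sample_counts(test_samples: Sequence, predictions: Sequence) -> None:
    """Raise a descriptive error when the two inputs differ in length."""
    n_test, n_pred = len(test_samples), len(predictions)
    if n_test != n_pred:
        raise ValueError(
            f"Sample count mismatch: {n_test} test samples vs "
            f"{n_pred} predictions (difference: {abs(n_test - n_pred)})"
        )
```

Failing fast at load time points directly at the data rather than at whichever later step first indexes past the shorter input.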