Conversion of Data augmentation Notebooks to Python script functions

Open finitearth opened this issue 2 years ago • 12 comments

As discussed on Discord, I'd like to convert the data augmentation notebooks into a runnable Python script with argument parsing.

finitearth avatar Jan 06 '23 22:01 finitearth

Can you give a description so people know what you are doing?

huu4ontocord avatar Jan 06 '23 23:01 huu4ontocord

I will take the functionalities in

  • https://github.com/LAION-AI/Open-Assistant/tree/main/notebooks/data-argumentation
  • https://github.com/LAION-AI/Open-Assistant/tree/main/notebooks/code-bugger
  • https://colab.research.google.com/drive/1nZx5LRjO61fYprFyqtrwPDLOis6ctR4p#scrollTo=1EE8CriiaCXj

and convert them into a single Python file.

The goal is to be able to run them from the command line and to pass in a TSV file that points to the essay text files or similar inputs.

The newly generated question-answer pairs can then be passed to the model for fine-tuning.
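A minimal sketch of what such a command-line entry point could look like, using only the standard library. The flag names (`--input-tsv`, `--path-column`, `--output`), the default column name `path`, and the helper names are assumptions for illustration, not the interface of the actual script:

```python
import argparse
import csv
from pathlib import Path
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build the argument parser (all flag names here are hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Generate question-answer pairs from text files listed in a TSV."
    )
    parser.add_argument("--input-tsv", type=Path, required=True,
                        help="TSV file with a header row and a column of text file paths")
    parser.add_argument("--path-column", default="path",
                        help="name of the TSV column that holds the file paths")
    parser.add_argument("--output", type=Path, default=Path("augmented.jsonl"),
                        help="where the generated pairs would be written")
    return parser


def read_text_paths(tsv_file: Path, column: str) -> List[Path]:
    """Return the paths found in the given column, skipping empty cells."""
    with tsv_file.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [Path(row[column]) for row in reader if row.get(column)]


def main(argv: Optional[List[str]] = None) -> None:
    """Parse arguments and list the input files; augmentation itself is omitted."""
    args = build_parser().parse_args(argv)
    for text_path in read_text_paths(args.input_tsv, args.path_column):
        print(text_path)
```

Calling `main(["--input-tsv", "essays.tsv"])` would print every path listed in `essays.tsv`; the augmentation step from the notebooks would replace the `print` call.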

finitearth avatar Jan 07 '23 00:01 finitearth

thank you!

huu4ontocord avatar Jan 07 '23 04:01 huu4ontocord

Unfortunately I have to work during the week. I wanted to add at least one more error type to the code-bugger, plus some internal checks to guarantee that the error has actually been applied. It should also return a dictionary with the types of errors injected, their locations, and which strings were substituted with what. That should be a good enough starting point to generate conversations. If we plan to actually run code that the assistant writes in the future, we should add some sort of flag for code that must not be run (e.g. bugged methods that overflow memory or loop forever).
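As a rough sketch of that idea (not the code-bugger's actual API), the function below injects one substitution, checks that it really changed the source, and returns a report dictionary. The function name, the report keys, and the `safe_to_run` flag are all hypothetical:

```python
from typing import Dict, Tuple


def inject_substitution(source: str, old: str, new: str, error_type: str,
                        safe_to_run: bool = True) -> Tuple[str, Dict[str, object]]:
    """Replace the first occurrence of `old` with `new` and report what changed."""
    index = source.find(old)
    if index == -1:
        # Internal check: the target string must exist, otherwise no bug is injected.
        raise ValueError(f"target string {old!r} not found in source")

    bugged = source[:index] + new + source[index + len(old):]
    if bugged == source:
        # Internal check: substituting a string with itself is not a bug.
        raise ValueError("substitution left the source unchanged")

    # 1-based line number and 0-based column of the substitution.
    line = source.count("\n", 0, index) + 1
    column = index - (source.rfind("\n", 0, index) + 1)

    report = {
        "error_type": error_type,
        "line": line,
        "column": column,
        "original": old,
        "replacement": new,
        # Hypothetical flag for bugs that should never be executed
        # (e.g. infinite loops or memory blow-ups).
        "safe_to_run": safe_to_run,
    }
    return bugged, report
```

For example, `inject_substitution("def add(a, b):\n    return a + b\n", "a + b", "a - b", "wrong_operator")` returns the bugged source together with a report that locates the change on line 2.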

furlat avatar Jan 08 '23 01:01 furlat

Is there a draft PR for this? I think I could finish it if needed.

nil-andreu avatar Jan 08 '23 10:01 nil-andreu

> Is there a draft PR for this? I think I could finish it if needed.

Sure, I will open a draft PR right now. I wasn't able to test the Codebugger because of some installation issues I couldn't resolve yet. The other classes should work fine, however.

finitearth avatar Jan 08 '23 14:01 finitearth

@finitearth were you on Windows? I worked on a Mac, and indeed the pip install was not working on Windows; it now works on my gaming PC. In any case, I will add a new notebook with a big copypasta of the methods to simplify your life.

furlat avatar Jan 08 '23 15:01 furlat

> @finitearth were you on Windows? I worked on a Mac, and indeed the pip install was not working on Windows; it now works on my gaming PC. In any case, I will add a new notebook with a big copypasta of the methods to simplify your life.

Yes, indeed, I'm working on Windows. It appears to be an issue with the pip installation.

Feel free to send me the notebook on Discord (also finitearth).

finitearth avatar Jan 08 '23 16:01 finitearth

Can you check whether it works now? It should be working: I tried it locally on Windows 10, and I updated the setup.py on the Open-Assistant GitHub 20 minutes ago, so re-cloning should get you the correct one. It might also be because it is calling python3 pip and you want python pip. I am opening a pull request here with the new notebook anyway.
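One general way to sidestep a `python` versus `python3` mismatch is to invoke pip through the interpreter that is already running. This is only a sketch of that technique, not what the updated setup.py does; the helper name `pip_install_command` is hypothetical:

```python
import subprocess
import sys
from typing import List


def pip_install_command(package_dir: str) -> List[str]:
    """Build a pip command bound to the current interpreter.

    Using sys.executable avoids guessing whether the interpreter is
    exposed as `python` or `python3` on the current platform.
    """
    return [sys.executable, "-m", "pip", "install", package_dir]


def install(package_dir: str) -> None:
    """Run the install and raise CalledProcessError if pip fails."""
    subprocess.run(pip_install_command(package_dir), check=True)
```

`install(".")` would then install the package in the current directory with the same Python that runs the script, on Windows and Mac alike.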

furlat avatar Jan 08 '23 16:01 furlat

see https://github.com/LAION-AI/Open-Assistant/pull/546

furlat avatar Jan 08 '23 16:01 furlat

checking on this @finitearth and @furlat

huu4ontocord avatar Jan 10 '23 05:01 huu4ontocord

From my side, I think I fixed the pip install bug on Windows, and I also added a notebook that requires no install in PR 546. I am working this week, but at the weekend I will update the bugging code.

furlat avatar Jan 10 '23 08:01 furlat

PR: https://github.com/LAION-AI/Open-Assistant/pull/570

finitearth avatar Jan 14 '23 13:01 finitearth

I forgot to ask, but could you keep my copyright notice in the section of the code that I wrote?

# coding=utf-8
# Copyright 2021-2023, Ontocord, LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

huu4ontocord avatar Jan 23 '23 05:01 huu4ontocord