arXivEdits

This repository provides the data for our EMNLP 2022 paper, "arXivEdits: Understanding the Human Revision Process in Scientific Writing".

The name of each field should be self-explanatory. If you have any questions, please reach me at chaojiang06@gmail.com.

The code for extracting plain text from LaTeX source was written by the awesome Sam Stevens when he was an undergraduate student. The raw code can be found here.

Update on 2023/09/12

We added pipeline code. Given paragraph pairs as input, it performs sentence alignment, edit extraction, and intention classification for you. Your one-stop solution for revision analysis. Check it out!
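
To give a feel for what the edit extraction step produces, here is a toy sketch that diffs two already-aligned sentences at the word level. It uses Python's standard `difflib` and is not the pipeline code from this repo; the function name `extract_edits`, the whitespace tokenization, and the output fields are all assumptions made for illustration.

```python
import difflib


def extract_edits(old_sentence, new_sentence):
    """Return word-level edits between two aligned sentences.

    Toy illustration only: this relies on difflib, not on the
    pipeline shipped in this repository.
    """
    # Naive whitespace tokenization (an assumption for this sketch).
    old_tokens = old_sentence.split()
    new_tokens = new_sentence.split()
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)

    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged span, nothing to report
        edits.append({
            "type": tag,  # "replace", "delete", or "insert"
            "old": " ".join(old_tokens[i1:i2]),
            "new": " ".join(new_tokens[j1:j2]),
        })
    return edits


edits = extract_edits(
    "We propose a new method for parsing .",
    "We propose a novel method for dependency parsing .",
)
for edit in edits:
    print(edit)
```

Each returned edit is a span-level change that a classifier could then label with an intention; the actual pipeline may segment and represent edits differently.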

Update on 2023/07/14

We uploaded all fine-tuned T5 intention classification models to the Hugging Face Hub. The code is in the code folder.

Update on 2023/02/22

We uploaded all fine-tuned BERT checkpoints to the Hugging Face Hub and provide sample code for using them.

Update on 2023/01/23

We added license information for every version of every paper. For example:

    train['1608.00087']['license'] = {'1': 'http://arxiv.org/licenses/nonexclusive-distrib/1.0/', '2': 'http://arxiv.org/licenses/nonexclusive-distrib/1.0/'}

In total, we find the following licenses:

    http://arxiv.org/licenses/assumed-1991-2003/
    http://arxiv.org/licenses/nonexclusive-distrib/1.0/
    http://creativecommons.org/licenses/by-nc-sa/4.0/
    http://creativecommons.org/licenses/by-sa/4.0/
    http://creativecommons.org/licenses/by/3.0/
    http://creativecommons.org/licenses/by/4.0/
    http://creativecommons.org/licenses/publicdomain/
    http://creativecommons.org/publicdomain/zero/1.0/
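
If you need to check licenses before reusing the text, something like the sketch below may help. It assumes the split is already loaded as a Python dict shaped like the `train['1608.00087']['license']` example above; the second entry, the helper names `count_licenses` and `papers_under_prefix`, and the small in-memory sample are made up for illustration.

```python
from collections import Counter

# Hypothetical in-memory sample mirroring the structure shown above.
train = {
    "1608.00087": {
        "license": {
            "1": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
            "2": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        }
    },
    "0000.00000": {  # placeholder ID, not a real entry in the dataset
        "license": {
            "1": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
            "2": "http://creativecommons.org/licenses/by/4.0/",
        }
    },
}


def count_licenses(dataset):
    """Count how often each license URL appears across all paper versions."""
    counts = Counter()
    for paper in dataset.values():
        counts.update(paper["license"].values())
    return counts


def papers_under_prefix(dataset, prefix):
    """Return sorted IDs of papers whose every version uses a license starting with prefix."""
    return sorted(
        arxiv_id
        for arxiv_id, paper in dataset.items()
        if all(url.startswith(prefix) for url in paper["license"].values())
    )


license_counts = count_licenses(train)
arxiv_only = papers_under_prefix(train, "http://arxiv.org/licenses/")
print(license_counts.most_common())
print(arxiv_only)
```

Requiring every version to match is a deliberately conservative choice here, since the license can change between versions of the same paper.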

We also added the source arXiv ID for each sentence pair in the edits sub-dataset.

Thanks for the suggestions from Qian Ruan from the UKP Lab!

Reference

If you find our paper or dataset useful, please consider citing the following paper.

@article{jiang-etal-2022-arXivEdits,
  title={arXivEdits: Understanding the Human Revision Process in Scientific Writing},
  author={Jiang, Chao and Xu, Wei and Stevens, Samuel},
  journal={In Proceedings of EMNLP 2022},
  year={2022}
}
