LLM

    BLEU (Bilingual Evaluation Understudy) Score - LLM Evaluation

    PPL (Perplexity) is also used to evaluate the outputs that an LLM generates (predicts), but the BLEU Score is used more often. Roughly speaking, PPL measures how much the model hesitates when predicting the answer, so lower is better: a model deliberating between 2 candidates is that much smarter than one deliberating among 5. For a task like translation, however, we want to judge whether the translation is good in light of the sentence's context, and that is the evaluation the BLEU Score is used for. What is BLEU? It is an evaluation score computed as $BLEU = BP\cdot \prod_{n=1}^{N}p_{n}^{w_{n}}$, where $p_{n}$ is the n-gram precision and $w_{n}$ is the weight,..
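    The formula above can be sketched in a few lines of Python: clipped n-gram precisions $p_{n}$, uniform weights $w_{n}=1/N$, and the brevity penalty $BP$. This is a toy single-reference sketch for illustration, not a production implementation (libraries such as NLTK or sacreBLEU handle smoothing and multiple references).

    ```python
    import math
    from collections import Counter

    def ngrams(tokens, n):
        """Return a Counter of all n-grams in the token list."""
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    def bleu(candidate, reference, max_n=4):
        """Sentence-level BLEU: BP * prod(p_n ** w_n) with uniform w_n = 1/N."""
        precisions = []
        for n in range(1, max_n + 1):
            cand, ref = ngrams(candidate, n), ngrams(reference, n)
            total = sum(cand.values())
            if total == 0:
                return 0.0
            # clipped (modified) n-gram precision p_n
            overlap = sum(min(count, ref[g]) for g, count in cand.items())
            precisions.append(overlap / total)
        if min(precisions) == 0:
            return 0.0  # any zero p_n drives the geometric mean to zero
        # brevity penalty BP: penalize candidates shorter than the reference
        c, r = len(candidate), len(reference)
        bp = 1.0 if c > r else math.exp(1 - r / c)
        # geometric mean of the p_n under uniform weights 1/N
        return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
    ```

    A candidate identical to the reference scores 1.0; a short or divergent candidate scores lower, both through smaller $p_{n}$ and through the brevity penalty.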

    [Paper Review] SUPER-NATURALINSTRUCTIONS: Generalization via Declarative Instructions on 1600+ NLP Tasks

    https://arxiv.org/abs/2204.07705 Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks How well can NLP models generalize to a variety of unseen tasks when provided with task instructions? To address this question, we first introduce Super-NaturalInstructions, a benchmark of 1,616 diverse NLP tasks and their expert-written instructions. Summary 1..

    [Paper Review] LLM4TS: Two-Stage Fine-Tuning for Time-Series Forecasting with Pre-Trained LLMs

    [Paper link] https://arxiv.org/abs/2308.08469 LLM4TS: Aligning Pre-Trained LLMs as Data-Efficient Time-Series Forecasters Multivariate time-series forecasting is vital in various domains, e.g., economic planning and weather prediction. Deep train-from-scratch models have exhibited effective performance yet require large amounts of data, which limits real-world applicability. Using LLMs for time-series data ..