
Computer (83)

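jaro-winkler similarity (jaro-winkler distance)
I came across a way of comparing edit distance during a project, so I am writing it down. (To be honest, I had only ever used Damerau–Levenshtein distance...) jaro similarity (jaro distance): jaro distance is an algorithm that focuses on the transpositions between two words. (Insertion, deletion, and substitution are not considered.) A transposition can be thought of simply as a swap of positions. Given the two words below, a total of 2 transpositions occur (a => b, b => a). word1 : a ---- b, word2 : b ---- a. jaro distance takes a value closer to 1 the more similar the two words are, and closer to 0 the more they differ..
2018. 5. 13.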
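The excerpt is cut off before any formula, so what follows is only a rough sketch of the commonly cited definition, not necessarily what the full post shows. It assumes the usual matching window of max(len1, len2) / 2 - 1, counts transpositions as half the number of matched characters that are out of order, and uses the conventional Winkler settings (prefix capped at 4 characters, scaling factor 0.1). The function names jaro_similarity and jaro_winkler_similarity are made up for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Jaro similarity in [0, 1]: 1.0 for identical strings, 0.0 when no
 * characters match. Returns -1.0 if memory allocation fails.
 * (Hypothetical helper name, written for illustration.) */
double jaro_similarity(const char *s1, const char *s2)
{
    size_t len1 = strlen(s1), len2 = strlen(s2);
    if (len1 == 0 && len2 == 0) return 1.0;
    if (len1 == 0 || len2 == 0) return 0.0;

    /* Characters only count as matching within this distance. */
    size_t longer = len1 > len2 ? len1 : len2;
    size_t window = longer / 2 > 0 ? longer / 2 - 1 : 0;

    unsigned char *matched1 = calloc(len1, 1);
    unsigned char *matched2 = calloc(len2, 1);
    if (!matched1 || !matched2) {
        free(matched1);
        free(matched2);
        return -1.0;
    }

    /* Pass 1: find matching characters inside the window. */
    size_t matches = 0;
    for (size_t i = 0; i < len1; i++) {
        size_t lo = i > window ? i - window : 0;
        size_t hi = i + window + 1 < len2 ? i + window + 1 : len2;
        for (size_t j = lo; j < hi; j++) {
            if (!matched2[j] && s1[i] == s2[j]) {
                matched1[i] = matched2[j] = 1;
                matches++;
                break;
            }
        }
    }

    double result = 0.0;
    if (matches > 0) {
        /* Pass 2: walk both match sequences in order and count the
         * positions where they disagree. */
        size_t out_of_order = 0, j = 0;
        for (size_t i = 0; i < len1; i++) {
            if (!matched1[i]) continue;
            while (!matched2[j]) j++;
            if (s1[i] != s2[j]) out_of_order++;
            j++;
        }
        double m = (double)matches;
        double t = out_of_order / 2.0; /* transpositions */
        result = (m / len1 + m / len2 + (m - t) / m) / 3.0;
    }

    free(matched1);
    free(matched2);
    return result;
}

/* Jaro-Winkler: boosts the Jaro score for a shared prefix
 * (assumed cap of 4 characters, assumed scaling factor 0.1). */
double jaro_winkler_similarity(const char *s1, const char *s2)
{
    double jaro = jaro_similarity(s1, s2);
    if (jaro < 0.0) return jaro; /* propagate allocation failure */

    size_t prefix = 0;
    while (prefix < 4 && s1[prefix] && s1[prefix] == s2[prefix]) prefix++;
    return jaro + prefix * 0.1 * (1.0 - jaro);
}
```

For the textbook pair "MARTHA" and "MARHTA", all six characters match and the swapped T/H pair counts as one transposition, so jaro_similarity gives 17/18 and the shared "MAR" prefix lifts the Jaro-Winkler score a little higher.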

noisy channel model and spell correction
I took a quick look and am writing it down... (trying to get past the limits of my memory...) ref : https://web.stanford.edu/~jurafsky/slp3/5.pdf To start with, the noisy channel model is a model in which "the original word becomes a noisy word (described as distorted) by passing through a noisy channel, and a decoder is used to guess the word most similar to the original word"... Connecting this to a spell checker... 1) misspelled word : the noisy word (a word distorted by passing through the noisy channel) 2) noise : substitutions or other changes to the letters (from the original word, distorte..
2018. 5. 8.
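The excerpt stops before the decoding step, so as a reminder, here is the standard way the decoder's guess is written down. The symbols are assumptions that do not appear in the excerpt: x is the observed misspelled word, w ranges over candidate words in a vocabulary V, P(w) is the prior probability of a word, and P(x | w) is the channel model, i.e. how likely w is to be distorted into x.

```latex
\hat{w}
  = \operatorname*{argmax}_{w \in V} P(w \mid x)
  = \operatorname*{argmax}_{w \in V} \frac{P(x \mid w)\, P(w)}{P(x)}
  = \operatorname*{argmax}_{w \in V} P(x \mid w)\, P(w)
```

The last step drops P(x) because it is the same for every candidate w, so it does not change which word wins.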

git untrackedfiles off
git config --global status.showUntrackedFiles no
2018. 2. 22.

2D array pointer
How to declare and use a pointer to a two-dimensional array. Done this way, input declared in main is a pointer to rows of MAX_EOJ_SIZE chars, so main can use it in the form input[x][y]. #include #include #include #define MAX_EOJ_COUNT 16 #define MAX_EOJ_SIZE 64 typedef struct data { char input[MAX_EOJ_COUNT][MAX_EOJ_SIZE]; } data_t; int main() { data_t *data = malloc(sizeof(data_t)); char (*input)[MAX_EOJ_SIZE] = data->input; int i; strcpy(input[0], "abc"); strcpy(inp..
2017. 1. 25.
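Since the excerpt's code is cut off and its header names were lost, here is a minimal compilable sketch of the same idea. The struct and macros are taken from the excerpt; the helpers store_word and make_data, the bounds check, and the second string "def" are assumptions added for illustration.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_EOJ_COUNT 16
#define MAX_EOJ_SIZE 64

typedef struct data {
    char input[MAX_EOJ_COUNT][MAX_EOJ_SIZE];
} data_t;

/* Hypothetical helper: copy `word` into row `row` through a pointer to
 * an array of MAX_EOJ_SIZE chars. Returns 0 on success, -1 if the row
 * is out of range or the word does not fit. */
int store_word(char (*input)[MAX_EOJ_SIZE], size_t row, const char *word)
{
    if (row >= MAX_EOJ_COUNT || strlen(word) >= MAX_EOJ_SIZE) return -1;
    strcpy(input[row], word);
    return 0;
}

/* Hypothetical helper: allocate a data_t and fill two rows via the
 * pointer-to-array, as in the excerpt. The caller frees the result. */
data_t *make_data(void)
{
    data_t *data = calloc(1, sizeof *data);
    if (!data) return NULL;

    /* `input` points at the first row; input + 1 is the next row,
     * exactly MAX_EOJ_SIZE bytes further on. */
    char (*input)[MAX_EOJ_SIZE] = data->input;

    store_word(input, 0, "abc");
    store_word(input, 1, "def"); /* assumed value, not from the excerpt */
    return data;
}
```

The parentheses in char (*input)[MAX_EOJ_SIZE] matter: without them, char *input[MAX_EOJ_SIZE] would declare an array of MAX_EOJ_SIZE char pointers instead of one pointer to a row.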

