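How can I extract specific paragraphs of text using Vim?

I want to extract a number of experiments from text in the following format, which occurs many times in one huge file: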

CL blahblahblah
SP blahblahblah blahblahblah blahblahblah
DE blahblahblahblahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah
AB blahblahblah blahblahblah blahblahblah
   blahblahblahblahblahblah blahblahblah blahblahblah
   blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah
   blahblahblah blahblahblah blahblahblah
C1 blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah
   blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah
   blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah
   lahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah
RP blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah
EM blahblahblah blahblahblah blahblahblah blahblahblah
NR blahblahblah blahblahblah blahblahblah blahblahblah
TC blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah
Z9 blahblahblah blahblahblah blahblahblah blahblahblah
PU blahblahblah blahblahblah blahblahblah blahblahblah
PI blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah

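I'm only interested in the entries starting with C1, AB and TI, but sometimes these span multiple lines, and the XX tag lines that follow them are not always the same. Is there a simple way to keep only these entries? My remaining text should look like this: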

TI blahblahblah
AB blahblahblah b lah blahblah blah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah
C1 blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah
TI blah blah blah blah blah blah
AB blahblahblah blahblahblah blahblahblah blahblahblahblahblahblah blahblahblah blahblahblah blahblahblah blahblahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah
C1 blahblahblah blahblahblah blahblahblah blahblahblahblahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah

and so on...

Thanks a lot!

--------------Solutions-------------

I would do:

:$put='X' | 1,$-1g/^\(\s\|C1\|AB\|TI\)\@!/ ,/^\S/-d
:$d

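This does the following:

  • Append a line containing "X" at the end of the buffer
  • For every line except the last (1,$-1), if it starts with neither whitespace nor C1, AB or TI (the g/pattern/ part), delete (d) from that line up to, but not including, the next line that starts with a non-space character (the ,/pattern/- range, where - is short for -1)
  • Delete the "X" line at the end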

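To try it, if you are using gvim:

  • Copy this code to the clipboard
  • In gvim, run :@+ (this executes the contents of the + register, which is linked to the clipboard, as Ex commands).

What I got: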

AB blahblahblah blahblahblah blahblahblah
blahblahblahblahblahblah blahblahblah blahblahblah
blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah
blahblahblah blahblahblah blahblahblah
C1 blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah
blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah
blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah
lahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah

This should work:

:let @a="" | g/^\v<(C1|AB|TI)>/norm! "Ay/^\S^M

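Edit (Windows-specific): the command needs a literal carriage return at the end of that line. Enter the ^M as Ctrl-Q Enter (or Ctrl-V Enter if you are not on Windows, or if your vimrc does not set behave mswin).

That gets the lines into register "a. To replace the buffer with those lines: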

:%d | put a

Or, to put them into a new buffer:

:new | put a

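An awk solution:

awk '
BEGIN {
    # Tags whose blocks should be kept; referencing an element creates it.
    tags["C1"]
    tags["AB"]
    tags["TI"]
}
{
    # A line that starts with a word begins a new block: remember its tag.
    # Continuation lines have no leading word, so t keeps its old value.
    match($0, /^\w+/)
    if (RSTART)
        t = substr($0, RSTART, RLENGTH)
}
# Print the line while the current tag is one of the wanted ones.
t in tags' input.txt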
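
Since `\w` is not available in every awk, a minimal sketch of the same idea with a plain bracket expression might look like this. The function name `keep_tags` and its comma-separated argument are illustrative assumptions, not part of the original answer, and continuation lines are assumed to start with whitespace.

```shell
#!/bin/sh
# keep_tags: hypothetical helper, not from the original answer.
# Reads records on stdin and prints only the blocks whose tag is listed
# in $1 (comma-separated, e.g. "TI,AB,C1"). Continuation lines are
# assumed to start with whitespace and inherit the previous tag.
keep_tags() {
    awk -v list="$1" '
    BEGIN {
        n = split(list, names, ",")
        for (i = 1; i <= n; i++) wanted[names[i]] = 1
    }
    # A line that starts with letters, digits or underscore opens a new block.
    match($0, /^[A-Za-z0-9_]+/) { tag = substr($0, RSTART, RLENGTH) }
    # Print the line while the current block carries a wanted tag.
    tag in wanted
    '
}
```

It could then be called as `keep_tags TI,AB,C1 < input.txt`, which keeps the tag list out of the awk program itself.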

Translated into a Vim command:

:g/^/let t=matchstr(getline('.'), '^\w\+') | if !empty(t) | let tag=t | endif | if index(['C1', 'AB', 'TI'], tag)==-1 | d | endif

This seems to work, but it leaves an empty line at the end of the file:

:%s/\v^(C1|AB|TI|\s)@!\_.{-}\n(C1|AB|TI|$)@=//

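This regex uses a few tricky features, which I'll try to explain.

  • \v makes the pattern "very magic", which just lets us skip the backslashes in several places.
  • ^(C1|AB|TI|\s)@! matches at the start of any line that does not begin with a target tag or whitespace.
  • \_. matches any character, including a newline.
  • {-} matches the preceding atom as few times as possible (non-greedy).
  • \n matches the end of a line.
  • (C1|AB|TI|$)@= is a zero-width match for a target tag or the end of a line (for the final case).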
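
As a rough cross-check of the first lookahead, the lines at which the substitution can start matching are those that begin with neither a target tag nor whitespace. POSIX extended regular expressions have no lookahead, so the sketch below inverts an ordinary match with `grep -v` instead. The function name `unwanted_starts` is a hypothetical helper for inspection only.

```shell
#!/bin/sh
# unwanted_starts: hypothetical helper, not from the original answer.
# Prints, with line numbers, every stdin line that does NOT begin with
# C1, AB, TI, a space or a tab, i.e. the lines where the Vim pattern
# ^(C1|AB|TI|\s)@! can match. Empty lines are listed too.
unwanted_starts() {
    grep -nvE '^(C1|AB|TI|[[:blank:]])'
}
```

Running `unwanted_starts < input.txt` before the substitution gives a quick list of the blocks that are about to be removed.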

The result with your test input is this:

AB blahblahblah blahblahblah blahblahblah
blahblahblahblahblahblah blahblahblah blahblahblah
blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah
blahblahblah blahblahblah blahblahblah
C1 blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah
blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah
blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah blahblahblah

Another awk one-liner:

awk -F' |\t' '{if($1)f=$1~/^(TI|AB|C1)$/?1:0}f' yourFile

Category: VIM   Date: 2015-03-15   Views: 0